Introduction

Advancements in culture-independent methods have granted the possibility to study microbial diversity with unprecedented detail, revolutionizing our understanding of life on Earth (Brazelton et al. 2010) and transforming the comprehension we have of our own relation with the microbial world (The Human Microbiome Project Consortium 2012). The most thoroughly studied human-associated microbial community, the gut microbiome, has been shown to be critical for human health (de Vos and De Vos 2012; Pflughoeft and Versalovic 2012), and the number of studies looking into the association between the gut microbiome and a myriad of factors has greatly increased in recent years. However scrutinized, several technical issues remain underexplored in the processing of gut samples and pose a serious threat to the accuracy of the new findings (de la Cuesta-Zuluaga and Escobar 2016). Among them, the acquisition of genetic material has been shown to be a critical step, on which multiple biases can be introduced and that can result in an erroneous representation of the microbial community under scrutiny.

Differences in DNA extraction protocols are known to affect the studied microbial community, causing the overrepresentation or underrepresentation of given microbial groups (Ariefdjohan et al. 2010; Guo and Zhang 2013; Smith et al. 2011). For instance, some DNA extraction protocols require a bead-beating step that contributes to the disruption of the cell wall, whereas others rely on chemical lysis for this procedure (Ariefdjohan et al. 2010). The mechanical disruption of microbial cells has been shown to yield higher amounts of DNA of better quality (Feinstein et al. 2009; Smith et al. 2011; Wesolowska-Andersen et al. 2014) and was the preferred method of benchmark collaborative projects, such as MetaHIT and the human microbiome project (HMP) (Wesolowska-Andersen et al. 2014). However, performance differs between mechanical disruption-based protocols (Ariefdjohan et al. 2010; Feinstein et al. 2009; Guo and Zhang 2013; Smith et al. 2011; Wesolowska-Andersen et al. 2014) and even among lots of the same DNA extraction kit (Salter et al. 2014), making data comparison among studies a heartache challenge.

An additional issue that has been highly publicized but less explored in the analysis of fecal samples is the effect of reagent contamination and sample biomass. Both 16S ribosomal RNA (rRNA) gene and metagenomic analyses are non-specific and sensitive to contamination with foreign microbial DNA (Tanner et al. 1998). Sources of contamination include laboratory environment (Salter et al. 2014), DNA extraction reagents (Corless et al. 2000; Tanner et al. 1998), PCR reagents (Rand and Houck 1990; Shen et al. 2006), and even ultrapure water (Kulakov et al. 2002; Mcalister et al. 2002). Foreign DNA contamination has been downplayed when dealing with feces compared with other biological samples (e.g., blood, lung, meconium) because of their high microbial biomass. Samples with low biomass are more prone to being affected by contaminant genetic material, as they provide little template DNA to compete with that contained in laboratory reagents (Grahn et al. 2003; Salter et al. 2014). Furthermore, there is a critical point in which sample biomass becomes so low that the contaminating DNA dominates sequence libraries (Salter et al. 2014).

Despite the importance recognized to the DNA extraction step, the impact of reagent contamination and sample biomass remains largely unaddressed in microbial ecology; to the best of our knowledge, no study has explored this on the inference of the human gut microbiota. Here, we assess the sequencing yield, composition, and diversity of the gut microbiota studied from human fecal samples, which were serially diluted and processed using two commercially available DNA extraction kits to determine the impact of DNA extraction, initial sample biomass, and reagent contamination on the evaluation of this microbial community using the 16S rRNA gene as marker.

Materials and methods

DNA extraction, serial dilutions, and 16S rRNA gene sequencing

Microbial DNA was obtained from fecal samples from five human donors (Online Resource 1). Samples were kept at − 80 °C since collection until DNA extraction. We assessed two commonly used commercial kits for fecal DNA extraction: PowerSoil DNA Isolation Kit (MoBio, Carlsbad, CA) and QIAamp DNA Stool Mini Kit (Qiagen, Hilden, Germany). We mostly followed the manufacturers’ instructions but introduced changes to improve DNA yield with mechanical lysis. For the MoBio kit, the bead-beating step (step 5) was reduced from 10 to 5 min (with 10-s rest intervals every minute) at 71 Hz using the original beads contained in the kit; for the Qiagen kit, a bead-beating step at 41 Hz for 8 s was introduced between steps 4 and 5 of the Isolation of DNA from Stool for Pathogen Detection protocol, using 1-mm sterile zirconium beads. In both cases, bead beating was performed in a Bead Blaster 24 (Benchmark Scientific, NJ). In both protocols, we started with 200 mg of homogenized feces (no dilution; D0). Then, we performed five 1:10 serial dilutions with autoclaved PBS, resulting in an initial biomass of 20 mg (dilution D1), 2 mg (dilution D2), 0.2 mg (dilution D3), 0.02 mg (dilution D4), and 0.002 mg (dilution D5). In all cases, DNA extraction was performed from a final volume of 200 μl, and DNA concentrations were obtained with a Synergy HT Microplate Reader (Bio-Tek Instruments, Winooski, VT).

Additionally, we obtained DNA sequences from nine negative controls to track potential sources of contamination. “Blank” DNA extractions were obtained with the two kits using autoclaved PBS as starting material (no feces)—blMBPBS and blQIAPBS; in the case of MoBio, it was possible to make an additional blank extraction from the solution originally contained in the PowerSoil Bead Tubes, adding no PBS (blMB). Blank extractions were expected to gather all sources of contamination introduced during DNA extraction protocols. We additionally obtained sequences from the MoBio C6 (buffC6) and Qiagen AE (buffAE) elution buffers, and autoclaved PBS (PBS). Sequences from buffC6 and buffAE served to distinguish the contamination introduced by these specific reagents from that introduced at other points during DNA extraction; since PBS was used to homogenize and dilute samples, as well as starting material in blMBPBS and blQIAPBS, we considered important to determine its microbial signature to identify instances of contamination by this reagent. We also obtained sequences from three additional negative controls (ultrapure water—H2O, and two “empty” wells—empA1 and empB1—with no sample but following the same sequencing treatment as samples); H2O informed about contamination introduced during PCR, library generation, and indexing, whereas empA1 and empB1 corresponded to pure sequencing noise. Finally, sequences from one positive control (mock community HM-782D, BEI Resources, Manassas, VA) served to get an estimate of the sequencing error rates.

All samples and controls were sent to the University of Michigan Medical School Host Microbiome Initiative (Ann Arbor, MI) for PCR amplification and sequencing of the 16S rRNA gene on the MiSeq platform. Several steps were followed in preparing samples for sequencing on this platform. Broadly, these included library generation and indexing, quality control, normalization and pooling, quantification, and sequencing. Specifically, the V4 hypervariable region of the 16S rRNA gene was amplified using the F515 (5′-CACGGTCGKCGGCGCCATT-3′) and R806 (5′-GGACTACHVGGGTWTCTAAT-3′) primers. We preferred to sequence this region (~ 250 bp) because its length allowed for fully overlapping forward and reverse reads, diminishing sequencing error rates. Multiplex PCR was performed with dual indices (Kozich et al. 2013) using 17 μl of Accuprime™ Pfx SuperMix (Thermo Fisher Scientific, Carlsbad, CA), 1 or 5 μl of template DNA (see Online Resource 2), and 2 μl of each paired set of index primers. PCR cycling condition consisted of an initial denaturation of 2 min at 95 °C; 30 cycles of 20 s at 95 °C, 15 s at 55 °C, and 5 min at 72 °C; and a final step at 72 °C for 10 min. All samples underwent electrophoresis on 96-well SYBR Safe E-gels (Invitrogen, Carlsbad, CA) to ensure amplification proceeded normally. Post-PCR products were quantified using the KAPA qPCR Library Quantification Kit (Kapa Biosystems, Wilmington, MA), cleaned up, and normalized with the SequalPrep Normalization Plate Kit (Invitrogen, Carlsbad, CA). Amplicons were finally pooled (5 μl per sample; average working concentration = 7.2 nM, range = 7.04–7.45 nM) into single wells. To assess the quality of the library, an Agilent Bioanalyzer Trace was performed. Finally, the pooled library was sequenced using the Illumina MiSeq sequencing platform with V2 chemistry. Detailed laboratory protocols are available at https://github.com/SchlossLab/MiSeq_WetLab_SOP/blob/master/MiSeq_WetLab_SOP_v4.md.

Bioinformatic processing of sequences

Raw 16S rRNA gene sequences were processed using Mothur v.1.38 (Schloss et al. 2009) following its Illumina MiSeq standard operating procedure (Kozich et al. 2013). Briefly, we first extracted sequences and quality scores from the paired fastq files and assembled the reads to form contigs. We eliminated reads containing bases with a quality score below 20, ambiguous bases, or shorter than 275 bp long. Next, we aligned the reads using the Silva reference alignment v.123 (Quast et al. 2013) and removed reads with a homopolymer run > 8 nucleotides and reads that did not overlap the region of the alignment spanning the V4 region. Then, we carried a preclustering step in which reads with an identity ≥ 99% (i.e., sequences differing in two nucleotides or less) were merged. Chimeric sequences were detected and discarded by UCHIME (Edgar et al. 2011). After that, we assigned the taxonomic classification to reads using the Greengenes 13_8_99 (DeSantis et al. 2006) and removed reads classified as mitochondria, eukaryota, or unknown. Next, we calculated the sequencing error rates using reads from the mock community (average error rate = 0.0083). Using the average neighbor algorithm, we generated operational taxonomic units (OTUs) at 97% sequence identity which were then taxonomically classified by consensus using Greengenes 13_8_99. A relaxed neighbor -joining tree with one representative sequence per OTU was finally obtained with Clearcut (Evans et al. 2006) after calculating uncorrected pairwise distances between aligned reads.

Statistical analyses

To quantify the overall impact of the origin of sample (donor), DNA extraction kit and dilution on the inferred gut microbiota diversity, membership (presence/absence), and abundance, we performed permutational multivariate analysis of variance (PERMANOVA) using the adonis function (analysis of variance using distance matrices) implemented in the Vegan package of R (Oksanen et al. 2015). PERMANOVAs were performed on three β-diversity distances—weighted and unweighted UniFrac obtained with the GUniFrac package of R (Chen et al. 2012) and Bray-Curtis with Vegan (Oksanen et al. 2015)—and three alpha-diversity metrics obtained with Mothur—number of observed OTUs, Shannon diversity index, and the inverse Simpson index. UniFrac analyses were performed on unrarefied read counts. Rarefication is a necessary step in ecological analyses to control for sampling unevenness; however, rarefication was undesirable in this case because negative controls and some highly diluted samples had too few 16S rRNA gene reads (< 100 reads/sample). Rarefying at very low depths made results extremely inconsistent. Alternatively, we performed Bray-Curtis analyses on relative abundances to partially control for unevenness in read counts. In addition, we performed analyses on samples with > 10,000 reads/sample to test the sensitivity of PERMANOVA to sampling unevenness. Principal correspondence analysis (PCoA) and taxonomic plots were obtained. Alpha-diversity analyses were performed on the subset of undiluted samples (D0) after rarefying the dataset to 13,000 reads/sample.

Availability of data and material

Raw 16S rRNA gene sequences were deposited at the NCBI’s Short Read Archive (SRA) under BioProject PRJNA385915. The Mothur logfile and R code built to obtain the results reported here are available in https://github.com/jsescobar/kitome.

Results

16S rRNA gene reads

We first examined the number of 16S rRNA gene reads across DNA extraction kits and dilutions (Fig. 1, Online Resource 2). Note that the number of reads is not expected to be proportional to the initial sample biomass because libraries were normalized before sequencing. Instead, a high number of reads indicates that samples progressed appropriately in the laboratory workflow (DNA extraction, indexing, PCR, purification, normalization, and sequencing). In our experiments, the number of reads is of importance since it informs about the ability of each kit to extract sufficient microbial DNA in instances of high and low initial sample biomass.

Fig. 1
figure 1

Total number of 16S rRNA gene reads for samples from five individuals obtained with MoBio (a) or Qiagen (b) DNA extraction kits at different 1:10 dilutions. Undiluted samples (200 mg of starting feces) (D0), 20 mg (D1), 2 mg (D2), 200 μg (D3), 20 μg (D4), and 2 μg (D5)

As shown in Fig. 1a, the number of reads was high (> 10,000 reads/sample) in most samples extracted with MoBio and even in instances of high dilution, although this might depend upon the considered sample. In contrast, the Qiagen kit failed treating samples with very low biomass but was as efficient as the MoBio kit with samples with as low as 2 mg of human feces (Fig. 1b). This indicates that fecal microbial inference is jeopardized when using a starting biomass < 2 mg. Still, DNA from samples with as low as 2 mg of human feces (D2) was enough to get a high number of reads independent of the DNA extraction kit (mean ± SD: MoBio = 34,885 ± 12,201; Qiagen = 39,433 ± 20,393). All negative controls resulted in a very low number of reads independent of the DNA extraction kit (range 25–42 reads/sample) (Online Resource 2). This number is several orders of magnitude lower than the number of reads obtained in most samples, regardless of dilution. Any signal that may be introduced by laboratory reagents would be drowned by the sample’s microbial signal and is, thus, negligible.

DNA extraction affects the gut microbiota

As said above, the number of reads only informs about the ability of the kits to extract DNA in sufficient quantity and quality for the PCR and sequencing reactions to appropriately perform. Hence, it is not a good predictor of the precision of the microbial community inference. To appropriately evaluate the impact of the tested parameters on the gut microbiota membership and abundance, we performed PERMANOVA with β-diversity measures, either dependent or independent of the reconstruction of phylogenetic trees—UniFrac and Bray-Curtis, respectively. These analyses showed a pervasive and significant effect of the origin of the sample (i.e., donor) in the inferred microbiota; this factor explained the largest portion of the total variance in β-diversity (Table 1). Noteworthy, analyses were not affected by unevenness in the number of reads among samples, as separate PERMANOVA on a restricted dataset considering only samples with > 10,000 reads/sample output similar results (Online Resource 3). Differences in sex or BMI among donors might be at the origin of variation in the gut microbiota in these datasets (Online Resource 1). Large inter-subject variation is a common observation in studies of the human gut microbiota (Costello et al. 2009; The Human Microbiome Project Consortium 2012).

Table 1 PERMANOVA on β-diversity measures

The two DNA extraction kits allowed inferring microbial consortia with similar membership but different abundances. This is illustrated by the smaller unweighted UniFrac distances observed between pairs of undiluted samples (D0) treated with either kit (Fig. 2a) but the larger distances observed with weighted UniFrac (Fig. 2b) or Bray-Curtis (Fig. 2c). A comparison of the taxonomic assignments of OTUs among undiluted samples (D0) showed that, at the phylum level, there were more OTUs from Firmicutes and Actinobacteria and less from Bacteroidetes with MoBio than Qiagen (Fig. 3a–b; Online Resource 4). At the class level, there were more OTUs classified as Clostridia (Firmicutes), Coriobacteriia (Actinobacteria), and class Actinobacteria (phylum Actinobacteria) with MoBio than Qiagen which, in turn, resulted in an increased number of OTUs belonging to Bacteroidia (Bacteroidetes), unclassified Firmicutes (Firmicutes), and Mollicutes (Tenericutes) (Fig. 3c, d; Online Resource 4). At the genus level, the pattern was variable among samples; among the most abundant genera, more reads from Blautia, Coprococcus, unclassified Lachnospiraceae, and Collinsella were obtained with MoBio, whereas Oscillospira and unclassified Ruminococcaceae were overrepresented with Qiagen (Online Resource 4).

Fig. 2
figure 2

Principal correspondence analyses (PCoA) on unweighted UniFrac (a), weighted UniFrac (b), and Bray-Curtis (c) distances. Colors and shapes represent the origin of sample (MoBio (circles); Qiagen (triangles); negative controls (crosses)). In panels b and c, the point size is proportional to the number of sequenced reads. Undiluted samples (D0) (filled points); diluted samples (D1–D5) (open points)

Fig. 3
figure 3

Taxonomic profile of OTUs showing the relative abundance of the most abundant phyla (a, b) and classes (c, d) across samples, dilutions, and negative controls. a, c MoBio. b, d Qiagen

In terms of alpha-diversity, the analysis of the subset of undiluted samples (D0) indicated that most of the microbes of these samples were captured in our sequencing (Online Resource 5). In addition, the two DNA extraction kits allowed inferring communities with similar diversity, suggesting that their coverage of the gut microbiota was similar (number of observed OTUs: MoBio = 243, Qiagen = 285, p = 0.10; Shannon diversity index: MoBio = 3.52, Qiagen = 3.70, p = 0.10; inverse Simpson index: MoBio = 19.05, Qiagen = 19.70, p = 0.69).

Dilution of human feces has a moderate impact on the gut microbiota inference

The composition of the microbial community inferred from fecal samples, independently of dilution, was significantly different to that of negative controls, with the exception of dilution D5 treated with Qiagen (Online Resource 6). Regarding fecal samples only, dilution, in general, had a statistically significant effect on microbial inference, explaining 6–14% of the observed variance, depending on the β-diversity measure evaluated (Table 1; Online Resource 3). Unexpectedly, however, this effect was moderate and did not introduce severe bias into the microbial inference. The effect was minor in terms of presence/absence of OTUs, as demonstrated by the unweighted UniFrac analysis, especially in samples treated with MoBio (Fig. 2a). Analyses that take OTU abundance into account (weighted UniFrac and Bray-Curtis) confirmed that fecal samples treated with MoBio had more similar composition than samples treated with Qiagen, especially in instances of high dilution (Fig. 2b, c). In most cases, samples with as low as 2 mg (D2) or 0.2 mg (D3) of feces resulted in a reduced introduction of bias into the representation of the gut microbiota. Taxonomic analyses confirmed these results, indicating overrepresentation of Firmicutes (Clostridia class) and underrepresentation of Actinobacteria (Coriobacteriia class) with dilution in MoBio and a more irregular pattern in Qiagen (Fig. 3). At the genus level, distortion of the community was idiosyncratic of each sample (Online Resource 7).

Discussion

Previous studies have explored the impact of DNA extraction on the inferred microbial communities (Ariefdjohan et al. 2010; Guo and Zhang 2013; Smith et al. 2011). However, the impacts of reagent contamination and sample biomass have been poorly addressed; yet, it has been demonstrated that when microbial biomass is low, the signal coming from samples is overwhelmed by the signal emerging from laboratory reagents (note that reagents are not DNA-free, and autoclave does not make DNA disappear). The extent of this contamination can be very large; the contaminating species are characteristic to each laboratory, reagent, and reagent’s lot (Salter et al. 2014), reason why it is highly recommended to include several negative controls in experiments. In the case of human feces serving to infer processes occurring in the gut microbiome, there are no reports determining the threshold at which biomass is so low that the inferred microbial community may be artifactual. Low fecal biomass might be a considerable issue in forensics, when dealing with ancient specimens or in some animal models. In this report, we demonstrate that contamination introduced by laboratory reagents is probably limited within the wide range of tested biomasses and that the variance introduced by the protocol employed to extract DNA is a bigger source of bias in human gut microbiome studies.

We show that the two tested kits were efficient in extracting DNA from samples with 100 times lower biomass than suggested by the manufacturers’ protocols. However, the gut microbiota inferred with the two kits was different, especially when considering the abundance of microbial groups. In general, the dissimilar nature of the two DNA extraction protocols could explain the differences in microbial community profiles (Guo and Zhang 2013). It has been repeatedly reported that protocols incorporating bead-containing lysing matrix and vigorous shaking extract larger amounts of DNA and produce a more comprehensive microbial profile than protocols based on chemical lysis alone (Ariefdjohan et al. 2010; Guo and Zhang 2013; Smith et al. 2011). In our experiments, the protocols for DNA extraction incorporated both chemical and mechanical cell lysis, although each followed specific procedures and used proprietary reagents. Pointing out exactly which procedures/components of the kits might be responsible of the differences we found seems too speculative without specific tests. Furthermore, the fact that differences have even been shown to exist among kit lots of the same brand (Salter et al. 2014) suggests that sources of variance are not easy to control for and raise the question whether data from different studies should be compared at all (Schloss 2008; Wesolowska-Andersen et al. 2014). The fundamental puzzle with kit and lot effects is that, in real samples, there is no means to know the actual microbial community composition, so that there is no way to evaluate the kit/lot accuracy. To be conservative, in a given study, total microbial DNA should be extracted with the same kit, ideally the same kit lot, and caution must be taken when comparing data from different studies.

Importantly, we demonstrate that highly diluted samples exhibit a microbial community profile distinguishable, in most cases, from that of laboratory reagents. Highly diluted samples in our study had as few as 20 μg (D4) or 2 μg (D5) of feces. Although samples with such low biomass are unsuitable for robust microbial inference, and unrealistic in most study designs, their microbial signature is still distinguishable from that of laboratory reagents, especially looking at the presence/absence of data. This limited effect of dilution in the inferred gut microbial community is unexpected since previous reports demonstrated that samples with low microbial biomass basically amplify contaminating DNA (Salter et al. 2014). In our case, samples with as low as 2 mg of feces allowed obtaining both high numbers of 16S rRNA gene reads and a representation of the gut microbiota similar to that obtained with samples with 100 times more biomass.

Our results are particularly relevant now that multiple calls for the standardization of procedures have been made in the microbial ecology community (Alivisatos et al. 2015; Blaser et al. 2016), as they provide insights on the steps researchers must focus on when dealing with DNA extraction from fecal samples. As general recommendations, in both 16S rRNA gene and metagenomic experiments, the inclusion of as many technical controls as possible is highly encouraged to control for potential sources of noise in the final dataset; this includes laboratory and reagent contaminants and sequencing errors. Furthermore, researchers should avoid mixing different DNA extraction kits within a study and must include the kit’s lot as an additional metadatum for each sample. The question of how to address comparisons between studies and meta-analyses remains open; however, the fact that measurements between samples extracted with different kits can be statistically significant strengthens the idea that comparisons between studies that employed different extraction methods should be discouraged (Henderson et al. 2013).