Introduction

The Tropical Andes are located in the Neotropical ecoregion and are considered a hotspot for biodiversity due to the large number of endemic plant and vertebrate species [1]. The central Andean mountain range in Colombia is part of a volcanic belt characterized by geothermal activity that is made evident by the presence of fumaroles and hot springs, many of which are found within the Nevados National Natural Park (Nevados NNP) [2]. These hot springs are heated mainly by the underlying magma chamber from volcanic activity and have a high sulfate content and low pH. El Coquito spring is located within protected areas of the Nevados NNP; it has little or no anthropogenic influence and constitutes a unique and extreme ecosystem due to its high elevation (3,973 m above sea level), high exposure to UV light, water temperature, and mineral composition. Furthermore, no research has been done regarding the microbial communities that populate this acidic hot spring.

The use of culture-independent methods to study microbial diversity has expanded our view of the microbial world and allowed access to extreme and difficult-to-study environments, such as these acidic water ecosystems [3]. High-throughput approaches, which include gene chips and novel sequencing technologies, can provide rapid detection and higher resolution of the microbial communities in a complex sample. A high-density 16S ribosomal RNA (rRNA) gene microarray for parallel, multispecies detection has been used to analyze and compare diverse communities [4], revealing greater diversity when compared with 16S rRNA gene clone libraries [5–8]. Surveys with pyrosequencing of 16S rRNA gene variable regions (pyrotags) provide greater depth of coverage, with thousands of sequence reads per sample, and have revealed the presence of rare community members that might otherwise go undetected when using more labor-intensive clone libraries [9, 10]. A recent study of the microbial communities from the human intestine using 454 pyrosequencing and a phylogenetic microarray showed similar profiles and a strong correlation at the phylum, class, and order levels [11]. While these high-throughput approaches provide a great amount of information without a priori knowledge of the community structure, they are also subject to biases that arise from methodological and technical limitations inherent to sample preparation, PCR amplifications, and sequencing [12–15]. Thus, descriptions of bacterial communities using these different approaches may result in different profiles that, even if not completely consistent with one another, can provide a more thorough overview of the community structure.

In this study, the microbial community present in El Coquito hot spring was characterized using three different approaches based on amplification and analysis of 16S rRNA genes: a high-density 16S rRNA gene microarray, 454 pyrosequencing of hypervariable regions (V5–V6), and clone libraries of near full-length genes. Our multi-approach analysis revealed great diversity and gave a more thorough assessment of the structure of this acidophilic microbial community.

Materials and Methods

Sample Collection and Analysis

Surface running stream water (16 L) was collected in 5-L sterile plastic containers on April 10, 2008 at El Coquito hot spring (04°52′27″ N; 75°15′51.4″ W) by filling the containers to the brim and capping. Samples were transported at 4°C to the laboratory and processed within 18 h for further analysis (physicochemical analysis, total microbial counts, and DNA isolation). Temperature and pH were recorded in situ using a Hach pH-meter equipped with a pH and temperature probe. Physicochemical analyses were performed according to Standard Methods [16]. Water (500 mL) was fixed with 4% (v/v) paraformaldehyde and filtered using a 0.22-μm polycarbonate filter (Millipore, Billerica, MA, USA). Filter sections were impregnated with 4′,6-diamidino-2-phenylindole (DAPI; 1 mg mL−1) for 10 min, washed with Milli-Q water for 1 min, impregnated with 70% ethanol, and air-dried. Fluorescent in situ hybridization (FISH) was done with probes (Integrated DNA Technologies, Coralville, IA, USA) EUB 338 [17] and ARCH 915 [18] labeled with Alexa Fluor® Dyes 488 and 546 (Invitrogen, San Diego, CA, USA) for Bacteria and Archaea, respectively. Dehydration with ethanol and in situ hybridization were conducted as described [19]. Cells were counted in duplicate using an epifluorescent microscope (Nikon Eclipse 50, Nikon, Melville, NY, USA) [20]. If more than 30 cells per field were observed, 20 microscope fields were counted; otherwise, 50 microscope fields were analyzed.

DNA Extraction

DNA was isolated as previously described [21] by filtering water (10 L) first through a 5.0-μm cellulose filter (Fisherbrand, Fisher, Houston, TX, USA) and then through a 0.22-μm polycarbonate filter (Millipore, Billerica, MA, USA). Cells on the filter were lysed by incubation at 37°C for 45 min in lysis buffer [lysozyme (1 mg mL−1), proteinase K (0.2 mg mL−1), and achromopeptidase (0.6 mg mL−1)]. Crude lysates were extracted twice with phenol/chloroform/isoamyl alcohol (25:24:1, pH 8.0). DNA was further cleaned using the UltraClean® GelSpin® DNA Purification Kit (MOBIO Laboratories, Inc., Carlsbad, CA, USA), quantified by NanoDrop (Thermo Scientific, Inc., Wilmington, DE, USA), and checked for quality by 1% agarose gel electrophoresis using SYBR® Safe staining (Invitrogen, San Diego, CA, USA). Images were digitized with the software Quantity One® v. 4.6.3 (BioRad Laboratories, Inc., Hercules, CA, USA). The DNA was stored at −20°C prior to amplification.

Clone Libraries

Bacterial primers 8F and 915R [22] and archaeal primers ARCH 109F and ARCH 915R [23] were used to amplify 16S rRNA genes (Table 1). Each 50 μL PCR reaction contained 2 μL (40 ng) DNA, 0.1 mM of each dNTP, 1.5 mM MgCl2, 1× PCR buffer (Invitrogen), 0.2 μM of each primer, and 0.5 U DNA Polymerase (Invitrogen). Amplification using bacterial primers was accomplished by denaturation at 94°C for 3 min, 25 cycles of 92°C for 40 s, 52°C for 30 s, and 72°C for 2 min, with a final extension at 72°C for 10 min. PCR conditions using archaeal primers were identical, except that the initial denaturation was done at 96°C for 3 min. Amplicons were purified using the UltraClean® PCR Clean-Up kit (MOBIO Laboratories Inc., Carlsbad, CA, USA) and cloned using the HTP TOPO TA kit (Invitrogen). Insert DNA was amplified using primers M13F and M13R (Table 1), and randomly selected clones were sequenced on both strands (Macrogen Inc., Seoul, South Korea). Sequences of insufficient length or quality were removed using the software CLC Workbench version 5.2 and checked using the Chimera Check tool available in Greengenes [24] (http://greengenes.lbl.gov/cgi-bin/nph-bel3_interface.cgi). These sequence data have been submitted to the GenBank database under accession numbers JF280147–JF280363 (archaeal sequences) and JF280364–JF280675 (bacterial sequences). Sequence alignment was carried out using Infernal [25], and classification was done with the Ribosomal Database Project (RDP) naive Bayesian classifier [26] (http://rdp.cme.msu.edu/classifier/classifier.jsp) using an 80% confidence threshold. The DOTUR software was used to assess microbial richness and sample coverage associated with each clone library [27]. 
Non-parametric richness (S CHAO and S ACE) and coverage (Good’s and C ACE) estimators are widely used to estimate the total number of operational taxonomic units (OTUs) and the proportion represented in a given sample to assess if sufficient work has been done to capture most of the diversity in a sampled environment [28]. Representatives of each OTU were analyzed with the GenBank database by using the basic local alignment search tool (BLAST) at the NCBI website.
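
As a rough illustration of the estimators named above, the following Python sketch computes a bias-corrected Chao1 richness estimate and Good's coverage from a list of OTU abundances. It is not the DOTUR implementation, and the function names and example counts are hypothetical.

```python
from collections import Counter

def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    counts = [c for c in otu_counts if c > 0]
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

def goods_coverage(otu_counts):
    """Good's coverage: 1 - F1 / N, where N is the number of sequences."""
    n = sum(otu_counts)
    f1 = sum(1 for c in otu_counts if c == 1)
    return 1 - f1 / n

# Hypothetical library: five OTUs containing 5, 3, 1, 1, and 2 sequences.
example = [5, 3, 1, 1, 2]
print(chao1(example), goods_coverage(example))
```

A coverage value close to 1 means that few sequences in the library are singletons, which is the sense in which a library is said to have "good representation".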

Table 1 Primers used in this study

Pyrosequencing of V5–V6 Hypervariable Regions

Primers were designed based on previous reports [10] and modified to include a broader range of taxa by downloading and aligning 5,530 bacterial 16S rRNA sequences from the RDP (as of November 2008, in the range of 100–400 bp) and analyzing the V5–V6 region for sequence variation. Primers were designed according to full-alignment representation by placing 3′ degenerate nucleotides. Using the probe match tool (RDP 10.4), this primer set could theoretically anneal and amplify 99.96% of bacterial and 97.82% of archaeal sequences, allowing one mismatch, based on comparison against the entire database of sequences reported in the RDP database on October 11, 2008 (690,149 sequences). PCR amplifications were done as previously reported [29] in a 25-μL reaction volume containing 2 μL (20 ng) DNA, 0.75 μM of each primer 807F and 1050R designed by us (Table 1), 2.5 U Pfu Turbo® DNA polymerase (Stratagene, Inc., La Jolla, CA, USA), 1× Pfu reaction buffer, 0.6 mM dNTPs, and 5% (v/v) dimethyl sulfoxide, using the following PCR conditions: 2 min at 95°C, 30 cycles consisting of denaturation for 30 s at 95°C, a temperature touch down from 60 to 51°C (2°C every six cycles), 72°C for 1 min and a final extension of 72°C for 5 min. The PCR product was cleaned using the QIAquick PCR Purification Kit (Qiagen N.V., Hilden, Germany) and used as template for a second PCR to add pyrosequencing adapter and barcode sequences (http://pyro.cme.msu.edu/) using primers 16S807F-b15 and 16S1050R-b5 (Table 1) and conditions identical to those of the first PCR, except for the number of cycles (five) and the annealing temperature (53°C). PCR products were assessed by 1% agarose gel electrophoresis stained with ethidium bromide. Pyrosequencing was carried out from the reverse primer (Engencore, University of South Carolina, Columbia, SC, USA). These sequence data are available at the NCBI Short-Read Archive (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies) under accession number SRA029240.1.
Sequences were cleaned using tools implemented in the GeBiX diversity portal (www.gebix.org.co/gbx_diversity), taking care to minimize the effects of sequencing errors, as reported [24, 30, 31]. Sequences were evaluated based on quality scores, eliminating those that did not match perfectly the primer and barcode at the beginning of the read or that contained more than one undetermined nucleotide (N). Sequences were then trimmed using a 40-nucleotide window for analysis, shifted one nucleotide at a time, and required 90% of the bases to surpass the established threshold of 20 for the window to be considered of good quality. The size of the sequences was restricted to a minimum of 50 bases, after eliminating the first 29 bases corresponding to the tag (two bases), barcode (eight bases), and primer (19 bases). Complementary reverse sequences obtained for each clean read were aligned using Infernal v 1.0.2 [25]. Dnadist PHYLIP version 3.6 was used to generate pairwise distance matrices [32], and diversity and richness indices were obtained using DOTUR with the furthest neighbor algorithm [27]. Sequences were assigned to bacterial phyla and families with the RDP-naive Bayesian classifier, using an 80% confidence threshold.
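
The quality filter described above can be sketched in Python as follows. This is a minimal illustration and not the GeBiX pipeline itself: the function names are hypothetical, and two details are assumptions because the text does not state them, namely that a base "surpasses" the threshold when its score is at least 20, and that a read is cut at the end of the last window that passes.

```python
WINDOW = 40       # window length in nucleotides
MIN_Q = 20        # quality threshold (assumed inclusive)
MIN_FRAC = 0.90   # fraction of bases in a window that must pass
PREFIX_LEN = 29   # tag (2) + barcode (8) + primer (19)
MIN_LEN = 50      # minimum length after removing the prefix
MAX_N = 1         # maximum number of undetermined nucleotides

def good_prefix_length(quals):
    """Slide the window one base at a time; return the end position of the
    last passing window before the first failing one (0 if none passes)."""
    if len(quals) < WINDOW:
        return 0
    keep = 0
    for start in range(len(quals) - WINDOW + 1):
        good = sum(1 for q in quals[start:start + WINDOW] if q >= MIN_Q)
        if good / WINDOW < MIN_FRAC:
            break
        keep = start + WINDOW
    return keep

def clean_read(seq, quals):
    """Return the trimmed read without its 29-base prefix, or None if the
    read is rejected. Primer and barcode matching are not shown here."""
    if seq.upper().count("N") > MAX_N:
        return None
    insert = seq[PREFIX_LEN:good_prefix_length(quals)]
    return insert if len(insert) >= MIN_LEN else None
```

For example, a 100-base read whose scores are all 30 is kept as a 71-base sequence once the 29-base prefix is removed, while a read containing two Ns is discarded regardless of its quality scores.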

16S rRNA Gene Microarray

We used the PhyloChip (Affymetrix Inc., Santa Clara, CA, USA), as described previously using universal primers 27F and 1492R (Table 1) [5, 33], by pooling eight PCR reactions, each carried out with 10 ng of DNA per PCR. Briefly, full-length 16S rRNA gene PCR was fragmented, biotin-labeled, and hybridized to the array. The microarray was scanned and recorded as a pixel image, and initial data acquisition and intensity determination were performed using custom Affymetrix software. Data analysis was performed as described [33]. In order to assign the same taxonomy to 16S rRNA gene sequences, the reference Greengenes sequence (rep-prokMSA-id) for each OTU detected on the microarray was classified using the online RDP II classifier with an 80% confidence threshold. Array taxa were analyzed for the presence of 454 primer sequences using 78 sequences belonging to orders detected only with the PhyloChip (two or more sequences for each order). Sequences were aligned using Muscle 3.6 with the parameter, −maxiters 2 [34], and aligned sequences were manually inspected for the presence of the 454 primers used to amplify the V5–V6 regions.

Phylogenetic Reconstruction

Phylogenetic reconstruction was made using the ARB 5.1 software [35] equipped with SILVA 100 database [36], with the representative sequences of each OTU obtained at 97% identity for both clone libraries and 454 pyrotags and the reference sequences from Greengenes for each OTU in the microarray. Sequences were aligned using the SILVA aligner tool (http://beta.arb-silva.de), imported into ARB, and inserted into the existing reference neighbor-joining tree using the parsimony insertion tool [35].

Results

Site Description and Physicochemical Features

Water at the hot spring El Coquito emerges at 3,973 m above sea level and becomes a flowing 2–3 m wide, shallow stream (30 cm) surrounded by the plant Calamagrostis effusa and different species of moss; the area in general is dominated by tussock grasses and Espeletia spp. Chemical analysis of the sample indicated that SO4^2− (1,003 mg L−1) and Ca2+ (320 mg L−1) were the most abundant ions, followed by Mg2+ (55.3 mg L−1), Na+ (45.2 mg L−1), K+ (9.25 mg L−1), and total Fe (Fe2+ and Fe3+) (8.27 mg L−1). The water had a pH of 2.7 and temperature of 29°C at the source, while the ambient air temperature was 9°C. DAPI staining to enumerate total counts revealed 2.35 (±0.07) × 10^5 cells/mL, and phase contrast microscopy showed a predominance of coccoid bacteria and some bacilli. FISH analysis revealed a community dominated by Bacteria (4.32 × 10^3 cells/mL) with a smaller number of Archaea (1.3 × 10^3 cells/mL).

Microbial Composition Based on Clone Libraries

From a total of 351 non-archaeal and 297 archaeal 16S rRNA gene sequences obtained from PCR clone libraries, those of insufficient quality or that did not pass the chimera check analysis were removed, leaving 315 and 247 non-archaeal and archaeal sequences, respectively. More OTUs, defined using a distance of 3%, were found for non-Archaea than for Archaea and the estimated coverage for each of the libraries indicated good representation, with slightly better coverage for the Archaea (Table 2). The non-parametric richness estimators S CHAO1 and S ACE gave similar values and indicated greater richness than observed (number of OTUs) (Table 2). A rarefaction analysis for archaeal (Fig. 1a) and bacterial (Fig. 1b) sequences showed that at 3% distance more sampling might still be required to cover the prokaryotic diversity in this sample.
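
To make the idea of rarefaction concrete, the sketch below computes the expected number of OTUs in a random subsample of n sequences using the standard analytical (hypergeometric) formula. It is only an illustration and does not reproduce the DOTUR procedure used for Fig. 1; the function name and the example counts are hypothetical.

```python
from math import comb

def expected_otus(otu_counts, n):
    """Expected OTUs observed when n sequences are drawn without replacement.

    E[S_n] = sum over OTUs of 1 - C(N - N_i, n) / C(N, n),
    where N is the total number of sequences and N_i the size of OTU i.
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of sequences")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)

# Hypothetical library with one dominant OTU and several rare ones.
counts = [20, 5, 2, 1, 1, 1]
curve = [expected_otus(counts, n) for n in range(0, sum(counts) + 1, 5)]
print(curve)
```

A curve that is still rising at the full sample size, as reported here at 3% distance, indicates that additional sequences would be expected to reveal further OTUs.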

Table 2 Diversity analysis of 16S rRNA sequences from El Coquito hot spring
Figure 1 Rarefaction analysis. Rarefaction curves were constructed using DOTUR software for all (including chloroplast sequences) (a) archaeal and (b) bacterial clone libraries and (c) 454-pyrosequencing data. The sequence identity clusters are shown as unique, 0.01, 0.03 and 0.05

Of the 315 non-archaeal sequences obtained, 143 (45%) were affiliated to chloroplasts (Eukaryotes) and were not analyzed in this study, leaving 172 and 247 sequences belonging to the Bacteria and Archaea, respectively. The predominant phylum present in the bacterial library was Proteobacteria (69%), followed by Actinobacteria and Nitrospira (Supplementary Table 1, Fig. 2). Betaproteobacteria was the dominant class among the Proteobacteria, although almost half of these sequences could not be further classified. The most abundant order was Burkholderiales, represented by sequences similar to Thiomonas cuprina (97% identity), based on BLAST analysis (data not shown). Gammaproteobacteria was the next most abundant group, with sequences related to Legionella (98% identity), the iron oxidizer Acidithiobacillus ferrooxidans (99% identity), and uncultured gammaproteobacterial clones (96–98% identity) from acid-impacted lakes [37]. The next largest group was Alphaproteobacteria that included the orders Rickettsiales and Rhodospirillales, with sequences affiliated to the acidophilic heterotrophic bacteria of the genera Acidisphaera, Acidiphilium, and Acidocella (96–99% identity). Within the phyla Actinobacteria and Nitrospira, clones were affiliated to the iron-oxidizing mixotrophic acidophilic genus Ferrithrix [38] and to Leptospirillum ferriphilum (99% identity) that can use ferrous iron or pyrite as energy sources [39]. Phyla recovered in smaller numbers included Planctomycetes, Firmicutes, Spirochaetes, Acidobacteria, and Bacteroidetes (Supplementary Table 1). Six of the Planctomycetes clones had 90–100% identity to Zavarzinella, an aerobic bacterium isolated recently from an acidic Sphagnum peat bog [40], and two of the Acidobacteria clones had 100% identity with sequences obtained from metal-enriched environments [41, 42].

Figure 2 Community structure of El Coquito hot spring. Relative abundance of clone libraries, 454 pyrosequencing and PhyloChip. The classification is shown at the order level and was based on the RDP classifier. ‘Other’ includes groups that totaled <2.5% at the phylum level

The majority of the archaeal sequences (65%) could not be classified (Supplementary Table 1). The rest of the sequences belonged to the phyla Crenarchaeota (5%), specifically to the class Thermoprotei that contains several thermoacidophilic microorganisms, and Euryarchaeota (30%) that includes some of the most acidophilic microorganisms known [43, 44]. In general, the classified archaeal sequences were most similar to sequences obtained from acidic hot springs, acid mine drainage, or environments with high concentrations of heavy metals [45–47].

Microbial Composition Based on Pyrosequencing of the V5–V6 Region

From an initial 5,439 sequence reads, 5,095 sequences with an average length of 186 bases were retained for analysis after cleaning. At 3% distance, the diversity indices indicated a high microbial diversity (Table 2), and rarefaction analysis showed that more sequences would still be required to cover the diversity in this sample (Fig. 1c). Consistent with previous reports [10, 48], there was a large number of rare OTUs represented by single sequences (33%) together with a few predominant OTUs, the most abundant of which was affiliated to an unclassified bacterium and accounted for 12% of the sequences. Only 23% of the OTUs were represented by more than ten sequences, and the remaining 44% of the OTUs contained between two and ten sequences. Among these sequences, 293 (6%) were assigned to chloroplasts (Eukaryotes) and not further analyzed here. The discrepancy between the number of chloroplast sequences identified by pyrosequencing and clone libraries could be due to the difference in strategies and the ability to capture different populations with the primers used.

Of the remaining 4,800 sequences, 4,776 were classified as Bacteria (99.5%), and only 24 sequences were classified as Archaea (0.5%). Approximately half of both archaeal and bacterial sequences could not be classified further (Supplementary Table 1). This low number of Archaea contrasts with results for clone libraries and could be due to differences in PCR conditions and the fact that amplification for pyrosequencing was done in a single reaction and with primers designed for coverage of >99% bacterial and archaeal sequences (see methods), whereas separate reactions were done for clone libraries. Thus, clone libraries give information on the phylotypes present, but not necessarily about the relative proportion of Archaea and Bacteria in the community, especially when more sampling is required as indicated by rarefaction.

The predominant bacterial phylum was Proteobacteria (42.5% of all sequences) with affiliations, in decreasing order, to the Beta-, Gamma-, and Alphaproteobacteria (Fig. 2). The predominant genera were Thiomonas (Betaproteobacteria), Aquicella, and Legionella (Gammaproteobacteria). Other abundant phyla in the sample were candidate division TM7 (5.5%) and Planctomycetes (2.4%). All remaining bacterial phyla identified represented <2% of the sequences (Supplementary Table 1). A small number of sequences (0.08%) were not classified as either Bacteria or Archaea (unclassified).

Phylogenetic Profile Obtained with the Microarray

Due to the small amount of DNA available, amplification was done only with Bacteria-specific primers. Of the 8,434 bacterial taxa represented on the array, 366 were identified in our sample, with OTU hybridization intensities ranging between 0.02% and 0.86% of the total signal intensity detected. To compare these results with those obtained with the other two strategies, the reference sequences corresponding to positive signals in the microarray were classified using the RDP classifier. The relative abundance at each level (phylum, class, or order) was calculated by adding the hybridization intensities of the corresponding OTUs identified by the gene array. The array detected 22 different phyla that covered a wide range of abundances from the dominant Proteobacteria (55%) to groups present in low numbers such as the Chlamydiae (0.1%) and Tenericutes (0.08%). The most abundant orders were the Burkholderiales (Betaproteobacteria) (13.8%) and Campylobacterales (Epsilonproteobacteria) (9.1%), followed by the Actinomycetales (7.6%), Clostridiales (7.2%), and Rhizobiales (5.4%) (Fig. 2). Some of the genera detected are reported to be iron-oxidizing bacteria, such as Thiobacillus and Acidimicrobium, while others are sulfur-oxidizers (Thiomonas) or sulfate reducers (Desulfovibrio and Desulfomicrobium) [49–53]. Some taxa, such as candidate divisions OP10 and BRC1, were identified only with the 16S rRNA gene array and at low abundances (Supplementary Table 1).
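
The aggregation step described above, in which the relative abundance of a taxon is the sum of the hybridization intensities of its OTUs, can be sketched as follows. The OTU identifiers, intensity values, and function name are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
from collections import defaultdict

def relative_abundance(otu_intensity, otu_taxon):
    """Sum OTU intensities per taxon and express each as % of total signal.

    otu_intensity: mapping of OTU id -> hybridization intensity
    otu_taxon:     mapping of OTU id -> taxon name at the chosen rank
    """
    total = sum(otu_intensity.values())
    summed = defaultdict(float)
    for otu, intensity in otu_intensity.items():
        summed[otu_taxon[otu]] += intensity
    return {taxon: 100 * value / total for taxon, value in summed.items()}

# Made-up example with three OTUs assigned to two orders.
intensity = {"otu1": 2.0, "otu2": 1.0, "otu3": 1.0}
order = {"otu1": "Burkholderiales", "otu2": "Burkholderiales", "otu3": "Clostridiales"}
print(relative_abundance(intensity, order))
```

Because the values are signal intensities rather than sequence counts, percentages obtained this way are comparable across ranks within the array but only loosely comparable with read-based proportions from the other two methods.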

Comparative Analysis of the Prokaryotic Community

Despite the methodological differences inherent to each strategy used, a comparison of the results can nevertheless provide valuable information on complex environmental samples [11, 48]. This comparison showed that 12 out of a total of 25 bacterial phyla were shared by at least two of the strategies used (Supplementary Table 1). Despite the fact that the microarray detects only known taxa, it detected more unique groups, such as the phyla Aquificae and Deinococcus-Thermus. Some phyla were detected only by the other strategies, such as Spirochaetes and candidate divisions Ktedonobacteria (found only in gene libraries) and OD1 (only by 454 pyrosequencing). To more easily see the overlap and differences among strategies, a Venn diagram was constructed based on the orders detected (Fig. 3). Proteobacterial sequences were the most abundant by all strategies, with the Betaproteobacteria, and specifically the order Burkholderiales, being predominant (Fig. 3). Although differences in abundance were also seen for each technique, as was the case for the Deltaproteobacteria and the Gammaproteobacteria, it is difficult to assess the real significance of this variation in the absence of technical replicates.

Figure 3 Venn diagram at the order level. Classification at the rank of order shows shared and unique taxa (numbers in parentheses inside circles) identified with each strategy, as well as the total number identified for each method (in parentheses outside circles). Letters c, p, and m indicate the five most abundant taxa identified by each strategy: c for clone libraries, p for pyrotags, and m for microarray

Greater differences were evident at other taxonomic levels. A total of 56 bacterial and archaeal orders were found overall, most of which belonged to the Proteobacteria (Fig. 3). The microarray alone detected 27 orders not detected by the other strategies, while only three and four orders were identified only by 454 pyrotags and libraries, respectively. Many of the orders shared by at least two strategies included those that were most abundant either by 454 pyrotags or by gene libraries (Fig. 3). Some groups, such as the Rhodocyclales, were identified by all strategies even though they were not necessarily the most abundant. Other shared taxa were abundant only in two of the three methods, such as Legionellales and Rhodospirillales (gene libraries and 454 pyrotags) and Clostridiales (PhyloChip and 454 pyrotags) (Fig. 3). Some highly abundant organisms found by one strategy, such as Acidimicrobiales in gene libraries, were less prominent in the microarray data and not observed at all with 454 pyrotags (Fig. 3). Thus, it is evident that each of the three strategies showed differences in terms of detecting certain groups and providing information regarding their abundance.

One possible explanation for the overall greater detection obtained with the microarray could be that the primers used for amplification for 454 pyrosequencing might not pick up these sequences. To test this, the PhyloChip sequences (two or more sequences for each order) were analyzed in silico for the presence of the V5–V6 amplification primers. In all cases, there was a perfect match, except for four sequences with <79% match. This indicated that the primers used here should be able to amplify those sequences observed with the microarray. As expected, only a few of the total microarray sequences were unclassified (4%) given that it detects only known taxa, while larger proportions of both 454 pyrotags (46%) and clone library sequences (65% and 6.4% for Archaea and Bacteria, respectively) could not be classified (Supplementary Table 1).
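
In this study the primer check was done by manual inspection of aligned sequences. As a hedged illustration of the same idea, the Python sketch below scans a target sequence for the best-matching position of a degenerate primer and reports the fraction of matching bases. The primer and target strings are invented for the example and are not the 807F or 1050R primers of Table 1; only the forward strand is searched.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def best_primer_match(primer, target):
    """Return (position, fraction matched) for the best ungapped placement
    of a possibly degenerate primer on the target sequence."""
    primer = primer.upper()
    target = target.upper().replace("U", "T")
    best_pos, best_frac = -1, 0.0
    for pos in range(len(target) - len(primer) + 1):
        site = target[pos:pos + len(primer)]
        hits = sum(1 for p, t in zip(primer, site) if t in IUPAC.get(p, ""))
        frac = hits / len(primer)
        if frac > best_frac:
            best_pos, best_frac = pos, frac
    return best_pos, best_frac

# Invented primer with one degenerate position (W = A or T).
print(best_primer_match("GGATTAGAWACCC", "TTTGGATTAGATACCCTGG"))
```

A fraction of 1.0 corresponds to the perfect matches reported above, and lower values correspond to the partial matches reported for four sequences.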

In order to see how these sequences were distributed, sequences representative of each OTU were inserted into the neighbor joining phylogenetic tree in ARB (Supplementary Fig. 1). Again, the microarray sequences represented more taxa, and similarities were evident at higher phylogenetic levels (phylum and class). Clusters of sequences obtained by a single strategy were also observed within the Actinobacteria, Verrucomicrobia, and Synergistetes. Interestingly, some of the previously unclassified 454 pyrotags sequences were placed within clusters of known taxa, such as the phyla Crenarchaeota, Planctomycetes, and Nitrospira (Supplementary Fig. 1). Some groups like Nitrospira and Planctomycetes contained many of these unclassified pyrosequences, which could indicate great sequence diversity and few reported sequences similar to the ones found in this ecosystem. A closer examination of these unclassified pyrosequences showed that using the RDP classifier at a confidence threshold lower than the one used previously (50–79% instead of 80%) resulted in a taxonomic placement consistent with the phylogenetic reconstruction. Using a confidence threshold of 50% to classify 454 reads, the overall results were similar, although six new orders were detected for 454 pyrosequences, including two detected previously only by microarray and clones, and taxa belonging to the Planctomycetes, Nitrospira, and Crenarchaeota (data not shown). Additionally, a distinct, deep-branching clade was also observed (Supplementary Fig. 1), which, however, should be further analyzed and verified using near-full length sequences.

Discussion

Microbial Community Structure

El Coquito is an acidic thermal spring with a high sulfate content, which makes it similar to other thermal springs located in this Andean volcanic belt [2]. Although the water temperature is not very high (29°C), it contrasts greatly with the ambient temperature, which can oscillate between −4°C and 50°C in 1 day. The microbial community is diverse and dominated by Bacteria rather than Archaea, even though Archaea tend to predominate in more extreme environmental conditions [54]. Although the time taken prior to processing samples and isolating DNA could generate biases, this result was also consistent with FISH analysis. The discrepancy between DAPI counts and the much lower counts obtained using FISH probes could be due to low coverage of FISH probes, since only one EUB338 probe was used and, therefore, much of the diversity present could have been missed [55]. It could also be due to the presence of viruses, which could have been included in DAPI counts, or of eukaryotic microalgae in the sample. It is also possible that a portion of this microbial community was not targeted with the primers used in the three molecular approaches. However, the large proportion of unclassified sequences from Archaea in clone libraries (65%) and Bacteria using 454 pyrotags (46%) indicates that many novel microorganisms were in fact detected that are different from those previously identified in other acidic environments and thermal springs [45–47, 50]. This could be due to differences in physicochemical parameters and geographical isolation of this spring. Future work will involve comparison of this community with those present in other hot springs in this area.

The prokaryotic community is composed predominantly of Proteobacteria, consistent with the assessment of mesophilic acidophilic microbial communities (temperature for growth <40°C) [56]. The dominance of the Betaproteobacteria, a group containing microbes with a broad distribution of functions that can be considered as ecological generalists [37, 57], is similar to results from other environmental surveys [58, 59]. In general, the community is reminiscent of those found in hot and acidic environments with mesophilic organisms (Acidithiobacillus, Leptospirillum, Thiomonas, Acidocella, Acidisphaera, and Epsilonproteobacteria) as well as thermophilic microorganisms (Acidiphilium, Acidithiobacillus, Leptospirillum, Acidocella, and Acidisphaera) that are indicative of microhabitats with different temperature gradients within the hot spring. The presence of generalists that can grow under different environmental conditions and use diverse carbon and energy sources, together with specialists (e.g., sulfur and iron oxidizers and sulfate reducers), is also indicative of metabolic diversity. The high abundance of the sulfur-oxidizing bacteria Thiomonas and Acidithiobacillus that can oxidize reduced sulfur to sulfuric acid could contribute to the extreme conditions in this hot spring by dramatically lowering the pH [51, 56]. Thus, the presence of these microorganisms, together with the high concentrations of sulfate and iron present in this hot spring, is suggestive of microbial activity associated with the cycling of ferrous and sulfur-containing minerals.

There were few phototrophic bacteria (Chlorobi, Cyanobacteria, and Chloroflexi) in this spring, in contrast to other hot spring communities [49, 52, 60]. Despite receiving high levels of solar energy at this high elevation, this is consistent with the fact that water emerges from underground and with the notion that Cyanobacteria do not grow well at acidic pHs and are more sensitive to metals and solutes found in acidic waters [50, 61]. In this location, the eukaryotic microalgae could be driving primary production using solar energy at the surface, similar to what occurs in surface acid streamers and other acidic extreme environments [38, 56]. Subsequent studies will include analysis of chloroplast 16S rRNA gene sequences recovered here and 18S rRNA gene sequences to further analyze the eukaryotic community. The high abundance of chemolithoautotrophic acidophiles (Leptospirillum, Acidithiobacillus, Thiobacillus, Thiomonas, and Aquicella) also indicates that there is primary production driven by chemical energy. There are also heterotrophic acidophilic bacteria such as Acidiphilium, Acidisphaera, Acidocella, and Alicyclobacillus. Thus, it appears that in this community, primary production can be driven by both solar energy at the surface and by inorganic chemicals that affect the biogeochemistry of iron and sulfur in the water.

Complementing Culture-Independent Approaches

The three strategies used to analyze the diversity of this ecosystem show dominance of Proteobacteria, and specifically of Burkholderiales (Betaproteobacteria), with organisms closely related to Thiomonas and Thiobacillus. The large number of sequences obtained from both gene libraries and 454 pyrotags that could not be classified also suggests novel sequences and not merely overestimation of phylotypes or sequence errors, as has been reported for both strategies, and in particular for pyrosequencing [10, 62]. In fact, the pipeline for cleaning and analyzing 454 sequences takes into account previous reports to eliminate possible sequencing errors and overestimation of OTUs [63, 64]. This also highlights the importance of amplifying and sequencing 16S rRNA genes directly from an environmental sample.

Differences among the three approaches used are also evident. The microarray alone detected approximately one third of the total phyla, consistent with previous studies comparing it with clone libraries [4, 6, 8, 65]. It was surprising, however, that it detected more groups than 454 pyrosequencing, which can provide great depth of coverage. Our results could have been affected by the relatively modest number of pyrosequences and the high number of unclassified sequences, which could hamper detection of low abundance community members. It has recently been shown that low abundance taxa are harder to classify because they are infrequent and tend to be less represented in databases [66, 67]. In addition, the use of different sets of primers and amplification conditions can result in different PCR pools. This might explain why some taxa were not detected with pyrosequencing despite a bioinformatics analysis indicating that amplification should be possible with the primer set used. It could also account for the similar estimated coverage levels obtained for both clone libraries and 454 pyrotags, despite differences in OTUs detected. Coverage of diversity and accurate taxonomic assignment can also be more sensitive to the region of the 16S rRNA gene being sequenced than to the fragment size [68–70]. Despite differences, however, several primer sets have been shown to give stable estimates of abundances and consistent taxonomic assignment [69]. In our case, the short size of the pyrosequence reads obtained (average of 186 nt) could have affected the accuracy of taxonomic classification and resulted in higher richness estimates at the OTU level [71]. However, the RDP II classifier, one of the algorithms of choice for classification of short reads that produces highly stable and accurate results even for fragments of disparate sizes [67, 69], gave results that were consistent with the phylogenetic reconstruction of all sequences and that even improved when a lower confidence threshold was used. This suggests that the original analysis of these hypervariable regions was quite strict and that a lower threshold for classification might be a useful alternative for assignment of reads in datasets with many novel and short sequence reads that are difficult to classify, as has been suggested [11].
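The effect of relaxing the confidence threshold can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the read assignments, confidence values, and the function name `classified_fraction` are hypothetical and serve only to show how the fraction of classified reads responds to the cutoff.

```python
# Hypothetical (genus, bootstrap confidence) pairs for five reads.
# These values are illustrative assumptions, not data from this study.
assignments = [
    ("Thiomonas", 0.97),
    ("Acidithiobacillus", 0.82),
    ("Leptospirillum", 0.64),
    ("Acidocella", 0.55),
    ("Thiobacillus", 0.41),
]


def classified_fraction(reads, threshold):
    """Return the fraction of reads whose confidence is >= threshold."""
    if not reads:
        return 0.0
    kept = sum(1 for _, confidence in reads if confidence >= threshold)
    return kept / len(reads)


# Compare a strict cutoff with a relaxed one.
for cutoff in (0.8, 0.5):
    fraction = classified_fraction(assignments, cutoff)
    print(f"threshold {cutoff}: {fraction:.0%} of reads classified")
```

In this toy example, the relaxed cutoff retains more reads at the cost of accepting lower-confidence assignments, which is why agreement with an independent phylogenetic reconstruction is a useful check.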

Despite these differences, array and pyrotag data are similar in terms of the dominant groups obtained and the community profiles at the phylum and class levels, consistent with what has been reported using the HITChip array and pyrosequencing for V4 and V6 regions of microbial communities in human distal intestine samples [11]. In our case, this correlation is not observed at the OTU level where the most abundant OTU by 454 pyrosequencing represented 12% of the total sequences, while the maximum relative intensity obtained with the microarray was only 0.86% of the total. This could be explained by differences in the relative proportions of probes on the microarray for some groups [65] or by cross-hybridization of array probes with many of the novel, unclassified sequences in the dataset [72, 73]. The outcome of hybridization against these unclassified sequences in our dataset could not be further analyzed, however, in the absence of probe sequence information. Amplification bias could also be influencing the results since there was little DNA available for the PhyloChip, which prevented us from doing duplicate analyses to test reproducibility of our results. Running replicate analyses, which have been shown previously to be reproducible, represents one of the strengths of using high-density arrays [5, 7]. In addition, both the primer sets and the number of PCR cycles varied for each approach. These methodological differences can therefore affect the populations analyzed in each case, as was evident for the Archaea and chloroplast sequences.
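As a rough illustration of how community profiles from two methods can be compared at a given taxonomic level, the following Python sketch converts raw values (read counts or probe intensities) to relative abundances and computes a Pearson correlation across the union of taxa. The input values and function names are hypothetical assumptions, and the choice of Pearson correlation is ours for illustration; it is not necessarily the statistic used in the studies cited above.

```python
import math


def relative_abundance(values):
    """Scale a {taxon: value} mapping so that the values sum to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("profile must have a positive total")
    return {taxon: value / total for taxon, value in values.items()}


def profile_correlation(profile_a, profile_b):
    """Pearson correlation of two profiles over the union of their taxa.

    Taxa missing from one profile are treated as having abundance 0.
    """
    taxa = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(taxon, 0.0) for taxon in taxa]
    y = [profile_b.get(taxon, 0.0) for taxon in taxa]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical phylum-level values; not data from this study.
pyrotag_counts = {"Proteobacteria": 600, "Firmicutes": 150, "Nitrospirae": 50}
array_intensity = {"Proteobacteria": 5.5, "Firmicutes": 2.0, "Actinobacteria": 0.5}

r = profile_correlation(
    relative_abundance(pyrotag_counts),
    relative_abundance(array_intensity),
)
print(f"phylum-level Pearson r = {r:.2f}")
```

The same calculation can be repeated at finer taxonomic levels, where sparse and method-specific taxa would be expected to lower the correlation.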

Although a comprehensive comparison of the three methodologies is beyond the scope of this study, it is evident that the combination of techniques improved detection of community members. These strategies share common steps in sample preparation and also involve methodological differences that can lead to differences in estimations of microbial community diversity and structure [15]. While 16S rRNA gene clone libraries analyze almost full-length genes, 454 pyrosequencing and microarrays provide greater depth of coverage but are limited in turn by the short length of sequence reads and identification of known taxa, respectively. Importantly, these results strongly suggest that no single methodology of community diversity assessment is completely reliable and that combination approaches should be followed.