Introduction

Yellowstone National Park (YNP) is one of the largest and most diverse hydrothermal areas on Earth, and it harbors more than 12,000 thermal springs that are characterized by a broad range of temperature (40–92 °C), pH (1–10), and geochemical properties (Fournier 1989; Rye and Truesdell 2007). YNP thermal springs often have abundant and diverse electron donors (e.g., H2, sulfide, S0, thiosulfate, and Fe2+) and electron acceptors (e.g., dissolved O2, S0 and Fe3+), which provide an abundance of potential niches. As a consequence, thermal springs support microbial communities that comprise a diverse array of metabolisms, including photoautotrophs, photoheterotrophs, chemolithoautotrophs, and chemoorganotrophs (Amend and Shock 2001).

pH is a primary environmental factor that directly influences microbial community composition in thermal springs at the regional and global scales (Boyd et al. 2010; Boyd et al. 2013; Dequiedt et al. 2009; Inskeep et al. 2013b; Song et al. 2013; Xie et al. 2015). While the range of pH in YNP thermal springs is broad (1–10), the majority of thermal springs in YNP can be classified into two categories by pH: acidic, vapor-dominated systems and circumneutral to alkaline, water-dominated systems (Fournier 1989). The vapor-dominated springs, which often discharge relatively little liquid water, contain H2S that oxidizes to H2SO4 when it contacts air in perched pools of ground water. In contrast, the water-dominated systems discharge significant amounts of circumneutral or alkaline water enriched in chloride (Fournier 1989). Notably, there are few thermal features in YNP with intermediate pH in the range of 4–5. For example, of the more than 7000 thermal features inventoried by the US National Park Service (data available online at http://www.rcn.montana.edu/Default.aspx), only ~5% of the entries have a pH between 4 and 5. Microbial communities in the two pH systems common in YNP (acidic and circumneutral to alkaline) have been studied extensively, and each system is known to harbor a distinct community. Circumneutral to alkaline springs often include microbial communities that are dominated by members of the Aquificales, Chloroflexi, and Cyanobacteria (Inskeep et al. 2010, 2013b; Madigan 2003; Meyer-Dombard et al. 2005; Meyer-Dombard et al. 2011; Reysenbach et al. 1994, 2000; Spear et al. 2005; Ward et al. 1998b). Although the Archaea are present in circumneutral springs, they are estimated to be a lesser fraction of the total biomass (Inskeep et al. 2013b; Miller et al. 2009; Ward et al. 1998a). Conversely, Archaea predominate in the microbial communities of acidic, vapor-dominated springs, particularly those with elevated temperatures (Inskeep et al. 2013a), and often include members of the Crenarchaeota, Euryarchaeota, and Korarchaeota (Brock et al. 1972; Inskeep et al. 2013a; Jackson et al. 2001; Meyer-Dombard et al. 2005). A bimodal pH distribution among terrestrial thermal springs has been noted for thermal areas worldwide as well (Brock 1971). As a result, previous surveys of microbial communities from other geothermal hotspots around the world, including El Tatio, Chile (Engel et al. 2013), Kamojang, Indonesia (Aditiawati et al. 2009), Nakabusa, Japan (Everroad et al. 2012), Odisha, India (Sen and Maiti 2014), and Tibet, China (Song et al. 2013; Wang et al. 2013), focused mainly on acidic springs with pH below 3 or circumneutral to alkaline springs with pH above 6. A recent metagenomic investigation of two intermediate-pH (e.g., ~4), high-temperature (e.g., >55 °C) sites found taxonomic profiles similar to those from acidic springs (Inskeep et al. 2013b). To date, no effort has been made to survey the microbial inhabitants and their functional roles in low-temperature (i.e., 45–55 °C), intermediate-pH (e.g., ~4) thermal springs (Inskeep et al. 2013b). Thus, little is known about the microbial ecology of intermediate-pH springs in YNP.

The goals of this study were to: (1) investigate taxonomic profiles of four YNP springs within an intermediate-pH range (4.05–4.35) using 16S rRNA gene amplicon pyrosequencing and (2) characterize the metabolic potential of one of these sites, a low-temperature (55 °C) thermal spring containing a distinct microbial community. To our knowledge, this study is the first survey of microbial taxonomic and functional diversity of pH 4 springs using a combined 16S rRNA gene amplicon and metagenomic sequencing approach.

Materials and methods

Site description and sample collection

Four geothermal springs were selected for field measurements and sample collection in the following YNP thermal areas (sample names are given in parentheses): Norris (NOR), Mary Bay Area (MRY), Mud Kettles (MKL), and Seven Mile Hole (SMH). Locations, descriptions, and photos are listed in Table 1 and Fig. 1. Water samples for geochemical analysis were collected at each site in conjunction with the biomass samples for sequencing analysis. Water samples were collected from the overflow channels of spring sources as close to the center of flow as possible, where the water was well mixed. Water for geochemical analysis was filtered through a 0.2 µm Sterivex filter using sterile 50 mL syringes and preserved as appropriate for the analysis to be performed (McCleskey et al. 2005). Briefly, syringes were rinsed three times with site water before collection and all samples were collected into acid pre-washed polyethylene bottles (soaked in 5% HCl for 3 h and rinsed three times with deionized water), except the anion samples, which were collected into deionized water pre-washed polyethylene bottles (soaked for 3 h and rinsed three times with deionized water). Water (30 mL) for cation analysis was preserved by acidification using 0.3 mL of 3 N nitric acid, whereas water (125 mL) for anion analysis did not receive any preservative. Water (100 mL) for As and Fe species was collected into opaque polyethylene bottles and preserved with 1 mL of 6 N HCl, and water (30 mL) for the ammonium analysis was preserved with the addition of 0.3 mL of 4.5 N H2SO4. For the SiO2 analysis, 1 mL of water was immediately diluted with 9 mL of deionized water to prevent precipitation. For the sulfate analysis, 30 mL of water was preserved with 0.5 mL of ZnCl2, followed by 0.5 mL of NaOH. Samples for the determination of dissolved organic carbon (DOC, 100 mL) were collected into heat-combusted (475 °C for 4 h) amber glass bottles and preserved with 1 mL of 6 N HCl. All water samples were transported and stored at 4 °C until analysis, which took place no more than 2 weeks after sample collection.

Table 1 Geographic and geochemical parameters for the four sampling sites in YNP
Fig. 1 Geographic map and photos showing sampling locations

Sediment, mat, or filament samples were collected aseptically into 2 mL microcentrifuge tubes and preserved in sucrose lysis buffer (SLB; 20 mM EDTA, 200 mM NaCl, 0.75 M sucrose, 50 mM Tris–HCl, pH 9.0). Sediment samples were collected from the top 3 cm of the spring bed, and biofilm samples (mats or filaments) were collected just below the air–water interface (<1 cm). Samples were held at ambient temperature (~10–26 °C) for up to 10 days before being transferred to −80 °C storage in the laboratory, where they remained until DNA extraction. Previous experiments indicated that storage of samples in SLB without freezing did not lead to a loss of DNA or microbial diversity relative to samples immediately frozen in liquid nitrogen (Mitchell and Takacs-Vesbach 2008).

Geochemical analyses

At each sampling location, temperature and pH were measured using a Thermo Orion 290A+ meter, and electrical conductivity was measured with a WTW meter with temperature correction. Dissolved H2S was measured in the field using a portable colorimeter (Hach DR/850) by the methylene blue method (APHA 1985). Because dissolved H2S is volatile and oxidized quickly, spring water was directly drawn into a plastic syringe and filtered through a 0.2 µm filter into a measuring cuvette. Methylene blue reagents were added immediately and the absorbance and temperature of the solution were measured after color development. The temperature dependence of the H2S-methylene blue color complex was corrected using the method detailed in McCleskey et al. (2005). DOC concentrations were measured using the wet oxidation method (Aiken 1992) with a TOC Analyzer (Oceanography International Model 700). Major anions were measured using ion chromatography (IC), and cations and trace metals were measured using inductively coupled plasma-optical emission spectrometry (ICP-OES). All geochemical analyses, including anions and cations, were conducted using standard USGS methods, and typical measurement uncertainties were <5% (McCleskey et al. 2005). Major ion composition of these four sites was compared with 97 other YNP inventory sites that were sampled and analyzed using the same methods (data available online at http://www.vesbachlab.org/data.html) as part of a larger microbial inventory conducted in YNP.

DNA extraction

DNA was extracted from 0.2 g of each sample following bead-beating in a CTAB buffer (1% CTAB, 0.75 M NaCl, 50 mM Tris pH 8, 10 mM EDTA) and subsequent phenol–chloroform purification steps as described by Mitchell and Takacs-Vesbach (2008). Briefly, 2 volumes of 1% CTAB buffer and proteinase K (final concentration 100 µg mL−1) were added to the samples, which were then incubated for 1 h at 60 °C. SDS (final concentration 2%) and 0.1 mm diameter Zirconia/Silica beads were added. Samples were bead beaten for 45 s at 50 strokes per second. After incubating for 1 h at 60 °C, DNA was extracted once with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1), followed by two extractions with an equal volume of chloroform. Finally, the DNA was precipitated with two volumes of 95% ethanol, washed with 70% ethanol, dried by speed-vac, and reconstituted with 50 µL of filter-sterilized, autoclaved 10 mM Tris pH 8.0. DNA extracts were quantified using a Nanodrop ND-2000c spectrophotometer.

16S rRNA gene pyrosequencing

Barcoded amplicon pyrosequencing of 16S rRNA genes was performed as described previously (Van Horn et al. 2013). Briefly, DNA isolated from each sample was amplified using the universal bacterial primers 28F (5′-GAGTTTGATCNTGGCTCAG-3′) and 519R (5′-GTNTTACNGCGGCKGCTG-3′), and archaeal primers Arch349F (5′-GYGCASCAGKCGMGAAW-3′) and Arch806R (5′-GGACTACVSGGGTATCTAAT-3′) targeting the 16S rRNA genes as described previously (Colman et al. 2015; Rhoads et al. 2012). PCR was performed as follows: an initial cycle of 95 °C for 5 min, followed by 30 cycles of 95 °C for 30 s, 54 °C for 40 s, and 72 °C for 1 min, and a final elongation step for 10 min at 72 °C. Successful amplification was confirmed by agarose gel electrophoresis. Triplicate PCR mixtures per sample were combined and subsequently purified with an UltraClean™ GelSpin™ DNA Extraction Kit (MoBio Laboratories, Carlsbad, CA, USA). The purified DNA was quantified using a Nanodrop ND-2000c spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). Amplicons from all samples with different barcodes were pooled at equimolar concentrations for pyrosequencing on a 454 GS FLX (454 Life Sciences, Branford, CT, USA) using Titanium reagents according to the manufacturer’s protocol.
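The primers listed above contain IUPAC ambiguity codes (N, Y, S, K, M, W, V), so each primer name stands for a pool of concrete oligonucleotides. As a minimal sketch, assuming only the standard IUPAC code table, the pool size (degeneracy) of each primer can be computed as follows; the helper names `degeneracy` and `expand` are illustrative and are not part of the original workflow.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in primer))]

# Primer sequences as given in the text (5' to 3').
primers = {
    "28F": "GAGTTTGATCNTGGCTCAG",
    "519R": "GTNTTACNGCGGCKGCTG",
    "Arch349F": "GYGCASCAGKCGMGAAW",
    "Arch806R": "GGACTACVSGGGTATCTAAT",
}

for name, seq in primers.items():
    print(name, "length", len(seq), "degeneracy", degeneracy(seq))
```

Higher degeneracy broadens the range of templates a primer can match but dilutes each individual oligonucleotide in the pool, which is one reason primer choice can bias which taxa are amplified.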

16S rRNA gene pyrosequencing data processing

Raw sequences obtained from pyrosequencing were denoised to correct for sequencing errors and remove low-quality sequences and potential sequencing chimeras using AmpliconNoise (Quince et al. 2011) integrated into QIIME (Ver. 1.8.0, Caporaso et al. 2010b). Adapters, multiplex identifiers, and primers were trimmed from denoised data. Operational taxonomic units (OTUs) were identified at the 97% DNA similarity level using UCLUST (Edgar 2010) in QIIME. The most abundant sequence from each OTU was picked as a representative sequence and aligned using the PyNAST aligner (Caporaso et al. 2010a) and the Greengenes database (GG 13_5, DeSantis et al. 2006). Taxonomic assignments were made using the Ribosomal Database Project (RDP) Classifier (Wang et al. 2007). Good’s coverage estimates for the data sets were calculated from randomly drawn subsets of 800 sequences per sample to standardize for varying sequencing efforts across samples.
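Good's coverage is commonly computed as 1 − F1/N, where F1 is the number of OTUs observed exactly once and N is the number of sequences. The sketch below illustrates that estimator together with subsampling to a fixed depth of 800 sequences. It is not the QIIME implementation: the number of random draws, the seed, and the toy sample are assumptions made only for illustration.

```python
import random
from collections import Counter

def goods_coverage(otu_labels):
    """Good's coverage: 1 - (singleton OTUs / total sequences)."""
    counts = Counter(otu_labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / len(otu_labels)

def rarefied_coverage(otu_labels, depth=800, draws=100, seed=0):
    """Mean Good's coverage over random subsamples of a fixed depth.

    `draws` and `seed` are illustrative choices, not values from the study.
    """
    rng = random.Random(seed)
    values = [goods_coverage(rng.sample(otu_labels, depth)) for _ in range(draws)]
    return sum(values) / len(values)

# Hypothetical sample: one OTU label per sequence (1000 sequences,
# two abundant OTUs and 100 OTUs seen only once).
sample = ["otu_a"] * 600 + ["otu_b"] * 300 + [f"rare_{i}" for i in range(100)]

print("full-sample coverage:", goods_coverage(sample))
print("coverage at depth 800:", rarefied_coverage(sample))
```

Subsampling every sample to the same depth keeps the estimate from being inflated simply because one sample was sequenced more deeply than another.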

Shotgun metagenome sequencing

The Seven Mile Hole (04YSMH020, SMH, Table 1) sample was selected for further characterization by metagenomic sequencing because it contained a distinct microbial community compared with the other samples described here, and because previous reports of pH 4 sites focused on sites with much higher temperatures (>55 °C, Inskeep et al. 2013b). Approximately 500 ng of SMH DNA was used for library construction. Metagenome library preparation and sequencing were performed on one-half of a picotiter plate according to the manufacturer’s protocol on a 454 GS FLX Titanium platform (454 Life Sciences, Branford, CT, USA).

COG function enrichment analysis

Metagenomic sequencing reads were quality-filtered and assembled using Newbler 2.6 (Margulies et al. 2005) using default settings. Contigs and singleton reads were submitted to the JGI IMG/M annotation pipeline (Markowitz et al. 2008) and annotated using the Clusters of Orthologous Group (COG, Tatusov et al. 2000) database.

To provide an assessment of the microbial community type of this site, COG functions from the SMH metagenome assembly were compared with functions from other published YNP metagenomes (Table S1, Inskeep et al. 2013b). For each metagenome, data were normalized by the total number of COG functions detected and weighted by contig depth if assembly information was available. For unassembled singleton reads, a contig depth of one was assumed. COG functions were classified into COG categories on IMG/M and a Bray–Curtis dissimilarity matrix based on the COG category abundance table was subsequently constructed. Principal Coordinates Analysis (PCoA) was performed with COG categories and a two-way hierarchical clustering was also done on COG category abundance to confirm the grouping pattern observed from the PCoA analysis. All multivariate comparisons and ordinations were performed using the R (Team 2011) statistical software with the ‘vegan’ (Oksanen et al. 2012) and ‘cluster’ (Maechler et al. 2012) packages.
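The normalization and dissimilarity steps described above can be sketched in a few lines. The study itself used R with the 'vegan' and 'cluster' packages; the Python below is only an illustration of the same quantities, under the assumption that each annotated gene contributes its contig depth to its COG category and that profiles are then scaled to sum to one. The toy inputs are hypothetical, and the ordination step (PCoA) is omitted.

```python
from collections import defaultdict

def category_profile(genes):
    """Depth-weighted relative abundance of COG categories.

    `genes` is a list of (cog_category, contig_depth) pairs; singleton
    reads are given a depth of 1, as described in the text.
    """
    weighted = defaultdict(float)
    for category, depth in genes:
        weighted[category] += depth
    total = sum(weighted.values())
    return {cat: value / total for cat, value in weighted.items()}

def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b)."""
    categories = set(profile_a) | set(profile_b)
    diff = sum(abs(profile_a.get(c, 0.0) - profile_b.get(c, 0.0)) for c in categories)
    total = sum(profile_a.get(c, 0.0) + profile_b.get(c, 0.0) for c in categories)
    return diff / total

# Hypothetical toy metagenomes: (COG category letter, contig depth).
site_a = category_profile([("C", 3.0), ("K", 1.0)])
site_b = category_profile([("C", 1.0), ("G", 1.0)])

print(bray_curtis(site_a, site_b))
```

Computing this value for every pair of metagenomes gives the dissimilarity matrix on which the ordination and hierarchical clustering operate.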

The “Function Comparison” tool on the IMG/M server was used to determine which COG functions were statistically overrepresented in the SMH data set compared with other publicly available YNP metagenomic data sets that were most similar to SMH (7 phototrophic samples identified by the PCoA and cluster analyses described above). The relative abundances of COG functions were calculated based on normalized gene counts and expressed as d scores (Markowitz et al. 2008). The d score expresses, in units of standard deviation, how far the observed difference departs from the null hypothesis (i.e., relative gene counts in metagenome A = relative gene counts in metagenome B). For each comparison, the P value cutoff for statistically significant d scores was assessed using a false discovery rate of 0.05.
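To make the d score concrete, the sketch below assumes that it takes the form of a pooled two-proportion z statistic and that the false discovery rate is controlled with the Benjamini-Hochberg procedure. Both are assumptions made for illustration; the exact computation performed by the IMG/M tool is not specified in the text, and the function names are hypothetical.

```python
import math

def d_score(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (assumed form of the d score).

    x1, x2: gene counts for one COG function in metagenomes A and B.
    n1, n2: total gene counts in metagenomes A and B.
    """
    f1, f2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (f1 - f2) / se

def two_sided_p(d):
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(d) / math.sqrt(2))

def benjamini_hochberg(p_values, fdr=0.05):
    """Return True for each hypothesis rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Hypothetical counts: 100 of 1000 genes in A versus 50 of 1000 genes in B.
d = d_score(100, 1000, 50, 1000)
print(d, two_sided_p(d))
```

A positive d score under this convention indicates that the function is relatively more abundant in metagenome A than in metagenome B.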

Community composition and metabolic mapping

Unassembled raw reads were also annotated on the metagenomic analysis server, MG-RAST (Meta Genome Rapid Annotation using Subsystem Technology, v3.3; Glass et al. 2010), using the default quality control pipeline. Microbial composition and functional analyses were conducted via the MG-RAST best-hit classification tool against the GenBank (NCBI-nr), M5NR (M5 non-redundant protein), and RefSeq (NCBI Reference Sequences) databases using a minimum identity of 60%, an e-value cutoff of 10^−5, and a minimum alignment length of 50 bp.
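The three thresholds act as a joint filter on each database hit. A minimal sketch follows, assuming a hypothetical `Hit` record with percent identity, e-value, and alignment length fields; this is not the MG-RAST data model, only an illustration of how the cutoffs combine.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Hypothetical record for one read-to-database alignment."""
    read_id: str
    identity: float   # percent identity
    evalue: float
    align_len: int    # alignment length

def passes(hit, min_identity=60.0, max_evalue=1e-5, min_length=50):
    """True only if the hit satisfies all three cutoffs from the text."""
    return (
        hit.identity >= min_identity
        and hit.evalue <= max_evalue
        and hit.align_len >= min_length
    )

hits = [
    Hit("read_1", 85.0, 1e-20, 120),  # passes all cutoffs
    Hit("read_2", 55.0, 1e-30, 150),  # identity too low
    Hit("read_3", 90.0, 1e-3, 200),   # e-value too high
    Hit("read_4", 75.0, 1e-8, 40),    # alignment too short
]

kept = [h.read_id for h in hits if passes(h)]
print(kept)
```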

BLASTX results from the NCBI-nr database were imported into the MEtaGenome ANalyzer software (MEGAN v4.70.4, Huson et al. 2007). Taxonomic classifications were made using the lowest common ancestor (LCA) algorithm based on the top 10 BLAST alignments for each read, and metabolic pathways were classified using the KEGG database (Ogata et al. 1999). The sequences in each pathway (oxidative phosphorylation, methane metabolism, nitrogen metabolism, carbon fixation pathways in prokaryotes, carbon fixation in photosynthetic organisms, sulfur metabolism, and photosynthesis) were given taxonomic assignments at the phylum level. The pathways involved in nitrogen metabolism and sulfur metabolism were mapped and reconstructed using KEGG identifiers.
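The idea behind LCA assignment is that a read is placed at the deepest taxon shared by all of its retained hits. The sketch below shows only that core idea, with lineages written as root-to-leaf lists; MEGAN applies additional score and support thresholds that are not modeled here, and the example lineages are hypothetical.

```python
def common_ancestor(lineages):
    """Deepest taxon shared by every lineage (each a root-to-leaf list)."""
    if not lineages:
        return None
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared[-1] if shared else "root"

# Hypothetical lineages for the top hits of two reads.
read_a_hits = [
    ["Bacteria", "Chloroflexi", "Ktedonobacteria"],
    ["Bacteria", "Chloroflexi", "Chloroflexia"],
    ["Bacteria", "Chloroflexi", "Ktedonobacteria"],
]
read_b_hits = [
    ["Bacteria", "Firmicutes", "Clostridia"],
    ["Archaea", "Crenarchaeota", "Thermoprotei"],
]

print(common_ancestor(read_a_hits))  # hits agree down to the phylum
print(common_ancestor(read_b_hits))  # hits disagree at the domain level
```

Reads whose hits disagree at a shallow rank are assigned conservatively to a high-level taxon, which trades resolution for fewer false assignments.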

Sequence data submission

Raw 16S rRNA gene amplicon sequencing data reported here are available through the NCBI Sequence Read Archive. The individual sff files were assigned the accession numbers SRX1031281–SRX1031284 under Bioproject PRJNA284196. The SMH metagenome is publicly available on IMG/M (SMH: IMG submission ID 13526) and MG-RAST (SMH: ID 4523620.3).

Results

Geochemistry

The physical and geochemical parameters measured for the four springs studied here are reported in Table 1. The pH values of the springs we sampled ranged from 4.05 to 4.35, whereas temperature ranged from 55 to 84 °C. SMH had the lowest temperature (55 °C); the other three sampling sites had temperatures of 72 °C or higher (Table 1). Chloride and sulfate concentrations ranged among the sites from 1.82 to 599 mg L−1 and 73.3 to 571 mg L−1, respectively (Table 1). Compared with other YNP springs (shown in Fig. 2), the concentration of carbonates and bicarbonates in all the pH 4 sites was low. Waters in MKL and MRY had the highest SO4 2−/Cl ratios, while NOR had the lowest SO4 2−/Cl ratio (Fig. 2). SMH, the site selected for metagenomic analysis, had a moderate SO4 2−/Cl ratio (Fig. 2).

Fig. 2 Piper diagram indicating major ion compositions of the four sites in this study together with the other YNP inventory sites

Taxonomic profiles of 16S rRNA gene amplicon sequencing

A total of 6432 bacterial 16S rRNA gene sequences were obtained from the four pH 4 sites after denoising and removing low-quality or chimeric sequences. No archaeal 16S rRNA gene sequences were amplified with the Arch349F and Arch806R primers despite varying PCR conditions. Negative PCR results were confirmed by adding archaeal genomic DNA to PCR reactions containing sample DNA, indicating that either there was no archaeal template in the sample DNA or our PCR primers did not target any Archaea that were present in these samples. The bacterial 16S rRNA gene sequences clustered into 226 OTUs at 97% DNA similarity. Good’s coverage (Good 1953), which provides an estimate of survey completeness, ranged from 91.1 to 98.5% (mean = 95.9%). Cyanobacteria, Proteobacteria, and Chloroflexi were the predominant bacterial phyla within NOR (38.5, 23.4, and 7.9% of 16S rRNA gene sequences, respectively, Fig. 3), whereas Cyanobacteria, Chlorobi, and Chloroflexi were the three most abundant bacterial groups within MRY (82.5, 7.9, and 5% of 16S rRNA gene sequences, respectively, Fig. 3). MKL was composed mostly of Aquificae (93.5% of the 16S rRNA gene sequences), while SMH was dominated by Armatimonadetes, Chloroflexi, and Bacteroidetes (52.8, 18.6, and 18% of total 16S rRNA gene sequences, respectively, Fig. 3).

Fig. 3 Bacterial taxonomic classification and comparison of 16S rRNA gene and metagenomic reads. Taxonomic classification based on 16S rRNA gene amplicon pyrosequencing using the Greengenes database. Taxonomic classification based on metagenome using the MG-RAST M5NR database

Metagenome sequencing, coverage, and overview of microbial community groups

Metagenome sequencing generated 848,583 reads (mean length = 438 bp) totaling 372 Mbp for SMH (Table 2). Assembly of the metagenomic sequence data set yielded 19,346 contigs, with an N50 contig length of 4303 bp. The annotation results of SMH metagenome from MG-RAST and IMG are detailed in Table 3.
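The N50 reported above is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following is a minimal sketch of that definition using hypothetical contig lengths (the actual SMH contig lengths are not given in the text), followed by a quick consistency check of the reported read count, mean read length, and total yield.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # only reached for an empty input

# Hypothetical contig lengths (bp), for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print("N50:", n50(contigs))

# Consistency check of figures reported in the text:
# 848,583 reads x 438 bp mean length, expressed in Mbp.
total_mbp = 848_583 * 438 / 1e6
print("approximate yield (Mbp):", round(total_mbp))
```

Because N50 is weighted by length, a small number of long contigs can raise it substantially even when most contigs are short.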

Table 2 454 pyrosequencing and Newbler assembly metrics of the metagenomic DNA sample from the site SMH (04YSMH020)
Table 3 Features of the thermal spring metagenome SMH (04YSMH020) based on MG-RAST and IMG/M annotations

PCoA analysis based on COG functional categories indicated that the 20 public YNP metagenomes (Table S1), along with SMH, clustered into three distinct groups that could be characterized by their dominant members as: (1) archaeal communities; (2) Aquificales communities; and (3) phototrophic communities (Fig. 4). The grouping detected through the PCoA analysis was also confirmed by the cluster analysis (Fig. S1). The site SMH clustered most closely with the phototrophic community group (Figs. 4, S1).

Fig. 4 Principal coordinates analysis (PCoA) of 21 YNP metagenomes based on COG categories

Taxonomic profiles of the metagenome

Overall, community structure analysis performed with the M5NR database on MG-RAST indicated that SMH was dominated by Bacteria (81.04%, Table 4). The remaining sequences from SMH matched Archaea (6.36%) or Eukaryota (0.21%), or were unassigned (12.38%) or unclassified (0.06%). The taxonomic distribution of numerically abundant phyla derived from the metagenome of SMH indicated that Chloroflexi (~17.8%), Bacteroidetes (~17.7%), Proteobacteria (~13.5%), and Firmicutes (~12.0%) were the four most abundant phyla, according to the GenBank, M5NR, and RefSeq databases (Fig. 5). Ktedonobacteria was the dominant class (~58.0%) in the phylum Chloroflexi based on all three databases (Fig. S2). Sphingobacteria was the dominant class in the phylum Bacteroidetes (~52.4%, Fig. S2). Deltaproteobacteria was the most abundant class in the phylum Proteobacteria (~36.5%, Fig. S2). At the class level, Clostridia accounted for ~66.8% of all Firmicutes reads, followed by Bacilli (~28.8%, Fig. S2), Negativicutes (~4.2%, Fig. S2), and Erysipelotrichi (~0.3%, Fig. S2). The majority of archaeal sequences were related to the Crenarchaeota and Euryarchaeota (~2.87 and ~3.28%, respectively). However, because Archaea comprised only ~6% of the SMH sequences, the remainder of our analysis focused on the Bacteria.

Table 4 Domain distribution of the metagenomic sample based on the M5NR database
Fig. 5 Comparison of the taxonomic assignment of unassembled SMH metagenomic sequences based on GenBank (NCBI-nr), M5NR, and RefSeq databases

Gene functions enriched in the metagenome of SMH

The metagenome of SMH provided information on the functional capabilities of a microbial community in a relatively low-temperature, pH 4 site. PCoA and cluster analyses showed that the functional profile of the SMH community was most similar to seven previously published YNP metagenomes (Inskeep et al. 2013b) characterized as phototrophic communities (Fig. 4; Fig. S1). However, differences were detected in SMH compared with these seven phototrophic communities. Twenty-seven COG functions were significantly overrepresented in the SMH data set in at least six of the seven comparisons (Table S2). “Energy production and conversion”, “Transcription” and “Carbohydrate metabolism and transport” were the three most abundant COG categories among these functions (25.9, 18.5, and 11.1%, respectively) and were dominant among those with highest enrichment d scores (Table S2; Fig. 6). The 30 most abundant COG functions are listed in Table 5.

Fig. 6 Overrepresented COG functions in the metagenome of SMH relative to seven other phototrophic metagenomes. These represent the 20 most enriched functions from a total of 27 COG functions that received significant enrichment scores in six of seven comparisons (Table S2). Average d score represents the mean enrichment score over all seven comparisons. Letters above graphs represent COG category

Table 5 Top 30 (by sequence count) COG functions represented in SMH (04YSMH020) metagenomic assembled sequences based on IMG/M annotations

Energy metabolism mapping

The functional assignment of the unassembled SMH metagenomic data set provided information about possible functions in this community. A total of 14,957 reads were assigned to energy metabolism using BLASTX against the NCBI-nr database, and the majority of reads were related to the domain Bacteria (~92%). Among the bacterial reads, most were mapped to Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria, and Planctomycetes involved in diverse pathways, such as oxidative phosphorylation, methane metabolism, nitrogen metabolism, carbon fixation pathways in prokaryotes, carbon fixation in photosynthetic organisms, sulfur metabolism, and photosynthesis (Fig. 7). Major KEGG function categories and unique hits assigned to each category are listed in Tables S3 and S4. Genes involved in nitrate reduction were among the abundant categories associated with nitrogen metabolism (Table S3; Fig. S3). Genes encoding sulfate adenylyltransferase, cysteine synthase, and sulfite reductase were also highly enriched in the metagenome of SMH (Table S4; Fig. S4) compared with other similar published metagenomes.

Fig. 7 Taxonomic assignment of metagenomic reads from the site SMH related to energy metabolism (KEGG identifiers)

Discussion

Despite the diverse types of springs observed among thermal areas in YNP, their fluids ultimately originate from meteoric water. The parent water is modified through various processes, including water–rock interactions, subsurface mixing, and boiling and cooling during transport to the surface (Fournier 1989, 2005; Truesdell and Fournier 1976; Truesdell et al. 1977). Thermal waters in circumneutral to alkaline springs represent deeply sourced meteoric water enriched in carbonate, chloride, and silica, which emerges from faults located at relatively low elevations. By contrast, sulfate waters in acidic, vapor-dominated springs are generally found at higher-elevation, unfractured lava caps (Fournier 2005). H2S-rich steam separates from the underlying chloride-rich neutral water and enters perched pools of ground water, where H2S is oxidized to H2SO4 either abiotically or biotically (Fournier 1989, 2005; Guo et al. 2014). The presence of impermeable rock caps, derived from young lava flows that covered old geothermal areas in YNP about 150,000–100,000 years ago, effectively segregates circumneutral to alkaline waters from acidic waters, which may explain the paucity of springs with intermediate pH in YNP (Guo et al. 2014; Hurwitz et al. 2007; Morgan et al. 2005; White 1957).

Although the pH range of all our sites is narrow, profound differences exist in SO4 2−/Cl ratios across sampling sites, reflecting different extents of mixing between acid-sulfate waters and circumneutral to alkaline, chloride-rich waters (Fig. 2). Acidic geothermal waters (pH < 6) in YNP can be further classified into springs with high SO4 2−/Cl ratios and springs with low SO4 2−/Cl ratios (Fournier 1989; Guo et al. 2014). The MKL and MRY samples represent sites with high SO4 2−/Cl ratios, whereas SMH and NOR have low SO4 2−/Cl ratios due to elevated Cl content. The SMH waters possibly result from the mixing of high SO4 2−/Cl ratio waters with circumneutral to alkaline, chloride-rich meteoric waters (Guo et al. 2014). The low SO4 2−/Cl ratio found in the NOR water sample (Fig. 2) is consistent with previous observations indicating that mixing between the two types of waters is common in Norris Geyser Basin (Fournier et al. 2002; Nordstrom et al. 2009). Owing to their unique geochemistry, springs with intermediate pH may harbor distinct microbial communities.

In this study, we employed several approaches to examine the microbial communities in a relatively understudied YNP thermal spring niche, focusing on four pH 4 springs with temperatures ranging from 55 to 84 °C. Site SMH had the lowest temperature (55 °C), while the three other sites were much warmer (≥72 °C, see Inskeep et al. 2013b). Although we did not detect Archaea by 16S rRNA gene sequencing in any of the pH 4 sites, archaeal DNA sequences were detected in the SMH metagenome (~6.36% of the total sequences). This discrepancy might be explained by the potential bias of archaeal primers (Colman et al. 2015). Given the dominance of the Bacteria in both data sets, we focused on bacterial communities in this study. Distinct community assemblages were recovered at sites with different temperatures (Fig. 3), despite similar pH values (Table 1). For example, the Aquificae dominated 16S rRNA gene data from MKL (Fig. 3), one of the high-temperature sites (72 °C). The Aquificae normally predominate in springs with temperatures above 70 °C (Inskeep et al. 2013b; Reysenbach et al. 2005), or in high-temperature reaches within a spring where photosynthesis is temperature limited (Cole et al. 2013; Everroad et al. 2012; Hall et al. 2008; Huber and Stetter 2001). In contrast, the lower in situ temperature, together with the mixing of acid-sulfate waters with circumneutral to alkaline, chloride-rich waters in SMH (Fig. 2), may contribute to the broader taxonomic composition observed in this spring (e.g., Armatimonadetes, Bacteroidetes, and Chloroflexi, Fig. 3). Although Armatimonadetes sequences dominated the 16S rRNA gene sequencing reads, they were not detected in the metagenomic annotation (Figs. 3, 5). We did not detect a similar discrepancy between the amplicon and metagenomic sequencing among any of the other bacterial phyla. The Armatimonadetes is a newly described phylum and is estimated to comprise 12 groups, occurring in a variety of environments, but only a few strains of Armatimonadetes have been isolated to date (Dunfield et al. 2012). The scarcity of isolates and of classified representatives may lead to inaccurate annotations, which would produce incongruent results. We detected 16S rRNA gene amplicons from Cyanobacteria in NOR and MRY (Fig. 3), where temperatures were 84 and 80 °C, respectively (Table 1), which is above the temperature limit for photosynthesis (~70–75 °C) (Hamilton et al. 2012; Klatt et al. 2011; Rothschild and Mancinelli 2001). Samples from sites NOR and MRY were collected from the upper 3 cm of sediment, where dead cells from allochthonous phototrophic microorganisms inhabiting cooler margins of the stream channel may deposit and accumulate.

Although the four springs had similar pH values, it is likely that the distinct geochemistry of each spring additionally contributed to the community differences observed. In particular, biologically important species, such as SO4 2−, NH4+, and NO3−, varied among the springs, and concentrations were high compared with other springs sampled parkwide. The concentration of SO4 2− in MKL, for example, was about sevenfold higher than that in NOR (Table 1; Fig. 2). Of note, in contrast to the paucity of NO3− in NOR and MKL, the NO3− concentration (37.0 mg L−1) in SMH was the highest among 104 inventoried sites representative of the diversity of sites parkwide (http://vlab.lternet.edu/ynp_inv_data_products.html).

We compared the metagenome of the low-temperature (55 °C) site SMH to 20 YNP metagenomes from a previous report (Inskeep et al. 2013b) to better understand how the microbial community of this uninvestigated habitat compared with other relatively well-characterized communities from YNP. We expected site SMH to group with the archaeal sites, because they were all from low-pH sites (pH 2–4). Instead, site SMH, which was collected at a temperature 17 °C cooler than the coldest archaeal site (One Hundred Spring Plain, 72 °C, Table S1), clustered with phototrophic sites (Fig. 4; Fig. S1), likely due to the predominance of phototrophic organisms and the low abundance of Archaea (~6%) in SMH.

The metagenome of SMH provided a first glimpse of the metabolic functional profile of a low-temperature, pH 4 site of YNP. Functions overrepresented in the SMH metagenome compared with other YNP samples were associated with energy production and represented important redox reactions and key steps in electron transport. Some enriched COG functions (Table S2; Fig. 6), such as coenzyme F420 (COG2141), heme-copper oxidase (COG1622), carbon monoxide dehydrogenase (COG1529, COG2080, and COG1319), and proteins involved in electron transport (COG3794 and COG0723), can be related directly to the microbial community and its metabolic potential. For example, COG2141, associated with coenzyme F420-dependent 5,10-methylenetetrahydromethanopterin reductase, belongs to the family of oxidoreductases responsible for redox reactions in many Actinobacteria and methanogenic Archaea (Deppenmeier 2002). Heme-copper-type oxidases (COG1622), which represent the terminal energy-transfer enzymes of respiratory chains, play a significant role in aerobic metabolism (Garcia-Horsman et al. 1994). COGs related to carbon monoxide dehydrogenase can be involved in diverse biochemical pathways and occur in aerobic carboxydotrophic, sulfate-reducing, and hydrogenogenic bacteria.

Among the overrepresented functions, we observed many relevant to carbohydrate metabolism and transport. For instance, COG2814 belongs to the family of “arabinose efflux permease”. Proteins of this COG function belong to the major facilitator superfamily (MFS), whose members can transport a diverse array of substrates, including amino acids, drugs, ions, and sugars, across the membrane (Law et al. 2008). Another overrepresented function, COG2271, a sugar phosphate permease responsible for the transport of carbohydrates derived from the environment, is also affiliated with the MFS. Functions (e.g., COG1131 and COG0841, Table 5) associated with multidrug resistance are known to export antibiotics and toxic molecules (Piddock 2006). Bacteria bearing these functions can defend against toxic compounds produced by competitors (Piddock 2006). Functions (COG1629 and COG1914) related to inorganic ion transport and metabolism were significantly abundant in SMH. Microorganisms in thermal springs are expected to encounter heavy metals (Inskeep et al. 2010) and to possess genes involved in heavy metal transport.

Pathways involved in nitrogen and sulfur metabolism frequently contribute significantly to energy generation by thermal spring microbial communities, because alternative electron acceptors, such as arsenate, CO2, elemental sulfur, ferric iron, nitrate, sulfate, and thiosulfate, are often abundant at spring sources (Hall et al. 2008; Inskeep et al. 2010; Jimenez et al. 2012). Approximately 81.1% of the genes detected in SMH associated with nitrogen metabolism were related to the Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria, and Planctomycetes (Fig. 7). Nitrate as an electron acceptor is energetically favorable over broad pH ranges, which may explain the widespread distribution of nitrate reduction genes in different types of YNP springs (Shock et al. 2010; Swingley et al. 2012). In addition, the nitrate concentration in SMH was the highest among all our YNP inventory sites (37.0 mg L−1, http://vlab.lternet.edu/ynp_inv_data_products.html), thus representing an abundant nutrient and energy resource for nitrate reducers.

Genes necessary for dissimilatory nitrate reduction to N2 via denitrification, including dissimilatory nitrate reductase gene clusters (e.g., narG, narH, narI, and narJ) and nitric oxide reductase gene clusters (e.g., norB, norC, norD, and norQ), were prevalent in SMH (Table S3; Fig. S3). In addition, the gene coding for nitrite reductase (nirK) was detected (Table S3; Fig. S3). Nitrite reductase (nirK or nirS) is a pivotal enzyme of the dissimilatory nitrate reduction pathway. According to models of dissimilatory nitrate reduction in bacteria (Gonzalez et al. 2006; Moreno-Vivián et al. 1999; Richardson et al. 2001), a nitrite reductase (nirK or nirS) is requisite for producing NO, which is a substrate for nitric oxide reductase (e.g., norB, norC) to produce N2O. The sequences related to the gene nosZ, which codes for nitrous oxide reductase (associated with class Aquificae, Bacteroidetes, Flavobacteriia, Ignavibacteriae, Sphingobacteriia, and Thermomicrobia), are important for the last step of denitrification, which converts N2O to N2 (Table S3; Fig. S3).
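The step-by-step logic of this argument can be illustrated with a minimal sketch that checks whether each denitrification step is covered by at least one detected gene set. The gene names are those reported in the text; the grouping of genes per step (e.g., treating narG, narH, and narI as the required set, and ignoring accessory genes such as narJ, norD, and norQ) is a simplifying assumption made for illustration, and the function names are hypothetical rather than part of the analysis pipeline used in this study.

```python
# Genes reported in the text as detected in SMH (Table S3; Fig. S3).
DETECTED = {
    "narG", "narH", "narI", "narJ",
    "nirK",
    "norB", "norC", "norD", "norQ",
    "nosZ",
}

# Each step lists alternative gene sets. A step counts as covered when every
# gene in at least one alternative is detected. The groupings below are an
# illustrative simplification, not a curated pathway definition.
DENITRIFICATION_STEPS = [
    ("NO3- -> NO2-", [{"narG", "narH", "narI"}]),
    ("NO2- -> NO", [{"nirK"}, {"nirS"}]),
    ("NO -> N2O", [{"norB", "norC"}]),
    ("N2O -> N2", [{"nosZ"}]),
]


def covered_steps(detected, steps):
    """Map each step name to True if any alternative gene set is fully detected."""
    return {
        name: any(alternative <= detected for alternative in alternatives)
        for name, alternatives in steps
    }


def pathway_complete(detected, steps):
    """Return True only if every step of the pathway is covered."""
    return all(covered_steps(detected, steps).values())


if __name__ == "__main__":
    for step, ok in covered_steps(DETECTED, DENITRIFICATION_STEPS).items():
        print(f"{step}: {'covered' if ok else 'missing'}")
    print("complete:", pathway_complete(DETECTED, DENITRIFICATION_STEPS))
```

Under these assumptions, removing nirK from the detected set leaves the NO2− to NO step uncovered, because nirS was not reported, which reflects why nitrite reductase is described as pivotal to the pathway.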

Given the abundance of ammonium (16.9 mg L−1) in SMH and the high energy costs of biological nitrogen fixation, the absence of nifK (Table S3; Fig. S3), a gene involved in the synthesis of molybdenum-dependent nitrogenase (Dos Santos et al. 2012), is not surprising. The source of ammonium in SMH is most likely abiotic (Holloway et al. 2011), because genes associated with the dissimilatory nitrate reduction to ammonium (DNRA) pathway (e.g., napA, nrfA) were completely absent from SMH and have not been reported for YNP. The absence of DNRA in SMH may be due to factors such as the lack of dissolved organic carbon (below detection) and high nitrate availability (37.0 mg L−1; Table 1) in situ. Previous studies suggest that DNRA activities can outcompete denitrification activities under high C/NO3− ratios (Tiedje et al. 1983) and low nitrate availability (van den Berg et al. 2015). Ammonium assimilation is possible in SMH by genes coding for assimilatory nitrate reduction to ammonia (e.g., nasA, nirA, and nirB). In this bacteria-dominated spring, we did not detect genes (e.g., amoA) coding for ammonia monooxygenase, which uses ammonia as a substrate. Our result is consistent with previous reports of the absence of nitrification genes from other YNP thermal springs (Inskeep et al. 2010; Swingley et al. 2012). Nitrification may not predominate among YNP thermal springs, because the oxidation of ammonium is rarely thermodynamically favorable under in situ conditions (Shock et al. 2010), despite the occurrence of archaeal amoA-like genes in a YNP axenic culture and environmental samples (de la Torre et al. 2008; Zhang et al. 2008).

Most of the genes involved in sulfur metabolism are related to the conversion of sulfate into adenylylsulfate and to the subsequent production of sulfite and H2S (Table S4; Fig. S4), similar to what has been previously reported in other thermal springs (Jimenez et al. 2012; Swingley et al. 2012). Given that sulfide was below detection in SMH, whereas sulfate was 306 mg L−1 (Table 1), sulfide must be oxidized or incorporated rapidly. The abundance of available sulfate provides a large energy source for sulfate-reducing microbes, which is further supported by the pathway reconstruction of sulfate reduction based on the metagenomic gene content (Fig. S4).

Genes responsible for cysteine synthase A (cysK) and B (cysM) are implicated in the formation of adenylylsulfate (Table S4; Fig. S4). The environmental aprA and aprB sequences coding for adenosine-5′-phosphosulfate (APS) reductase (Apr) exhibit closest matches to members of Betaproteobacteria and Thermoprotei (Table S4; Fig. S4). Based on the current models of dissimilatory sulfate reduction and sulfur oxidation in prokaryotes, Apr is a pivotal enzyme. During sulfate reduction, sulfate is first activated to APS by ATP-sulfurylase at the expense of ATP; Apr then converts APS to sulfite, which is subsequently reduced to sulfide by dissimilatory sulfite reductases (DSRs; Meyer and Kuever 2008). The alpha subunit of Apr enzymes is considered to be ubiquitous in all known sulfate-reducing and most of the sulfur-oxidizing prokaryotes (Meyer and Kuever 2008). For example, environmental aprA reads found in site SMH have high identity to those annotated in the Thiobacillus plumbophilus and Caldivirga maquilingensis genomes (e value < 10−35; Table S4; Fig. S4). T. plumbophilus requires H2S as an electron donor (Drobner et al. 1992), whereas C. maquilingensis respires sulfur, thiosulfate, or sulfate (Itoh et al. 1999). Genes coding for the assimilatory and dissimilatory reduction of adenylylsulfate to sulfite (e.g., aprA, aprB, and cysH) and the subsequent assimilatory reduction of sulfite to H2S (e.g., cysI and sir) observed in site SMH suggest that sulfate and sulfite reduction pathways are dominant processes in the environment studied here. In contrast to a previous report for a YNP alkaline spring (Swingley et al. 2012), where there is a genomic potential for sulfur oxidation, enzymes such as sulfite oxidase (sox) were not detected in our data set, owing to insufficient sequencing depth, low abundance, or absence.

Conclusions

In this study, the microbial communities of pH 4 springs were characterized using 16S rRNA gene pyrosequencing. We found that the microbial assemblages varied among the four different sites studied, despite the narrow range of pH values sampled. Temperature and geochemistry of the waters likely contributed to the differences we observed. In addition, we assessed the functional profiles of the microbial community in a low-temperature, pH 4 spring that was previously unexplored, using shotgun metagenome sequencing. Functional cluster analyses revealed that this unexplored geobiological community was most similar to other phototrophic communities sampled from YNP, although the pH was more consistent with sites dominated by Archaea. Taxonomically, this spring community included a microbial assemblage and functional profile that were distinct from other phototrophic YNP communities, which are typically circumneutral. Apart from Chloroflexi, which are commonly found in phototrophic communities, Bacteroidetes, Proteobacteria, and Firmicutes were abundant in this spring. The taxonomic diversity resulted in metabolic diversity (e.g., chemotrophs and heterotrophs), as described in the metagenomic data presented here. Compared with other YNP phototrophic communities, the annotation based on the COG database indicated a relative enrichment of functions involved in energy production and conversion, transcription, and carbohydrate transport. The identification of genes coding for nitrogen and sulfur cycling revealed a microbial population involved in the dissimilatory and assimilatory reduction of nitrate, and in the conversion of sulfate into adenylylsulfate, sulfite, and H2S. Low-temperature, intermediate-pH terrestrial hydrothermal springs in YNP harbor unique communities with diverse metabolisms that deserve further attention.
This research not only provides an initial survey that serves as a foundation for understanding microbial communities in these less common springs, but also offers a framework for future microbial studies in pH 4 YNP thermal springs. It is notable that the waters from the four pH 4 sites in this study underwent different water mixing processes. Efforts to better integrate the role of water source and history may prove useful in understanding the microbial ecology of thermal springs.