Introduction

With 1.5 million extant species representing the lower estimate for total fungal diversity, much of the fungal kingdom remains to be characterized [1, 2]. Recent and ongoing coordinated efforts have delimited major phylogenetic lineages, yet the ecology of most fungi remains poorly understood [3–6]. Soil is often cited as the environment harboring the greatest fungal diversity, and the analysis of grassland soils has revealed extraordinarily diverse communities with taxa spanning the Kingdom Fungi [1, 7, 8]. Despite this diversity and the significance of fungi to terrestrial ecosystem processes [9], factors influencing the diversity, presence, and abundance of most soil-dwelling fungi are largely unknown. This can be attributed to the fastidious nature of fungi as well as the difficulty of a simple census of all fungi present in a soil sample.

The limitations to studying fungi in the environment by culturing have been partially overcome by sequencing amplicon libraries derived from environmental nucleic acids (i.e., targeted metagenomics). Metagenomic analyses have been used to uncover patterns in global fungal diversity [10] and describe temporal dynamics in plant-associated fungal communities [11, 12]. Use of targeted metagenomics has also revealed large groups of novel fungi [13, 14] that were subsequently verified through culture-based studies and phylogenetics [15]. Despite these advances, major computational and biological limitations to sequence-based approaches remain to be overcome. With no morphological characters to observe, taxonomic assignment of sequence data is entirely dependent on the quality of available reference sequences. The poor quality of annotation [16, 17] and sometimes incorrect annotation of publicly available sequence data [18] are often cited as major caveats to metagenomic studies.

This study had two primary objectives. The first objective was to characterize soil fungal communities associated with grassland plant rhizospheres by sequencing a multiple-copy rRNA locus (LSU). The second objective was to test the feasibility of using a single-copy protein-coding locus (RPB2) to target the genus Fusarium associated with the rhizosphere of native grassland plants. Fusarium species are well studied as plant pathogens in agricultural ecosystems [19, 20], but these fungi also play an important but little-studied functional role in ecosystem processes including decomposition and carbon sequestration [21, 22].

To address issues of annotation in publicly available databases, high-quality references for the LSU and RPB2 loci were used to assign taxonomic classification to metagenomic sequences. The LSU locus has been used in targeted metagenomic studies [10] and, due to its high copy number, is easily amplified from environmental nucleic acids. High copy number also becomes a problem when linking sequence abundance to fungus abundance, because copy number varies even among closely related fungi [23]. Additionally, this locus, as well as intergenic loci such as ITS, exhibits relatively low nucleotide diversity within some fungal taxa (e.g., the genus Fusarium) compared to protein-coding genes [24]. This motivated our efforts to target a portion of the protein-coding gene RPB2 to study Fusarium populations in situ.

To extend our understanding of the ecology of fungi associated with native grassland plants, we applied the tools described above to evaluate biotic and abiotic factors correlated with fungal richness, composition, and community structure. To test the effect of plant hosts and plant community richness on soil fungi, we studied rhizosphere communities associated with two perennial legumes (Lespedeza capitata and Lupinus perennis) and two perennial C4 grasses (Andropogon gerardii and Schizachyrium scoparium) in a model grassland experiment in the Upper Midwest of the USA. We tested whether plant-specific effects vary when the plant is grown in monoculture versus polyculture. We also studied the relationships of soil nutrients to fungal community richness and structure. In environments where plant vegetation is scarce (e.g., Antarctica), the abundance of fungi has been linked to soil edaphics [25], but the response of fungal community richness and structure to soil nutrients in diverse habitats remains poorly understood. This work highlights the novel use of a single-copy locus in combination with a high-copy rRNA locus to characterize shifts in fungal community structure and richness in response to plant host, plant community richness, and soil edaphic characteristics.

Methods

Field Sampling

The Cedar Creek Ecosystem Science Center, an NSF Long-Term Ecological Research site, is located in east-central Minnesota. Soil cores were collected in 2009 from experiment E120 (http://www.cedarcreek.umn.edu/research/experiments/e120.php), which was established in 1994 [26]. Four different plant species were sampled growing in two levels of plant community richness, monoculture (stands of a single plant species) and polyculture (stands of 16 randomly selected native plant species from a pool of 18 [26]). Soil was collected from the rhizosphere of two perennial C4 grasses, A. gerardii and S. scoparium, and two perennial legumes, L. capitata and L. perennis. Each sample consisted of four bulked soil cores (5 cm diameter, 30 cm depth).

For the LSU locus, 17 samples were analyzed. For the RPB2 locus, only nine monoculture samples were analyzed with three plant replicates each from S. scoparium, L. capitata, and L. perennis.

Prior to processing, soil was passed through a 2-mm mesh to remove plant debris. DNA was extracted with the PowerSoil® DNA Isolation Kit (MO BIO), with an initial sonication step and increased bead-beating added to the standard extraction protocol [27]. Soil organic matter (OM), extractable nitrogen (N), extractable potassium (K), and pH were measured using standard methods at the Research Analytical Laboratory at the University of Minnesota (http://soiltest.cfans.umn.edu/).

Molecular Biology

Amplicon libraries were created from environmental DNA using semi-nested PCR. All reactions were performed in an Eppendorf Mastercycler. The initial PCR was used to create an amplicon library. The second PCR was used to tag the amplicons with 454 GS FLX+ specific adaptors and sample-specific barcodes, except in the case of the RPB2 locus where the second PCR also reduced the size of the amplicons. For the LSU locus, the primers LROr/LR3 (http://www.biology.duke.edu/fungi/mycolab/primers.htm) were used in both rounds of PCR. To target the RPB2 locus, a previously developed primer RPB2-5F2 [28] was used as a forward primer. Due to the size of the amplicons generated with the typical reverse primer RPB2-7cR, a new reverse primer, nested inside the 5F2/7cR amplicon, was developed for compatibility with 454-sequencing technology. An alignment of the RPB2 locus from 73 different Fusarium spp. was used to predict a suitable reverse multiplex primer RPB2-RP1 (5′-GGNGTCATGCARATCATNGC-3′). This new primer was screened against 45 isolates of both model and uncharacterized Fusarium strains, showing positive amplification (data not shown).
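
The reverse primer RPB2-RP1 contains degenerate positions (N and R). As an illustration only, the following Python sketch expands the primer into the non-degenerate sequences it encodes, using the standard IUPAC nucleotide codes. The function names are hypothetical and are not part of the original workflow.

```python
from itertools import product

# Standard IUPAC nucleotide codes; only N and R occur in this primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

RPB2_RP1 = "GGNGTCATGCARATCATNGC"  # primer sequence as given in the text


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total


def expand_primer(primer):
    """List every non-degenerate sequence encoded by the primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]


variants = expand_primer(RPB2_RP1)
print(len(RPB2_RP1), degeneracy(RPB2_RP1), len(variants))
```

With two N positions and one R position, the 20-nt primer as written encodes 4 × 2 × 4 = 32 sequences.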

For the LSU locus, the first round of PCR used 20 μL reactions with 2 μL Ex-Taq buffer, 1.6 μL dNTPs (2.5 mM of each), 2 μL of each forward and reverse primers (10 pmol/μL), 11.3 μL water, 0.1 μL Ex-Taq polymerase (5 units μL−1), and 1 μL template (10 ng DNA μL−1). The reactions were performed with an initial denaturing step at 95 °C (5 min) followed by 24 cycles of 95 °C (1 min), 57 °C (1 min), 72 °C (1 min), and ended with an extension of 72 °C (10 min). Three reactions were performed per sample and cleaned using Qiagen PCR cleanup kit. The second round of PCR was performed utilizing the same reaction ingredients except 0.4 μL of forward and reverse primers, 7.5 μL water, and 8 μL template (1 ng μL−1). The reactions were performed similarly but with an annealing temperature of 69 °C and only 12 cycles (to reduce non-specific priming). Four reactions per sample were pooled and gel extracted using QIAquick Gel Extraction Kit (Qiagen).
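
As a quick consistency check on the reaction recipes above, the sketch below sums the component volumes for both rounds of PCR. It assumes that the primer volumes apply to each primer separately (2 μL each in round one, 0.4 μL each in round two); under that reading both recipes total 20 μL. The dictionary keys are labels chosen here for illustration.

```python
from decimal import Decimal

# Volumes in microlitres, taken from the text. Primer volumes are assumed
# to apply to each of the forward and reverse primers separately.
round1 = {
    "Ex-Taq buffer": "2",
    "dNTPs": "1.6",
    "forward primer": "2",
    "reverse primer": "2",
    "water": "11.3",
    "Ex-Taq polymerase": "0.1",
    "template": "1",
}

# Round two uses the same ingredients except primers, water, and template.
round2 = dict(round1)
round2.update({
    "forward primer": "0.4",
    "reverse primer": "0.4",
    "water": "7.5",
    "template": "8",
})


def total_volume(recipe):
    """Sum component volumes exactly, avoiding binary float rounding."""
    return sum(Decimal(volume) for volume in recipe.values())


for name, recipe in (("round 1", round1), ("round 2", round2)):
    print(name, total_volume(recipe), "uL")
```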

Amplification of the RPB2 locus followed similar methods with the following changes. The annealing temperature for the initial round of PCR was 60 °C, and 25 cycles were used for four replicate reactions. For the second round of PCR, the annealing temperature was 65 °C and 20 cycles were performed for three replicate reactions. DNA was quantified using a Hoefer fluorimeter and pooled in equal sample concentrations. Libraries were sequenced using the Roche 454 GS FLX+ sequencer at the University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center. Demultiplexed raw sequence data can be found at the National Center for Biotechnology Information (NCBI) Sequence Read Archive accession number SRS600205.

Operational Taxonomic Unit Construction of Pyrosequence Data

Generation of OTU data for the LSU locus was performed using mothur v.1.27.0 [29] and following steps outlined in Schloss [30] to minimize PCR and sequencing artifacts. Flowgrams were denoised using the mothur implementation of the PyroNoise [31] algorithm. Denoised sequences were screened for the following criteria: presence of sample bar code sequence, presence of primer sequence, and a minimum length of 250 nucleotides (following removal of bar code and primer). Sequences failing to meet these criteria were culled, and the sequences meeting these criteria were truncated to keep the first (5′) 250 nucleotides. A set of unique sequences was generated, and from it, a random group (n = 1000) was selected and used to create a template alignment using the program Muscle [32]. The template alignment was then used to align the unique sequence set in mothur. The preclustering algorithm was used to further reduce the effect of sequence and PCR error and sequences were then clustered into OTUs using the average neighbor algorithm. Due to differences in sequence depth per sample, a random subsampling was performed based on the sample with the fewest sequences (n = 400). These data were used to calculate fungal OTU richness and for the analysis of OTU-based fungal community structure.
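
The subsampling step described above can be sketched as follows. This is a minimal illustration with hypothetical toy data, not the mothur implementation used in the study: every sample is randomly reduced, without replacement, to the depth of the smallest sample.

```python
import random


def subsample(sequences, depth, seed=0):
    """Draw `depth` sequences without replacement from one sample."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(sequences, depth)


# Hypothetical toy data: each sequence is represented by its OTU label.
samples = {
    "sample_A": ["otu1"] * 5 + ["otu2"] * 3 + ["otu3"],
    "sample_B": ["otu1"] * 2 + ["otu4"] * 2,
    "sample_C": ["otu2"] * 4 + ["otu5"] * 2,
}

# Subsampling depth is set by the sample with the fewest sequences.
depth = min(len(seqs) for seqs in samples.values())
rarefied = {name: subsample(seqs, depth) for name, seqs in samples.items()}

for name, seqs in rarefied.items():
    print(name, len(seqs), sorted(seqs))
```

In the study itself the corresponding depth was 400 sequences for the OTU data.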

Alignment-Based Taxonomic Assignment of Pyrosequence Data

A separate sequence processing pipeline was developed to assess overall reproducibility across different methods and to assign a taxonomic name to sequence data from both loci. Raw sequences were initially processed using mothur v.1.27.0 with the following criteria: a minimum length of 200 nucleotides, presence of sample bar code, and an average quality score of 20. A set of 4030 LSU and 3319 RPB2 reference sequences was compiled from data generated in the Assembling the Fungal Tree of Life project (http://aftol.org/), review of phylogenetic research, and mining of sequenced fungal genomes available through the Broad Institute (http://www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative/fungal-genome-initiative) and Joint Genome Institute (http://genome.jgi.doe.gov/programs/fungi/index.jsf). Taxonomic assignment to reference sequences was based on the NCBI Taxonomy derived using BioPerl modules [33]. Higher taxonomic classifications were derived from NCBI using the R package CHNOSZ [34]. The BLAST+ programs [35] were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/blast/executables/blast+/2.2.25/) and used to align sequences generated by pyrosequencing to the LSU and RPB2 reference sequences. A cutoff of ≥400 bit score was used to guard against alignment of non-fungal sequences from the LSU locus to basal lineages within the Kingdom Fungi. Due to differences in the number of sequences producing an alignment and genus assignment per sample, data were standardized using two different methods. Subsampling was performed based on the sample with the fewest sequences (n = 337) and was used to calculate fungal genus richness, whereas the relative abundance of sequences per genus was used in the analysis of genus-based fungal community structure.
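
The bit score cutoff can be illustrated with a short filter over BLAST+ results. The sketch assumes the alignments were written in the standard 12-column tabular format (`-outfmt 6`), in which the bit score is the last column. The output format, the choice of the single best hit per read, and the read and reference identifiers are all assumptions made for illustration; the text does not state them.

```python
import csv
import io

BITSCORE_CUTOFF = 400.0  # cutoff stated in the text


def best_hits(tabular_text, cutoff=BITSCORE_CUTOFF):
    """Keep, for each query, the highest-scoring hit at or above the cutoff.

    Assumes the default 12 columns of BLAST+ tabular output:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if bitscore < cutoff:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical alignment records for two reads.
rows = [
    ["read_1", "ref_A", "99.0", "250", "2", "0", "1", "250", "1", "250", "1e-120", "450"],
    ["read_1", "ref_B", "95.0", "250", "12", "0", "1", "250", "1", "250", "1e-100", "410"],
    ["read_2", "ref_C", "85.0", "200", "30", "0", "1", "200", "1", "200", "1e-50", "200"],
]
example = "\n".join("\t".join(r) for r in rows)

hits = best_hits(example)
print(hits)
```

Here `read_2` is discarded because its only alignment falls below the cutoff, and `read_1` is assigned to the reference with the higher bit score.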

Statistics

All statistics were performed in R utilizing the base packages [36] plus the ggplot2 [37] and vegan [38] packages.

Two-sided t tests were used to test for a significant difference between sample means. Plotted data occasionally showed heterogeneity between sample variances (e.g., OM between monoculture and polyculture soils); however, utilization of Welch’s t test, which corrects for heterogeneity of variances, or the non-parametric Wilcoxon rank-sum test (also known as the Mann-Whitney test), gave consistent significant results.
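
For reference, the Welch statistic and its approximate degrees of freedom (Welch-Satterthwaite) can be computed as in the sketch below. The analyses in this study were run in R; this Python version, with hypothetical input values, only illustrates how unequal variances enter the calculation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    # Squared standard error contributed by each group (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical measurements for two groups with unequal variances.
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 3), round(df, 3))
```

When the two groups have equal size and equal variance, the degrees of freedom reduce to those of the ordinary two-sample t test (n1 + n2 − 2); otherwise they are smaller.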

Using the LSU locus data, fungal community richness was calculated separately as both the number of non-singleton OTUs per sample and the number of different genera discovered through alignment per sample. Differences in sequence depth per sample were accounted for as described above for OTU and genus data.
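
The two richness measures can be expressed compactly. In the sketch below, singletons are counted within a single subsampled sample, which is an assumption; the text does not specify whether singletons were defined per sample or across the whole dataset. The labels are toy values.

```python
from collections import Counter


def otu_richness(otu_labels):
    """Number of OTUs represented by at least two sequences in a sample."""
    counts = Counter(otu_labels)
    return sum(1 for n in counts.values() if n >= 2)


def genus_richness(genus_labels):
    """Number of distinct genera assigned to sequences in a sample."""
    return len(set(genus_labels))


# Toy sample: one OTU label or genus name per sequence.
otus = ["otu1", "otu1", "otu2", "otu3", "otu3", "otu3"]
genera = ["Fusarium", "Fusarium", "Mortierella"]

print(otu_richness(otus), genus_richness(genera))
```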

To explore differences between samples, non-metric multidimensional scaling (NMDS) was performed with OTU presence/absence data (Jaccard distance) as well as genus relative abundance data (Bray-Curtis distance). Vectors of soil edaphics were fit to ordination data and assessed for a significant correlation in ordination space using the envfit function in the vegan package.
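
The two dissimilarity measures used for ordination are defined in the sketch below for a pair of samples. It is a plain-Python illustration with hypothetical counts, not the vegan implementation: Jaccard distance uses only presence/absence, whereas Bray-Curtis uses relative abundances.

```python
def jaccard_distance(a, b):
    """1 - (shared taxa / total taxa), using presence/absence only."""
    present_a = {taxon for taxon, n in a.items() if n > 0}
    present_b = {taxon for taxon, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Sum of absolute differences divided by the sum of all abundances."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical sequence counts per taxon in two samples.
sample_1 = {"x": 2, "y": 1}
sample_2 = {"y": 1, "z": 1}

d_jaccard = jaccard_distance(sample_1, sample_2)
d_bray = bray_curtis(relative_abundance(sample_1), relative_abundance(sample_2))
print(round(d_jaccard, 4), round(d_bray, 4))
```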

Permutational multivariate analysis of variance using distance matrices (PERMANOVA) [39] was used to test for significant differences between fungal community structure in monoculture and polyculture soils as well as between grasses and legumes in monoculture or polyculture. Distance matrices were calculated from presence/absence OTU data (Jaccard distance) as well as genus relative abundance data (Bray-Curtis distance), and the test was performed using the adonis function in the vegan package. Initially all the LSU samples were used to test for differences in fungal community structure between monoculture and polyculture and grasses and legumes. Then distance matrices were generated separately from monoculture and polyculture samples and used to test for differences between grasses and legumes in monoculture or polyculture plant communities.
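
The logic of the permutation test can be sketched for the simplest case of a single grouping factor. The code below is an illustrative reimplementation under that assumption, not the adonis function itself. It computes the pseudo-F ratio from a distance matrix and estimates a P value by shuffling group labels. The distance matrix, group labels, and function names are hypothetical.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo-F for one factor, computed from squared pairwise distances."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        pairs = sum(dist[i][j] ** 2 for p, i in enumerate(idx) for j in idx[p + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dist, groups, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation P value."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical one-dimensional "communities" forming two separated groups.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
groups = ["mono"] * 3 + ["poly"] * 3
dist = [[abs(p - q) for q in points] for p in points]

f_obs, p_value = permanova(dist, groups)
print(round(f_obs, 2), p_value)
```

With only six samples the number of distinct label arrangements is small, so the P value in this toy example cannot become very small regardless of how well the groups separate.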

Results

Sequence output varied between pooled samples, loci, and taxon delimitation methods (alignment-based genus assignment versus OTU classification). For OTU construction using the LSU locus, the total number of sequences after processing (see “Operational Taxonomic Unit Construction of Pyrosequence Data”) was 90,338. Per sample, the average number of sequences was 5314 (standard deviation = 1519); the smallest sample had 400 sequences and the largest had 34,406 sequences. Sequences from both loci were characterized using alignment-based taxonomic assignment. For the LSU locus, there was a total of 106,844 sequences that fulfilled our criteria and passed our alignment threshold; 69 % of the sequences that passed initial sequence screening aligned to a reference sequence (see “Alignment-Based Taxonomic Assignment of Pyrosequence Data”). Per sample, the average number of sequences was 6285 (standard deviation = 9466); the smallest sample had 337 sequences and the largest had 34,268 sequences. For alignment-based taxonomic assignment for RPB2 sequences, a total of 23,122 sequences fulfilled our criteria and passed our alignment threshold; 87 % of the sequences that passed initial sequence screening aligned to a reference sequence. Per sample, the average number of sequences was 2569 (standard deviation = 1519); the smallest sample had 322 sequences and the largest had 4586 sequences.

The reference databases compiled for taxonomic assignment of unknown sequence data covered two loci. The LSU locus database had broad taxonomic coverage within Fungi, with 4030 sequences representing 1446 genera within 129 orders. The database for the RPB2 locus, known to have greater divergence among closely related fungi than the LSU locus [24], had 3318 sequences representing 805 genera within 103 orders. The use of the RPB2 locus in this research was aimed at testing the suitability for targeting taxa within the genus Fusarium, and approximately 13 % of the sequences aligned to Fusarium references. Since the amplicon libraries for the RPB2 locus were generated from the same DNA as the larger set of amplicon libraries for the LSU locus, it was possible to compare more generally how the taxonomic distribution differed between the two loci. As summarized in Table 1, it is clear that the LSU locus covered a much greater taxonomic breadth of fungi than the RPB2 locus, whereas the RPB2 locus enriched for taxa in the Sordariomycetes. It should be noted that although the underlying reference databases differ between the loci, all the classes represented in the table are in each reference database.

Table 1 Taxonomic classification for pyrosequencing data from shared sequenced samples from the LSU and RPB2 loci

Soil organic matter, nitrogen, and potassium were significantly higher in polyculture than in monoculture plant communities (Table 2) and in general were significantly positively correlated (data not shown). Within the monoculture samples, soil nitrogen was greater in the legume rhizosphere compared to grasses (t(7) = −2.505, P < 0.05). Within the polyculture samples, nitrogen did not differ significantly between the rhizosphere of legumes and grasses.

Table 2 Soil edaphics in monoculture versus polyculture plant communities

Soils from polyculture plant communities had richer fungal communities than those from monocultures based on LSU OTU (t(15) = −3.8295, P < 0.01) and genus (t(15) = −3.321, P < 0.01) data (Fig. 1). Fungal richness did not differ between plant hosts when considering all samples or only monoculture samples. Operational taxonomic unit richness was positively correlated with potassium (R² = 0.30, P < 0.05), as was genus richness (R² = 0.27, P < 0.05). No other soil edaphics were significantly correlated with either measure of fungal richness.

Fig. 1 Box plots of (a) OTU and (b) genus richness in monoculture and polyculture samples. Differences between monoculture and polyculture are significant (P < 0.01) in both cases

Fungal communities associated with plants growing in monoculture were distinct from communities associated with the same plant hosts growing in polyculture (Fig. 2). Fungal communities associated with different plant families (grasses versus legumes) grown in monoculture were distinct; however, fungal communities associated with grasses and legumes were not distinct when plants were grown in polyculture (Fig. 2). Soil organic matter, potassium, and nitrogen were significantly correlated within ordination space. All three of the edaphics show the greatest correlation with the axis separating fungal communities in plant monocultures and polycultures (Fig. 2). The ordination plots for genus relative abundance and OTU presence/absence data showed similar patterns in ordination space (Fig. 2a, b), which demonstrates that differences in community structure were robust across different taxon delimitation/summary methods.

Fig. 2 NMDS plots for the (a) OTU presence/absence and (b) genus relative abundance data. Vectors show edaphics that are significantly correlated within ordination space (N nitrogen, OM organic matter, K potassium)

The clustering seen in ordination was used as a basis to test for differences between fungal communities associated with monoculture and polycultures as well as grasses and legumes growing in monoculture or polyculture. Based on OTU presence/absence data, fungal communities were significantly different between monoculture and polyculture plant communities (adonis, P < 0.001) as well as grasses and legumes (adonis, P < 0.05). Separate analysis of the effect of plant identity within each plant diversity treatment showed a significant difference between fungal communities associated with grasses and legumes in monoculture (adonis, P < 0.01) but not in polyculture (adonis, P > 0.8). The same statistically significant results were obtained using genus relative abundance data.

Fungi in the genus Fusarium were of particular interest, and sequences aligning to references from this genus were found in every sample for both sequenced loci. Interest in this genus also motivated the sequencing of the RPB2 locus. For the RPB2 locus, within Fusarium, about 53 % of the sequences aligned to references in the Fusarium tricinctum clade followed by 45 % which aligned to references in the Fusarium oxysporum clade (Fig. 3).

Fig. 3 Relative abundance of sequences (RPB2 locus) aligning to Fusarium clades from three replicates of each plant species grown in monoculture

Discussion

In this work, grassland rhizosphere fungal richness and community structure varied with plant community richness, plant identity, and soil edaphics, particularly potassium. Despite the inability to clearly separate plant and soil edaphic factors, these results provide considerable insight into the variation in fungal communities within these grassland soils. Fungal communities in the rhizosphere of plants growing in polyculture were richer and structurally distinct from fungal communities in the rhizosphere of the same host plant growing in monoculture, and fungal communities associated with legumes and grasses in monoculture (but not polyculture) were structurally distinct.

The link between plant richness and soil fungal richness in this work is consistent with a recent study in rainforest soils in which plant diversity showed a positive correlation with fungal richness [40]. However, other studies have found no relationship between plant and soil fungal richness in both grassland [41] and rainforest [42] soils, suggesting that plant richness may not always be the best predictor of fungal richness. A parallel metagenomic analysis, using the same DNAs as this study, did not find a tight coupling between plant richness and streptomycete richness [43]. This suggests that richness within different lineages of soil microorganisms within the same community can be influenced by distinct environmental factors [44].

Increasing plant community richness has been shown to influence primary and secondary metabolism of individual plant species [45], and overall plant community productivity [26], with likely consequences on the quantity and types of nutrients entering the rhizosphere. In agreement with a link between plant richness and soil edaphics, we found that soils of polyculture plant communities were more nutrient rich. Additionally, soil edaphic data were strongly correlated with the ordination axis separating fungal communities found in monoculture and polyculture soils, suggesting that greater nutrient richness in polyculture is significant in determining fungal community structure. It is therefore not possible to clearly differentiate the effects of plant community richness from those of soil edaphics on soil fungal community structure.

Fungi in the genus Fusarium are well studied as pathogens of many plants. They are also commonly isolated as endophytes and soil saprotrophs, yet the biology of these fungi in non-agricultural systems is poorly understood. The presence of sequences aligning to Fusarium references in all of the sequenced samples highlights the ubiquity of these fungi in these grassland soils. Comparison of Fusarium clades in these North American grassland soils with comprehensive surveys of Fusarium in the soils of the islands of Sardinia [46] and New Zealand [47] shows F. oxysporum present in all soils. Of the other abundant clades found here, F. tricinctum was not recovered from Sardinian soil and Fusarium solani was not recovered from New Zealand soil [47], suggesting a more ubiquitous distribution of F. oxysporum.

There are multiple caveats to this work that are inherent to any sequence-based study. Two ongoing issues are the question of correlation of sequence abundance with a biologically relevant measurement (e.g., biomass) [48] and the effect of sequencing error on estimates of diversity [49]. To address these issues, we have used multiple methods. To address the issue of sequence abundance, we used relative abundance as well as presence/absence data with respect to fungal community structure. Relative abundance data do not represent absolute values of sequence numbers, and in general, results using the relative abundance and presence/absence data are in agreement. For example, the ordination of genus relative abundance and OTU presence/absence data showed similar patterns. To address inflated levels of diversity, we utilized a reference-based taxonomic delimitation (i.e., alignment) and we used denoised flowgram data prior to sequence processing and OTU construction. Reference-based taxon delimitation should help reduce the effect of sequencing error on the dataset since this approach is based on alignment of metagenomic data to a high-quality reference, not alignment to other metagenomic sequences. Denoising pyrosequencing data has been shown to give more accurate estimates of OTU number (richness) than non-denoised data [33]. Overall, the consistent patterns resulting from these distinct analyses suggest that our findings are robust.

We have also developed methods to generate and sequence amplicons from a locus that is likely single copy in most fungal lineages. In theory, abundance data from a single-copy locus should be more representative of underlying taxon abundance than those from a locus that exhibits inter- and intraspecific copy number variation, such as rRNA genes [23]. More importantly, this locus has proven valuable in studying evolutionary relationships in the genus Fusarium [50] and will aid in further efforts to link the evolution and ecology of these fungi.

Through development of custom databases and targeting the LSU and RPB2 loci, we have characterized shifts of grassland fungal soil communities associated with different grassland plant communities and correlated soil edaphics. Fungal community richness was greater in polyculture plant communities, and fungal community composition was differentiated between monoculture and polyculture communities and plant host within monoculture plant communities. Although unable to separate the effects of plant richness from soil nutrient effects, this work shows that grassland rhizosphere fungal communities are tightly linked to plant community dynamics and as such need to be more explicitly integrated into current understanding of these grassland systems.