Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?

Lücking, Robert; Lawrey, James D.; Gillevet, Patrick M.; Sikaroodi, Masoumeh; Dal-Forno, Manuela; Berger, Simon A.

doi:10.1007/s00239-013-9603-y

Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?

Original Article
Published: 17 December 2013

Volume 78, pages 148–162, (2014)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Molecular Evolution Aims and scope Submit manuscript

Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?

Download PDF

Robert Lücking¹,
James D. Lawrey²,
Patrick M. Gillevet²,
Masoumeh Sikaroodi²,
Manuela Dal-Forno² &
…
Simon A. Berger³

648 Accesses
30 Citations
Explore all metrics

Abstract

The internal transcribed spacer region (ITS) of the nuclear rDNA cistron represents the barcoding locus for Fungi. Intragenomic variation of this multicopy gene can interfere with accurate phylogenetic reconstruction of biological entities. We investigated the amount and nature of this variation for the lichenized fungus Cora inversa in the Hygrophoraceae (Basidiomycota: Agaricales), analyzing base call and length variation in ITS1 454 pyrosequencing data of three samples of the target mycobiont, for a total of 16,665 reads obtained from three separate repeats of the same samples under different conditions. Using multiple fixed alignment methods (PaPaRa) and maximum likelihood phylogenetic analysis (RAxML), we assessed phylogenetic relationships of the obtained reads, together with Sanger ITS sequences from the same samples. Phylogenetic analysis showed that all ITS1 reads belonged to a single species, C. inversa. Pyrosequencing data showed 266 insertion sites in addition to the 325 sites expected from Sanger sequences, for a total of 15,654 insertions (0.94 insertions per read). An additional 3,279 substitutions relative to the Sanger sequences were detected in the dataset, out of 5,461,125 bases to be called. Up to 99.3 % of the observed indels in the dataset could be interpreted as 454 pyrosequencing errors, approximately 65 % corresponding to incorrectly recovered homopolymer segments, and 35 % to carry-forward-incomplete-extension errors. Comparison of automated clustering and alignment-based phylogenetic analysis demonstrated that clustering of these reads produced a 35-fold overestimation of biological diversity in the dataset at the 95 % similarity threshold level, whereas phylogenetic analysis using a maximum likelihood approach accurately recovered a single biological entity. We conclude that variation detected in 454 pyrosequencing data must be interpreted with great care and that a combination of a sufficiently large number of reads per taxon, a set of Sanger references for the same taxon, and at least two runs under different emulsion PCR and sequencing conditions, are necessary to reliably separate biological variation from 454 sequencing errors. Our study shows that clustering methods are highly sensitive to artifactual sequence variation and inadequate to properly recover biological diversity in a dataset, if sequencing errors are substantial and not removed prior to clustering analysis.

Unraveling intragenomic polymorphisms in the high-quality genome of Hypoxylaceae: a comprehensive study of the rDNA cistron

Article Open access 29 December 2023

Intragenomic polymorphisms in the ITS region of high-quality genomes of the Hypoxylaceae (Xylariales, Ascomycota)

Article Open access 08 February 2020

Using target enrichment sequencing to study the higher-level phylogeny of the largest lichen-forming fungi family: Parmeliaceae (Ascomycota)

Article Open access 14 December 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

The internal transcribed spacer (ITS) of the nuclear ribosomal DNA cistron is the barcoding locus for Fungi (Schoch et al. 2012), because of its versatility in resolving the species-level clades and the large amount of ITS sequence data already available (Pinto et al. 2004; Rossman 2007; Seifert 2008; Taylor and McCormick 2008; Eberhardt 2010; Begerow et al. 2010; Del-Prado et al. 2010; Hibbett et al. 2011; Schoch et al. 2012). Nuclear ribosomal DNA occurs in multiple copies in the genome and is easy to amplify from small quantities of template material. This is also disadvantageous, since multiple copies open up the possibility of paralogs and pseudogenes that do not reflect species phylogenies (Simon and Weiss 2008; Nilsson et al. 2008; Kovács et al. 2011a, b; Kiss 2012). It is assumed that the evolution of paralogs and pseudogenes in nuclear ribosomal DNA is prevented by concerted evolution (Hurst and Smith 1998; Liao 1999, 2008; Ganley and Kobayashi 2007). Yet, an increasing number of cases are being reported that suggest ITS paralogs in Fungi (Hughes and Petersen 2001; Ko and Jung 2002; Gomes et al. 2002; Smith et al. 2007; Simon and Weiss 2008; Nilsson et al. 2008; James et al. 2009; Lindner and Banik 2011; Kovács et al. 2011a, b; Li et al. 2013; Lindner et al. 2013). In plants, such cases were mostly interpreted as the result of hybridization (Hughes and Petersen 2001; Won and Renner 2005; Gulyas et al. 2005). On the other hand, it is unclear to what extent intragenomic variation detected through cloning, restriction fragment length polymorphism, and next-generation sequencing (NGS) (Gomes et al. 2002; Smith et al. 2007; Lindner and Banik 2011; Lindner et al. 2013) is real or artifactual. Critical screening of ITS Sanger sequences for selected taxa reveals that sequencing errors are frequent (Lücking et al. 2012; Moncada et al. 2013). On the other hand, approaches using specific primers for ITS pseudogenes (Li et al. 2013) leave little doubt that such pseudogenes are a reality at least in some fungal lineages.

Next-generation sequencing offers a powerful approach to the analysis of potential intragenomic variation on multicopy genes. It produces a large amount of sequence data in short time at low cost and allows for discrimination of biological entities within a single source without cloning (Margulies et al. 2005; Ahmadian et al. 2006; Rothberg and Leamon 2008; Nilsson et al. 2011). Apart from genome sequencing, these approaches are useful for analyzing microbial and fungal communities at the environmental level (O’Brien et al. 2005; Sogin et al. 2006; Geml et al. 2008; Taylor et al. 2008; Buée et al. 2009; Amend et al. 2010; Lumini et al. 2010; Wallander et al. 2010; Hibbett et al. 2011; Unterseher et al. 2011; Dai et al. 2012; McGuire et al. 2012). On the downside, NGS presents a number of problems, the most important being the huge amount of data generated, which requires approaches to data analysis that rely mostly on automated clustering (Li and Godzik 2006; Schloss et al. 2009; Edgar 2010, 2013; Caporaso et al. 2010a; Huang et al. 2010; Huse et al. 2010; Kumar et al. 2011; Nilsson et al. 2011). Since clusters and consensus or seed sequences are based on overall similarity derived from pairwise alignments, testing the quality and consistency of individual reads becomes a challenge, because potential errors are masked. One of the most problematic issues is the overestimation and misinterpretation of biological units present in a sample, be it taxa or gene copies (Quince et al. 2009; Engelbrektson et al. 2010; Kunin et al. 2010; Porter and Golding 2011; Powell et al. 2011; Unterseher et al. 2011; Zhou et al. 2011). Improved filtering algorithms have reduced the number of errors, but they cannot guarantee that sequence reads are error-free, and a large amount of reads already generated and deposited in public databases such as the sequence read archive (SRA) have not undergone rigorous error testing.

Various papers have reported and analyzed potential errors generated by 454 pyrosequencing, including duplication, chimaeras, substitution errors caused by over- and under-calls, and length variation of individual reads due to indels not present in corresponding Sanger sequences (Margulies et al. 2005; Huse et al. 2007; Quince et al. 2008; Hoff 2009; Gomez-Alvarez et al. 2009; Kunin et al. 2010; Niu et al. 2010; Tedersoo et al. 2010; Balzer et al. 2011). To filter potential sequencing errors or erroneous reads, flow values from the 454 pyrosequencing process are transformed to quality scores and used to detect low-quality reads prior to data analysis or to weight reads in clustering approaches (Huse et al. 2010). Other denoising methods use rank-abundance distribution of reads or a likelihood approach (Reeder and Knight 2010; Quince et al. 2011). Studies using duplicated 454 pyrosequencing runs from the same samples found no significant differences in the allocation of sequences to taxonomic units, suggesting that at least part of the detected variation is biological (Kauserud et al. 2012; Kumar et al. 2011). This was also supported by cloning techniques (Tedersoo et al. 2010; Kauserud et al. 2012), suggesting that 454 pyrosequencing errors are low relative to biological variation. However, the assumption that length variation in non-singleton clusters is biological, while in singletons it is artifactual (Kunin et al. 2010; Tedersoo et al. 2010; Unterseher et al. 2011), is ill-defined, since clustering methods are unable to distinguish between biological and artifactual sources of length variation, and hence cluster sizes cannot be used to categorize such variation. If the frequency of sequence reads representing separate taxa approximately reflects the abundance of the taxa themselves, with variation depending on PCR settings, it would in fact be natural to assume that a large number of biological taxa in a NGS dataset are represented by singletons only, a notion pointed out by Unterseher et al. (2011).

A major problem of 454 pyrosequencing is the accurate detection of homopolymer segments. Since these are sequenced in a single step, the only way to determine the number of bases per segment is by measuring the chemiluminescent signal strength; the stronger the signal, the more bases were added in a given step. Signal strength is thereby calibrated against an initial key of four bases. In order to produce a signal that correctly represents the homopolymer length at a given position, the fragments present in a well must be completely extended, which depends on the number of DNA templates in the well, homopolymer length, and the amount of fluorescent base molecules flooded during each cycle. In reality, the extension follows a Poisson distribution for a given well; most fragments will be fully extended, whereas a small portion will only be partially extended. This leads to a suboptimal light signal for the well which will then be interpreted as a homopolymer of shorter length (Margulies et al. 2005; Huse et al. 2007; Gomez-Alvarez et al. 2009; Kunin et al. 2010; Niu et al. 2010; Tedersoo et al. 2010; Balzer et al. 2011). In addition, the DNA templates in the well that are not fully complemented become desynchronized and will be completed during the next cycle with the same flow base. Depending on calibration, this additional signal may disappear as background noise or the sequences are filtered out, since the additional signal is below the calibrated threshold. However, if the proportion of desynchronized templates is sufficiently large, the well can produce an additional signal interpreted as an extra base call. Since flow cycles of the same base are separated by other base flow cycles, this extra base call appears several bases after is corresponding homopolymer and hence is not detectable as an error. This is known as carry-forward-incomplete-extension (CaFIE) error; it is rarely mentioned or accounted for correctly in 454 studies and usually not considered a significant source of error, and sometimes even assumed to be automatically corrected by the analytical software filtering the reads (454 Life Sciences Corporation 2009; Quince et al. 2009, 2011; Balzer et al. 2011; Gilles et al. 2011; Vandenbroucke et al. 2011; Keegan et al. 2012). Two recent studies proposed automated pipelines including corrections for CaFIE errors (RC454: Henn et al. 2012; ICC: Deng et al. 2013); however, upon testing, these either did not work properly (RC454) or did not provide a full set of corrected reads as results (ICC). Also, both methods rely on quality scores obtained from the sequencing process to perform corrections, although CaFIE errors technically cannot be detected using quality scores.

As part of a 454 study using the ITS1 region of the fungal ITS barcoding locus to detect fungal inhabitants and contamination in thalli of the lichenized basidiomycete genus Dictyonema and its allies (Lawrey et al. 2009; Dal-Forno et al. 2013), we obtained a large set of target mycobiont reads from three samples belonging to a single species, Cora inversa, but with a large amount of variation of initially undetermined origin. Clustering techniques suggested this variation to represent a large number of biological entities, whereas multiple-alignment-based phylogenetic methods using a maximum likelihood approach clustered them into a single species. To resolve this conflict, we analyzed the same samples repeatedly under different conditions, to discriminate between biological and artifactual variation and to quantify the proportion of each within the studied samples.

Materials and Methods

Both Sanger and 454 sequence data were obtained from three samples of C. inversa from Colombia (Table 1), as part of a larger set of samples run together in two sequencing runs, MTPS25 and MTPS57. The samples were targeted using fungal-specific primers for the entire ITS (Sanger) and the ITS1 region (454), the latter producing reads of a target length of 325 bases length for this taxon. The ITS was selected since it is the fungal barcoding locus (Hebert et al. 2003; Schoch et al. 2012) and since we used it successfully to address genus and species delimitation in the Dictyonema clade (Lawrey et al. 2009; Dal-Forno et al. 2013).

Table 1 Genbank and SRA and BioProject accession numbers for sequences used in this study. The SRA accession number for the entire dataset is SRP033721

Full size table

Genomic DNA was extracted from lichenized thalli using the Bio 101 Fast DNA Spin Kit for tissue (Qbiogen, Illkirch, France). About 10 ng of extracted DNA were subjected to a standard PCR in a 25 μL reaction volume using either Taq Gold polymerase (Applied Biosystems, Foster City, California, USA) or Bio-X-Act Long Mix (Bioline USA, Taunton, Massachusetts, USA). Sequence data were obtained from the ITS using the universal ITS primers ITS5 (forward) and ITS4 (reverse) for Sanger sequencing (ITS1, 5.8S, ITS2), and the primers ITS1F (forward) and ITS2 (reverse) (ITS1 region) for 454 pyrosequencing (http://www.biology.duke.edu/fungi/mycolab/primers.htm). After visualizing the PCR products with ethidium bromide on a 1 % agarose gel, the products were purified with magnetic beads (Agencourt Biosciences, Beverly, MA, USA) and used in standard sequencing reactions for Sanger sequencing with BigDye Terminator Ready Reaction Mix v3.1 (Applied Biosystems). The sequencing reactions were then purified using Sephedex G-50 (Sigma-Aldrich, St. Louis, MO, USA), dried in a speedvac, denatured in HiDi Formamide (Applied Biosystems) and run on an ABI3130-xl capillary sequencer (Applied Biosystems). The data collected were analyzed using ABI Sequence Analysis software and sequences were assembled with Sequencher v4.7 software (GeneCodes, Ann Arbor, MI, USA). Sanger ITS sequences were aligned with sequences of other species of Cora obtained from GenBank (same dataset as in Dal-Forno et al. 2013).

For 454 pyrosequencing, the samples were initially amplified with ITS primers, ITS1F (forward) and ITS2 (reverse) primers attached to a FAM label, allowing us to fingerprint the products to confirm the size and distribution of the community (Sikaroodi and Gillevet 2012). The forward and reverse primers had different custom tags and an A and B adapter, respectively (titanium adapters from the 454 pyrosequencing protocol), making it possible to pool different samples into one emulsion PCR. After confirming the PCR products on a 1 % agarose gel with ethidium bromide and fingerprinting them, the products were pooled and subjected to an emulsion PCR using the 454 emPCR_LibL kit, with varying number of cycles (see below). Pyrosequencing was done on the recovered beads from the emulsion PCR using 454 sequencing kits. The sequence results were separated into bins for different tags using a custom made perl script (Sikaroodi and Gillevet 2012).

Three different combinations of settings and conditions were used for each repeat emulsion PCR and 454 pyrosequencing run, all based on the same community PCR and hence on the same biological templates for the three samples of C. inversa: (1) initial emulsion PCR plus initial 454 pyrosequencing settings (repeat 1 in sequencing run MTPS25); (2) initial emulsion PCR plus modified 454 pyrosequencing settings (repeat 2A in sequencing run MTPS57); (3) de-novo emulsion PCR plus the same modified 454 pyrosequencing settings (repeat 2B in the same sequencing run MTPS57). The modified settings mainly concerned optimization between repeats 1 and 2A/B. Since tests showed that the suggested standard 50 cycles for emulsion PCR resulted in less reads passing the quality filters and hence less yield and higher proportion of sequening errors, we modified the settings to 45 cycles of emulsion PCR for repeats 1 and 2A (initial emulsion PCR) versus 40 cycles for repeat 2B (de-novo emulsion PCR). As a result, repeats 1 and 2A produced approximately 5 % bead recovery, with all beads loaded on the sequencing plate, whereas repeat 2B resulted in approximately 10 % bead recovery, half of which were loaded on the sequencing plate. In addition, between sequencing runs 1 (MTPS25) and 2A/B (MTPS57), the valves of the GS Junior were replaced. Since yield and proportional read quality improved consistently between repeats 1 and 2A/B (from less than 40 % to over 70 % yield), the overall quality of the sequencing results was categorized as moderate (repeat 1) and good (2A/B). As a consequence, substantial differences in the porportional frequency of read variation patterns (e.g. the same indel at the same position) detected between repeats 1 and 2A would have to be attributed to systematic sequencing errors, whereas differences between repeats 2A and 2B would be interpreted as caused by variation in the emulsion PCR settings. In contrast, patterns of real biological variation would result in comparable proportional frequencies across repeats or (slightly) increased frequency in repeats 2A/B, since it can be assumed that repeats 2A/B would yield a higher proportion of accurate reads compared to repeat 1.

Since, the 454 pyrosequencing amplifies and sequences all fungi present in the samples, the resulting reads were subjected to a filtering process to select only those ITS1 reads corresponding to C. inversa. The reads were "pre-digested:" by merging all reads from the three samples into a single fasta-formatted file. This dataset was analyzed using a combined USearch-Perl-Windows script in batch format (Online Appendix A). The script uses USearch (Edgar 2010, 2013) and the Perl script fasta-to-tab-delimited format (and viceversa) provided by the FAS Center for Systems Biology of Harvard University (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html), as well as the Windows findstr command. It arranges all reads according to length and then clusters the reads at a prespecified ID level, producing a combined fasta-formatted file containing the reference read (the longest read) for each cluster. We used 10.0, 0.1, 1.0, and 0.1 as internal and external gap opening and gap extension penalties (Edgar 2010), to minimize the formation of separate clusters based on read length rather than base sequence patterns. The ID level was set to 0.90 to minimize splitting of closely related taxa into separate clusters, while at the same time making the clusters consistent enough to allow for de-novo alignment of the contained reads. The script generates a modified fasta file of all reads, sorted by their cluster assignments from the previous step, which is aligned using the quick alignment algorithm provided by USearch StarAlign (Edgar 2010) and then converted to a tab-delimited text file, and the contained clusters are parsed out into separate fasta files. In addition to parsing the entire dataset into separate clusters at ID = 0.90, we also analyzed cluster formation of the dataset at sequential ID levels (0.50, 0.60, 0.70, 0.80, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99) to obtain the number of potential OTUs obtained at each level.

To provide taxonomic identifications of the reads and extract those representing C. inversa, each of the clusters obtained at the ID = 0.90 level was batch-blasted (i.e. with blast results for each individual read) against a curated database combined from IAB and UNITE (Kõljalg et al. 2005, 2013; Nilsson et al. 2005, 2006, 2008, 2009; Abe et al. 2011) plus a Sanger sequence database obtained from a large number of Dictyonema s.lat. samples including C. inversa. The clusters were also batch-blasted against GenBank sequences using a threefold approach: (1) including, (2) excluding uncultured/environmental sample sequences, and (3) if the best hit from (1) was an uncultured/environmental sequence that sequence was blasted again excluding uncultured/environmental sample sequences. Thus, we obtained four separate blast results for each read and its taxonomic identity was determined based on consistency of these results. The reads were then separated into (A) those blasting with C. inversa and other species of Dictyonema, (B) those blasting with closely related Basidiomycota, and (C) those blasting with other Basidiomycota and Ascomycota or lower fungi. While the reads of group (C) were not further considered, those of group (B) were subjected to a more specific local blast including only Dictyonema and closely related Basidiomycota (Agaricales) as reference sequences, to detect further reads representing Dictyonema which were then added to group (A).

All reads of group (A) were then aligned to a Sanger ITS reference alignment representing all sequenced species of Dictyonema s.lat. (including Cora; Dal-Forno et al. 2013) using PaPaRa 2.0 (Berger and Stamatakis 2011; Berger et al. 2011). The alignment was analyzed phylogenetically with a maximum likelihood approach using the RAxML-HPC BlackBox 7.3.2 on the Cipres Gateway server (Stamatakis 2006; Stamatakis et al. 2005, 2008; Miller et al. 2010), employing a GTR-Gamma model, with parametric bootstrapping using an automatically determined number of replicates as implemented by the RAxML-HPC BlackBox by means of a saturation assessment. A total of 16,665 reads was found to cluster with the Sanger reference sequences of C. inversa and these were retained for further study.

The retained 454 ITS1 reads of C. inversa were realigned with a Sanger ITS reference alignment of the same species to determine indel sites. MAFFT (Katoh et al. 2002, 2009) was unable to accurately align the length variable ITS1 reads, failing to recognize all homologous base and indel positions. Instead, accurate results were achieved with PaPaRa 2.0 (Berger and Stamatakis 2011), which aligns length variable 454 reads to a known Sanger reference alignment, introducing gaps into the reference alignment where the 454 reads contain additional base calls. For this purpose, PaPaRa uses a probabilistic gap model based on the fixed alignment and the best-scoring maximum-likelihood tree derived via RAxML (Stamatakis 2006) from that alignment. This method produces accurate alignments superior to those generated with MAFFT, and the position of insertions is easily discernible. In addition, PaPaRa allows for the generation of insertion-free alignments of the 454 reads by optionally deleting all insertion sites not present in the Sanger reference sequences. PaPaRa was also used to compute read lengths with and without insertions for each 454 read, and to calculate the relative proportion of insertions across all reads.

The "Positional Nucleotide Numerical Summary" function in BioEdit 7.0.9 (Hall 1999) was used to compute the base composition at each alignment position to determine the nature of each of the indels across all the reads overall, separated by sample and by run. Indels were classified into two major site categories, homopolymer extensions and heteropolymer insertions. The latter were further analyzed for site-specific variation among reads.

In order to test the effect of indels in the 454 ITS reads on the estimation of taxonomic diversity by means of automated clustering methods, we used another combined USearch-Perl-Windows script (Online Appendix B), which automatically computes the number of taxonomic units based on a predefined threshold similarity level by parsing out the seed sequences for each assumed taxonomic unit into a fasta-formatted file. The script was applied to the complete set of reads corresponding to C. inversa and the following threshold similarity levels were applied: 0.50, 0.60, 0.70, 0.80, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00. We also randomly selected different numbers of reads (16, 32, 64, 125, 250, 500, 1000, 2000, 4000, 8000, 16000) and ran the script for each set to compute the relationship between read frequency and estimate of biological diversity.

Results

The total yield of the two sequencing runs MTPS25 and MTPS57 was 41,465 and 84,368 reads, respectively, for the samples corresponding the Cora and Dictyonema. After separation of reads corresponding to other species of (endolichenic) fungi or contaminants, and separating the remaining reads according to genera and species, the number of ITS1 reads for the target mycobiont of the three samples of C. inversa and three repeats (1, 2A, 2B) was 16,665, with a range of 1,269–2,465 reads per sample and repeat (Table 2). The largest number of reads was generated for sample DIC154 and repeat 2B; for all samples, repeats 2A/B had a substantially higher yield than repeat 1. The target length for the ITS1 reads according to Sanger sequences from the same samples was 325 bases. The 454 ITS1 reads varied in length from 100 bases (predetermined lower length cut-off) to 349 bases (Fig. 1), with the 22 longest reads found in repeat 1 of samples DIC150 and DIC154. The majority of the reads (88 %) was complete to near-complete, with 300–349 bases length; only 5.7 % of the reads had less than 200 bases (Fig. 1).

Table 2 Number of 454 ITS1 reads obtained for each sample of C. inversa in each of the runs

Full size table

The number of insertion sites per read varied from 0 to 25, with most of the larger values found in reads of repeat 1 of samples DIC150 and DIC154; however, the largest value was found in a read of repeat 1 of sample DIC149 (Table 3). On average, reads from repeats 2A/B had a significantly lower number of insertions (0.20–0.44 per sample) than those from repeat 1 (2.30–3.33 per sample; Kruskal–Wallis ANOVA, p < 0.001). Relative to read length, this difference was even more pronounced, with 0.07–0.15 % (range 0–5.37 %) in reads of repeats 2A/B, and 0.84–1.23 % (range 0–14.4 %) in reads of repeat 1 (Table 4). While some reads had up to 14.4 % of their bases corresponding to insertions, the large majority of reads even in repeat 1 had less than 1 % of insertion sites, corresponding to about 2–3 bases per read.

Table 3 Number of insertions (mean and range) per read obtained for each sample of C. inversa in each of the runs

Full size table

Table 4 Relative proportion of insertions (mean and range) per read (relative to total read length including insertions) obtained for each sample of C. inversa in each of the runs

Full size table

The total length of the entire PaPaRa alignment of all 16,665 454 ITS1 reads plus nine corresponding Sanger reference sequences was 591 bases, including 325 sites corresponding to the Sanger reference sequences and 266 insertion sites obtained from the 454 ITS1 reads (Online Appendix C). Thus, 45 % of the PaPaRa alignment represented insertion sites. Of the 591 sites, positions 1–15 corresponded to the nuSSU primer and positions 505–591 to the highly conserved 5.8S region including the 5.8S primer. The insertion sites could be classified into three categories: (1) homopolymer extensions; (2) heteropolymer insertions; and (3) mixed sites which, depending on the base call in individual reads, can be interpreted as either homopolymer extensions or heteromer insertions (Online Appendix D). A total of 115 positions, five corresponding to the ITS5 forward primer and ten to the 5.8S region, were classified as homopolymer extensions; 47 sites represented heteropolymer insertions (one in the primer and seven in the 5.8S region); 52 sites were mixed sites which could not be unambiguously classified as either homopolymer extensions or heteropolymer insertions (one in the ITS5 forward primer and seven in the 5.8S region); and 53 positions had insertions in no more than two reads and were classified as random (one in the ITS5 forward primer and 15 in the 5.8S region). Since homopolymer extensions can be explained by incorrectly calibrated chemiluminescent sensor sensitivity (base overcall or undercall), we focused further analysis on heteropolymer insertions. Of the 47 heteropolymer insertion sites, 31 were present in ten or more reads, providing a sound basis for statistical analysis (Table 5). A further 13 were present in three or more (but less than ten) reads. All of these 44 insertion sites had either a single, uniform base call in all reads across all three samples and repeats (31 sites) or a strongly dominating base call with a single exception in no more than one or two reads (13 sites; Online Appendix D). This implies that the nature of the insertion was practically the same in all reads across all samples and repeats for a given site, independent of the emulsion PCR and sequencing parameters, as well as independent of the well position corresponding to individual reads. This pattern was found to be highly significantly different from random (Chi Square test; p < 0.001).

Table 5 Summary of 31 heteropolymer insertion sites present in ten or more reads, sorted in descending order by number of reads in which insertion was present

Full size table

Heteropolymer insertions were most frequently found in repeat 1 (all three samples), with sample DIC150 having the highest proportion of reads with insertions present (Table 5). In most cases, the proportion of reads with insertions was significantly lower in repeats 2A/B, although the same insertions were found in these reads across all samples. In some cases, insertions were about equally distributed across reads from all samples and repeats (positions 174, 180, 259, 389, 424, 455), and in a few others, they were more frequent in reads from repeats 2A/B (positions 264, 297, 515, 547). Overall, we found 15,654 insertions in the 16,665 reads (0.94 per read on average), 8,066 of which corresponded to homopolymer extensions, and 3,437 to heteropolymer insertions; 4,053 corresponded to mixed sites (both homopolymer extensions and heteropolymer insertions) and 106 to random sites. Of the 3,437 heteropolymer insertions, 3,368 (98 %) corresponded to the 31 insertion sites present in ten ore more reads. Without exception, all of these occurred 1–3 bases after a homopolymer segment of the same base call, corresponding to partial deletions in the homopolymer segment and hence representing CaFIE errors. The remaining heteropolymer insertions, as well as those in mixed sites, showed the same patterns, even in those cases where insertions were equal in frequency between repeats or more frequent in repeats 2A/B (total of 263 out of 3,368). Thus, of the total of 15,654 insertions, up to 15,548 (99.3 %) could be attributed to either homopolymer extension or CaFIE errors.

In addition to insertions, we also detected 3,279 substitutions in 187 sites aligned to those present in the Sanger references sequences, corresponding to 0.06 % of all base calls. The number of reads with substitutions ranged from 1 to 806, with 184 sites having 31 or less reads with substitutions (less than 0.2 % of all reads) and three sites having a larger number of reads with substitutions (124, 129, 806). A total of 118 substitution sites had 2,839 substitutions. These were highly consistent across all reads for each position and apparently not artifacts, since 2,677 substitutions (94 %) exhibited exactly the same base call independent of read and run. Inspection of individual reads showed that they might in part represent masked CaFIE errors, incorrectly aligned with non-homologous sites in the Sanger reference sequences due to adjacent deletions; however, we found no significant differences in relative frequency of these substitutions among repeats, suggesting that they might also represent real biological substitutions. For example, the site with the largest number of reads with substitutions, 806 out of 16,665 (4.8 %) at position 125, invariably had G in the Sanger reference sequences and A in 805 out of 806 reads with substititions: 224 out of 4,278 reads in repeat 1 (5.2 %), 301 out of 6,091 reads in repeat 2A (4.9 %), and 280 out of 6,296 reads in repeat 2B (4.4 %). The next two positions had much fewer reads with substitutions (129 and 124, respectively, or 0.74–0.77 %), but with similarly consistent substitution patterns, replacing G with A (position 193) and A with T (position 256).

The number of OTUs present in the dataset of 16,665 reads was estimated between 3 and 5,945 depending on the threshold level, with 37 at 95 % (Table 6). After correcting the reads using PaPaRa (insertions automatically removed), the corresponding estimates were between 1 and 3,425, with 4 at 95 % (Table 6). Since, phylogenetic analysis demonstrated that all reads belonged to C. inversa, the application of the cluster analysis on the original reads resulted in a 11-fold, 35-fold, and 137-fold overestimation of biological diversity using the commonly applied threshold ID levels of 90, 95, and 97 %. For the corrected reads (insertions removed but substitutions retained), biological diversity was estimated correctly up to 80 % ID level, whereas at the 90, 95, and 97 % ID level, overestimation by the factor 2, 4, and 8 occurred (due to the presence of substitutions). This represented a 5–16-fold improvement compared to uncorrected reads.

Table 6 Number of assumed biological units derived from the complete set of 454 ITS1 reads of C. inversa resulting from the three samples and runs, using the original reads and reads after removal of insertions with PaPaRa 2.0 (Berger and Stamatakis 2011)

Full size table

There was a highly significant linear correlation between read frequency and estimate of biological diversity at all threshold levels, with R = 0.86 at 90 %, R = 0.94 at 95 %, and R = 0.96 at 97–100 % ID threshold level (Table 7). For up to 16 reads, threshold levels of up to 97 % predicted the biological diversity correctly, but this value dropped substantially with higher read frequency (Fig. 2). Using 95 % threshold, 32–125 reads overestimated the biological diversity twofold, and at 90 %, more than 1,000 reads overestimated the biological diversity at least threefold. At 1,000 reads, only a threshold level of 80 % or lower predicted the biological diversity correctly.

Table 7 Spearman rank correlation between read frequency and estimate of biological diversity (see Fig. 2)

Full size table

Discussion

Our findings of substantial amount of variation in 454 pyrosequencing data of the lichenized fungus C. inversa at first glance appears in line with an increasing number of NGS studies report intragenomic variation in the ITS and other portions of the nuclear rDNA cistron (Lindner et al. 2013). However, in contrast to other studies, we found no evidence for underlying biological causes for most of this variation. Instead, while potentially real biological substitutions accounted for only 0.06 % of all corresponding base calls, we attribute more than 99 % of the observed insertions to either homopolymer extension or CaFIE errors (Margulies et al. 2005; Rothberg and Leamon 2008; 454 Life Sciences Corporation 2009; Balzer et al. 2011; Gilles et al. 2011; Vandenbroucke et al. 2011).

Technically, CaFIE errors cannot be distinguished from biological variation by inspecting individual reads, since both represent systematic positional variation, with the same insertion consistently appearing at the same position in different reads. CaFIE errors thus mimic indel variation found in Sanger sequences between closely related species (Dal-Forno et al. 2013). Automated denoising methods (Quince et al. 2009, 2011; Caporaso et al. 2010a; Reeder and Knight 2010) cannot adequately detect these errors, since they focus on signal flow variation of homopolymer segments, but do not take into account associated heteropolymer insertions. Another problem is that CaFIE errors might be masked as substitution errors if reads that contain both insertions and deletions are not properly aligned; this might explain why some studies on sequence errors report very low proportions of indels versus substitutions (Keegan et al. 2012). The difficulty of assessing CaFIE errors might be the reason why previous authors considered them of little significance compared to other potential error sources (Margulies et al. 2005; Rothberg and Leamon 2008; 454 Life Sciences Corporation 2009; Balzer et al. 2011; Gilles et al. 2011; Vandenbroucke et al. 2011). While, CaFIE errors can be theoretically predicted by comparing heteropolymer sequence patterns immediately following homopolymer segments, the only way of actually detecting such errors is the comparison of separate sequencing runs of the same community PCR samples, which permits assessing frequency patterns: if the variation is largely biological, one would expect comparable proportions of positional variation in separate runs, as was here observed for substitutions and a small proportion of insertions. In addition, it is necessary to analyze a larger number of reads from a given sample and species and compare reads with Sanger sequences from the same samples, to accurately recognize erroneous insertion sites.

The previous studies using duplicated 454 pyrosequencing runs from the same samples found no significant differences in the allocation of sequences to taxonomic units, suggesting that at least part of the detected variation was biological and not artifactual (Kauserud et al. 2012; Kumar et al. 2011). However, these authors attempted to replicate the same sequencing conditions and used a clustering approach to detect biological units, rather than analyzing and comparing individual reads by means of multiple fixed alignments. Sequencing of environmental samples (O’Brien et al. 2005; Sogin et al. 2006; Geml et al. 2008; Taylor et al. 2008; Buée et al. 2009; Amend et al. 2010; Lumini et al. 2010; Wallander et al. 2010; Hibbett et al. 2011; Dai et al. 2012; McGuire et al. 2012) usually results in a low number of redundant read copies for each taxon present in the sample, which makes the detection of CaFIE errors difficult. In addition, clustering techniques will interpret CaFIE errors as real variation, especially if they use the longest reads as seeds, such as USearch (Edgar 2010, 2013), since the longest reads will likely have the largest number of erroneous insertions.

In the present case, automated clustering of reads representing a single species led to a 35-fold overestimation of biological diversity at the commonly applied threshold ID level of 0.95. This finding is alarming, since automated clustering will invariably interpret all variation present in a sample as biological. The problem is especially acute for studies of environmental samples, where species are not known and cannot be referenced against the existing Sanger sequences. Unfortunately, clustering techniques are indispensable to pre-digest large amounts of data (Li and Godzik 2006; Schloss et al. 2009; Caporaso et al. 2010a; Edgar 2010, 2013; Huang et al. 2010; Kumar et al. 2011), especially if the target gene is highly variable and unalignable across the range of analyzed organisms, such as ITS. The problem of overestimation of biological diversity using clustering methods has been discussed in several studies (Quince et al. 2009; Engelbrektson et al. 2010; Kunin et al. 2010; Powell et al. 2011; Unterseher et al. 2011; Zhou et al. 2011), but the error is usually believed not to exceed a factor of ten, and is often considered to be much lower, contrasting with the results presented here. The overestimation of biological diversity through clustering increases with read frequency per taxon; hence, species represented by few reads in a sample are likely to be correctly recognized, whereas species with a large number of reads tend to be interpreted as more than one taxon. Our analysis suggests that up to 16 reads per taxon will predict biological diversity correctly up to a relatively high threshold level of 97 %, but accuracy rapidly declines in linear fashion with increasing read frequency (cluster size) per taxon. Hence, this problem will have low impact in datasets with a rather balanced distribution of (few) reads per taxon, but potentially high impact in datasets which contain few to many dominant taxa with large numbers of reads each. Without removal of CaFIE errors, the only way to accurately analyze such data is to employ phylogenetic approaches instead of clustering (Porter and Golding 2011; Powell et al. 2011), including alignment tools such as PyNAST and PaPaRa (Caporaso et al. 2010b; Berger and Stamatakis 2011). Our results also imply that, contrary to what has been suggested in other studies (Kunin et al. 2010; Tedersoo et al. 2010; Unterseher et al. 2011), singleton reads should not be excluded as potential artifacts but instead contribute substantially to accurate estimates of taxonomic diversity, since sequencing errors in singletons have no adverse effect in diversity estimates based on clustering methods. Thus, while the previous studies suggested to exclude singletons and focus on clusters to estimate diversity, we conclude that singletons should by default considered to represent real biological diversity, whereas clusters should be carefully examined for effects of sequencing errors by analyzing them with alignment-based phylogenetic methods.

The huge amount of NGS data to be dealt with poses a challenge to phylogenetic approaches, but this can be surmounted by using clustering and blasting techniques to pre-digest the data and then applying alignment-based phylogenetic analysis on portions of the data to determine biological units. Phylogenetic analysis using maximum likelihood can now be applied to alignments containing thousands of sequences by using RAxML on public multicore servers (Stamatakis 2006; Stamatakis et al. 2005, 2008; Miller et al. 2010). The alternative approach through removal of sequencing errors using PaPaRa prior to clustering considerably improves the results of clustering compared to original reads; yet, clustering will always only provide a proxy for real diversity, since a predefined threshold ID level has to be applied to make data comparable, whereas a phylogenetic approach is independent of preset ID levels. Phylogenetic methods that provide simultaneous search options for the best alignment-tree combination, such as POY, BaliPhy, and SATé (Suchard and Redelings 2006; Liu et al. 2009, 2012; Varón et al. 2010), are potentially valuable methods to analyze widely divergent sequences in a sample without the necessity of a multiple fixed alignment (Wang et al. 2011), but these methods are computationally expensive and currently cannot handle very large datasets of tens to hundreds of thousands of reads as the result of typical 454 pyrosequencing runs.

The interpretation of most of the variation found here as 454 pyrosequencing errors, with only a small proportion of substitutions apparently reflecting biological variation, is in accordance with studies that reported intragenomic ITS as comparatively uniform (Morzycka-Wroblewska et al. 1985; Borsuk et al. 1988; Ganley and Scott 2002; Ganley and Kobayashi 2007). Such uniformity is assumed to be due either to concerted evolution (Liao 1999, 2008) or to a birth-and-death mechanism with purifying selection (Morzycka-Wroblewska et al. 1985; Borsuk et al. 1988; Ganley and Scott 2002; Rooney and Ward 2005; Ganley and Kobayashi 2011). Other studies suggest intragenomic variation in fungal ITS and the nuclear rDNA (Hughes and Petersen 2001; Ko and Jung 2002; Gomes et al. 2002; Pawlowska 2005, 2007, 2010; Rooney and Ward 2005; Smith et al. 2007; Simon and Weiss 2008; Nilsson et al. 2008; James et al. 2009; Lindner and Banik 2011; Kovács et al. 2011a, b; Li et al. 2013; Lindner et al. 2013). In a large-scale analysis of Sanger ITS sequences of the lichenized fungal family Physciaceae, Lücking et al. (2012) found evidence for rDNA (ITS) pseudogenes in a single clade, whereas all other clades were consistent with a single, functional rDNA cistron. Therefore, the lack of evidence for intragenomic variation in some studies and its presence in others suggests that pseudogene evolution is specific to individual lineages, even within the same, larger clades. Thus, the existence of pseudogenes will have to be analyzed for each clade separately, which poses a challenge for the use of ITS as fungal barcoding locus (Hebert et al. 2003; Schoch et al. 2012), since pseudogenes can form novel clades that mimic cryptic speciation. Our method would be a viable approach to resolve this problem for any given clade.

Quantitative analysis of 454 pyrosequencing data for single clades or taxa can also shed light on the nature of potential variation and how such variation might affect phylogenetic studies using ITS generated from Sanger sequencing as barcoding locus. In the present case, assuming that substitutions (not indels) detected in the dataset reflect real biological variation, their very low proportion (0.06 % of all base calls) will cause them to disappear as background signal in Sanger sequencing. This is supported by the Sanger sequences obtained from the same samples used in this study, which do not contain any variation. Sanger sequencing thus acts like a filter, which is of advantage when defining phylogeny-based taxa, but suggests genomic uniformity when in fact some level of intragenomic variation exists. The low level of variation found here would suggest that a small portion of the rDNA (ITS) copies in the genome of C. inversa "escaped" concerted evolution. However, under a model of concerted evolution or birth-and-death mechanism with purifying selection (Morzycka-Wroblewska et al. 1985; Borsuk et al. 1988; Liao 1999, 2008; Ganley and Scott 2002; Rooney and Ward 2005; Ganley and Kobayashi 2007; Ganley and Kobayashi 2011), one would expect that, at any given time, a small proportion of the gene copies to deviate from a uniform pattern. Hence, small amounts of variation do not contradict, but are consistent with, concerted evolution.

References

454 Life Sciences Corporation (2009) Genome sequencer FLX system software manual, version 2.3. 454 Life Sciences Corp., A Roche Company, Brandford, CT
Google Scholar
Abe T, Ikemura T, Sugahara J, Kanai A, Ohara Y, Uehara H, Kinouchi M, Kanaya S, Yamada Y, Muto A, Inokuchi I (2011) tRNADB-CE 2011: tRNA gene database curated manually by experts. Nucleic Acids Res 39:D210–D213
Article PubMed Central CAS PubMed Google Scholar
Ahmadian A, Ehn M, Hober S (2006) Pyrosequencing: history, biochemistry and future. Clin Chim Acta 363:83–94
Article CAS PubMed Google Scholar
Amend A, Samson R, Seifert K, Bruns T (2010) Deep sequencing reveals diverse and geographically structured assemblages of fungi in indoor dust. Proc Natl Acad Sci USA 107:13748–13753
Article PubMed Central CAS PubMed Google Scholar
Balzer S, Malde K, Jonassen I (2011) Systematic exploration of error sources in pyrosequencing flowgram data. Bioinformatics 27:304–309
Article CAS Google Scholar
Begerow D, Nilsson H, Unterseher M, Maier W (2010) Current state and perspectives of fungal DNA barcoding and rapid identification procedures. Appl Microbiol Biotechnol 87:99–108
Article CAS PubMed Google Scholar
Berger SA, Stamatakis A (2011) Aligning short reads to reference alignments and trees. Bioinformatics 27:2068–2075
Article CAS PubMed Google Scholar
Berger SA, Krompass D, Stamatakis A (2011) Performance, accuracy and web-server for evolutionary placement of short sequence reads under maximum-likelihood. Syst Biol 60:291–302
Article PubMed Central PubMed Google Scholar
Borsuk P, Gniadkowski M, Bartnik E, Stepien PP (1988) Unusual evolutionary conservation of 5S rRNA pseudogenes in Aspergillus nidulans: similarity of the DNA sequence associated with the pseudogenes with the mouse immunoglobulin switch region. J Mol Evol 28:125–130
Article CAS PubMed Google Scholar
Buée M, Reich M, Murat C, Nilsson RH, Uroz S, Martin F (2009) 454 Pyrosequencing analyzes of forest soils reveal an unexpectedly high fungal diversity. New Phytol 184:449–456
Article PubMed CAS Google Scholar
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R (2010a) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335–336
Article PubMed Central CAS PubMed Google Scholar
Caporaso JG, Bittinger K, Bushman FD, DeSantis TZ, Andersen GL, Knight R (2010b) PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266–267
Article PubMed Central CAS PubMed Google Scholar
Dai M, Hamel C, Arnaud M, He Y, Grant C, Lupwayi N, Janzen H, Malhi SS, Yang X, Zhou Z (2012) Arbuscular mycorrhizal fungi assemblages in Chernozem great groups revealed by massively parallel pyrosequencing. Can J Microbiol 58:81–92
Article CAS PubMed Google Scholar
Dal-Forno M, Lawrey JD, Sikaroodi M, Bhattarai S, Gillevet PM, Sulzbacher M, Lücking R (2013) Starting from scratch: evolution of the lichen thallus in the basidiolichen Dictyonema (Agaricales: Hygrophoraceae). Fungal Biol 117(9):584–598
Article CAS PubMed Google Scholar
Del-Prado R, Cubas P, Lumbsch HT, Divakar PK, Blanco O, de Paz GA, Molina MC, Crespo A (2010) Genetic distances within and among species in monophyletic lineages of Parmeliaceae (Ascomycota) as a tool for taxon delimitation. Mol Phylogenet Evol 56:125–133
Article CAS PubMed Google Scholar
Deng W, Maust BS, Westfall DH, Chen L, Zhao H, Larsen BB, Iyer S, Liu Y, Mullins JI (2013) Indel and carryforward correction (ICC): a new analysis approach for processing 454 pyrosequencing data. Bioinformatics 29(19):2402–2409. doi:10.1093/bioinformatics/btt434
Article PubMed Central CAS PubMed Google Scholar
Eberhardt U (2010) A constructive step towards selecting a DNA barcode for fungi. New Phytol 187:265–268
Article CAS PubMed Google Scholar
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461
Article CAS PubMed Google Scholar
Edgar RC (2013) UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods 10(10):996–998. doi:10.1038/nmeth.2604
Article CAS PubMed Google Scholar
Engelbrektson A, Kunin V, Wrighton KC, Zvenigorodsky N, Chen F, Ochman H, Hugenholtz P (2010) Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J 4:642–647
Article CAS PubMed Google Scholar
Ganley ARD, Kobayashi T (2007) Highly efficient concerted evolution in the ribosomal DNA repeats: total rDNA repeat variation revealed by whole-genome shotgun sequence data. Genome Res 17:184–191
Article PubMed Central CAS PubMed Google Scholar
Ganley ARD, Kobayashi T (2011) Monitoring the rate and dynamics of concerted evolution in the ribosomal DNA repeats of Saccharomyces cerevisiae using experimental evolution. Mol Biol Evol 28:2883–2891
Article CAS PubMed Google Scholar
Ganley AR, Scott B (2002) Concerted evolution in the ribosomal RNA genes of an Epichloë endophyte hybrid: comparison between tandemly arranged rDNA and dispersed 5S rrn genes. Fungal Genet Biol 35:39–51
Article CAS PubMed Google Scholar
Geml J, Laursen GA, Taylor DL (2008) Molecular diversity assessment of arctic and boreal Agaricus taxa. Mycologia 100:577–589
Article PubMed Google Scholar
Gilles A, Meglécz E, Pech N, Ferreira S, Malausa T, Martin JF (2011) Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 12:245
Article PubMed Central PubMed Google Scholar
Gomes EA, Kasuya MCM, de Barros EG, Borges AC, Araujo EF (2002) Polymorphism in the internal transcribed spacer (ITS) of the ribosomal DNA of 26 isolates of ectomycorrhizal fungi. Genet Mol Biol 25:477–483
Article CAS Google Scholar
Gomez-Alvarez V, Teal TK, Schmidt TM (2009) Systematic artifacts in metagenomes from complex microbial communities. ISME J 3:1314–1317
Article PubMed Google Scholar
Gulyas G, Sramko G, Molnar A, Rudnoy S, Illyes Z, Balazs T, Zoltan B (2005) Nuclear ribosomal DNA ITS paralogs as evidence of recent hybridization in genus Ophrys (Orchidaceae). Acta Biol Crac Ser Bot 47:61–67
Google Scholar
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98
CAS Google Scholar
Hebert PD, Ratnasingham S, deWaard JR (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc Biol Sci 270(Suppl 1):S96–S99
Article PubMed Central CAS PubMed Google Scholar
Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, Berlin AM, Malboeuf CM, Ryan EM, Gnerre S, Zody MC, Erlich RL, Green LM, Berical A, Wang Y, Casali M, Steeck H, Bloom AK, Dudek T, Tully D, Newman R, Axten KL, Gladden AD, Battis L, Kemper M, Zeng Q, Shea TP, Gujja S, Zedlack C, Gasser O, Brander C, Hess C, Gunthard HF, Brumme ZL, Brumme CJ, Bazner S, Rychert J, Tinsley JP, Mayer KH, Rosenberg E, Pereya F, Levin JZ, Young SK, Jessen H, Altfeld M, Birren BW, Walker BD, Allen TM (2012) Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog 8(3):e1002529
Article PubMed Central CAS PubMed Google Scholar
Hibbett DS, Ohman A, Glotzer D, Nuhn M, Kirk PM, Nilsson RH (2011) Progress in molecular and morphological taxon discovery in fungi and options for formal classification of environmental sequences. Fungal Biol Rev 25:38–47
Article Google Scholar
Hoff KJ (2009) The effect of sequencing errors on metagenomic gene prediction. BMC Genomics 10:520
Article PubMed Central PubMed CAS Google Scholar
Huang Y, Niu B, Gao Y, Fu L, Li W (2010) CD-HIT suite: a web server for clustering and comparing biological sequences. Bioinformatics 26:680
Article PubMed Central CAS PubMed Google Scholar
Hughes KW, Petersen RH (2001) Apparent recombination or gene conversion in the ribosomal ITS region of a Flammulina (Fungi, Agaricales) hybrid. Mol Biol Evol 18:94–96
Article CAS PubMed Google Scholar
Hurst LD, Smith NGC (1998) The evolution of concerted evolution. Proc Biol Sci 265:121–127
Article PubMed Central Google Scholar
Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8:R143
Article PubMed Central PubMed CAS Google Scholar
Huse SM, Welch DM, Morrison HG, Sogin ML (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12:1889–1898
Article PubMed Central CAS PubMed Google Scholar
James SA, O’Kelly MJ, Carter DM, Davey RP, van Oudenaarden A, Roberts IN (2009) Repetitive sequence variation and dynamics in the ribosomal DNA array of Saccharomyces cerevisiae as revealed by whole-genome resequencing. Genome Res 19:626–635
Article PubMed Central CAS PubMed Google Scholar
Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res 30:3059–3066
Article PubMed Central CAS PubMed Google Scholar
Katoh K, Asimenos G, Toh H (2009) Multiple alignment of DNA sequences with MAFFT. Meth Mol Biol 537:39–64
Article CAS Google Scholar
Kauserud H, Kumar S, Brysting A, Nordén J, Carlsen T (2012) High consistency between replicate 454 pyrosequencing analyses of ectomycorrhizal plant root samples. Mycorrhiza 22:309–315
Article PubMed Google Scholar
Keegan KP, Trimble WL, Wilkening J, Wilke A, Harrison T, D’Souza M, Meyer F (2012) A platform-independent method for detecting errors in metagenomic sequencing data: dRISEE. PLoS Comput Biol 8(6):e1002541
Article PubMed Central CAS PubMed Google Scholar
Kiss L (2012) Limits of nuclear ribosomal DNA internal transcribed spacer (ITS) sequences as species barcodes for Fungi. Proc Natl Acad Sci USA 109:E1811
Article PubMed Central CAS PubMed Google Scholar
Ko KS, Jung HS (2002) Three nonorthologous ITS1 types are present in a polypore fungus Trichaptum abietinum. Mol Phylogenet Evol 23:112–122
Article CAS PubMed Google Scholar
Kõljalg U, Larsson KH, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E, Pennanen T, Sen R, Taylor AF, Tedersoo L, Vrålstad T, Ursing BM (2005) UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol 166:1063–1068
Article PubMed CAS Google Scholar
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TD, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny B, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Oldmaa KP, Saag L, Saar R, Schüssler A, Scott JA, Senés C, Smith ME, Suija A, Taylor DL, Telleria MT, Weiss M, Larsson KH (2013) Towards a unified paradigm for sequence-based identification of fungi. Mol Ecol 22:5271–5277. doi:10.1111/mec.12481
Article PubMed CAS Google Scholar
Kovács GM, Balázs TK, Calonge FD, Martín MP (2011a) The diversity of Terfezia desert truffles: new species and a highly variable species complex with intrasporocarpic nrDNA ITS heterogeneity. Mycologia 103:841–853
Article PubMed Google Scholar
Kovács GM, Jankovics T, Kiss L (2011b) Variation in the nrDNA ITS sequences of some powdery mildew species: do routine molecular identification procedures hide valuable information? Eur J Plant Pathol 131:135–141
Article Google Scholar
Kumar S, Carlsen T, Mevik B, Enger P, Blaalid R, Shalchian-Tabrizi K, Kauserud H (2011) CLOTU: an online pipeline for processing and clustering of 454 amplicon reads into OTUs followed by taxonomic annotation. BMC Bioinformatics 12:82
Article Google Scholar
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123
Article CAS PubMed Google Scholar
Lawrey JD, Lücking R, Sipman HJM, Chaves JL, Redhead SA, Bungartz F, Sikaroodi M, Gillevet PM (2009) High concentration of basidiolichens in a single family of agaricoid mushrooms. Mycol Res 113:1154–1171
Article CAS PubMed Google Scholar
Li W, Godzik A (2006) Cd-Hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659
Article CAS PubMed Google Scholar
Li Y, Jiao L, Yao YJ (2013) Non-concerted ITS evolution in fungi, as revealed from the important medicinal fungus Ophiocordyceps sinensis. Mol Phylogen Evol 68(2):373–379
Article Google Scholar
Liao D (1999) Concerted evolution: molecular mechanism and biological implications. Am J Hum Genet 64:24–30
Article PubMed Central CAS PubMed Google Scholar
Liao D (2008) Concerted evolution. University of Florida, Gainesville, FL. doi:10.1002/9780470015902.a0005132
Lindner DL, Banik MT (2011) Intragenomic variation in the ITS rDNA region obscures phylogenetic relationships and inflates estimates of operational taxonomic units in genus Laetiporus. Mycologia 103:731–740
Article PubMed Google Scholar
Lindner DL, Carlsen T, Nilsson RH, Davey M, Schumacher T, Kauserud H (2013) Employing 454 amplicon pyrosequencing to reveal intragenomic divergence in the internal transcribed spacer (ITS) rDNA region in fungi. Ecol Evol 3:1751–1764
Article PubMed Central PubMed Google Scholar
Liu K, Raghavan S, Nelesen S, Linder CR, Warrnow T (2009) Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324:1561–1564
Article CAS PubMed Google Scholar
Liu K, Warnow TJ, Holder MT, Nelesen S, Yu J, Stamatakis A, Linder CR (2012) SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic Trees. Syst Biol 61:90–106
Article PubMed Google Scholar
Lücking R, Kalb K, Essene A (2012) The power of ITS: using megaphylogenies of barcoding genes to reveal inconsistencies in taxonomic identifications of genbank submissions. The 7th IAL symposium “Lichens: from genome to ecosystems in a changing world”, January 2012, Bangkok. Book of abstracts: 3B-1-O2
Lumini E, Orgiazzi A, Borriello R, Bonfante P, Bianciotto V (2010) Disclosing arbuscular mycorrhizal fungal biodiversity in soil through a land-use gradient using a pyrosequencing approach. Environ Microbiol 12:2165–2179
CAS PubMed Google Scholar
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM (2005) Genome sequencing inmicrofabricated high-density picolitre reactors. Nature 437:376–380
PubMed Central CAS PubMed Google Scholar
McGuire KL, Fierer N, Bateman C, Treseder KK, Turner BL (2012) Fungal community composition in neotropical rain forests: the influence of tree diversity and precipitation. Microb Ecol 63:804–812
Article PubMed Google Scholar
Miller MA, Pfeiffer W, Schwartz T (2010) Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In: Proceedings of the gateway computing environments workshop (GCE), New Orleans, pp 1–8
Moncada B, Lücking R, Suárez A (2013) Molecular phylogeny of the genus Sticta (lichenized Ascomycota: Lobariaceae) in Colombia. Fungal Div. doi:10.1007/s13225-013-0230-0
Google Scholar
Morzycka-Wroblewska E, Selker EU, Stevens JN, Metzenberg RL (1985) Concerted evolution of dispersed Neurospora crassa 5S RNA genes: pattern of sequence conservation between allelic and nonallelic genes. Mol Cell Biol 5:46–51
Article PubMed Central CAS PubMed Google Scholar
Nilsson RH, Kristiansson E, Ryberg M, Larsson KH (2005) Approaching the taxonomic affiliation of unidentified sequences in public databases—an example from the mycorrhizal fungi. BMC Bioinformatics 6:178
Article PubMed Central PubMed CAS Google Scholar
Nilsson RH, Ryberg M, Kristiansson E, Abarenkov K, Larsson KH (2006) Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective. PLoS ONE 1:e59
Article PubMed Central PubMed CAS Google Scholar
Nilsson RH, Kristiansson E, Ryberg M, Hallenberg N, Larsson KH (2008) Intraspecific ITS variability in the kingdom Fungi as expressed in the international sequence databases and its implications for molecular species identification. Evolut Bioinformatics 4:193–201
Google Scholar
Nilsson RH, Ryberg M, Abarenkov K, Sjökvist E, Kristiansson E (2009) The ITS region as a target for characterization of fungal communities using emerging sequencing technologies. FEMS Microbiol Lett 296:97–101
Article CAS PubMed Google Scholar
Nilsson RH, Tedersoo L, Lindahl BD, Kjøller R, Carlsen T, Quince C, Abarenkov K, Pennanen T, Stenlid J, Bruns T, Larsson K-H, Kõljalg U, Kauserud H (2011) Towards standardization of the description and publication of next-generation sequencing datasets of fungal communities. New Phytol 191:314–318
Article Google Scholar
Niu B, Fu L, Sun S, Li W (2010) Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11:187
Article PubMed Central PubMed CAS Google Scholar
O’Brien HE, Parrent JL, Jackson JA, Moncalvo JM, Vilgalys R (2005) Fungal community analysis by large-scale sequencing of environmental samples. Appl Environ Microbiol 71:5544–5550
Article PubMed Central PubMed CAS Google Scholar
Pawlowska TE (2005) Genetic processes in arbuscular mycorrhizal fungi. FEMS Microbiol Lett 251:185–192
Article CAS PubMed Google Scholar
Pawlowska TE (2007) How the genome is organized in the Glomeromycota. In: Heitman J, Kronstad JW, Taylor JW, Casselton LA (eds) Sex in Fungi: molecular determination and evolutionary implications. ASM Press, Washington,DC, pp 419–430
Chapter Google Scholar
Pawlowska TE (2010) Population genetics of fungal mutualists of plants. In: Xu J (ed) Microbial population genetics. Horizon Scientific Press, Norfolk
Google Scholar
Pinto PM, Resende MA, Koga-Ito CY, Ferreira JA, Tendler M (2004) rDNA-RFLP identification of Candida species in immunocompromised and seriously diseased patients. Can J Microbiol 50:504–520
Article Google Scholar
Porter TM, Golding GB (2011) Are similarity- or phylogeny-based methods more appropriate for classifying internal transcribed spacer (ITS) metagenomic amplicons? New Phytol 192:775–782
Article CAS PubMed Google Scholar
Powell JR, Monaghan MT, Opik M, Rillig MC (2011) Evolutionary criteria outperform operational approaches in producing ecologically relevant fungal species inventories. Mol Ecol 20:655–666
Article PubMed Google Scholar
Quince C, Curtis TP, Sloan WT (2008) The rational exploration of microbial diversity. ISME J 2:997–1006
Article CAS PubMed Google Scholar
Quince C, Lanzén A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT (2009) Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6:639–641
Article CAS PubMed Google Scholar
Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ (2011) Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12:38
Article PubMed Central PubMed Google Scholar
Reeder J, Knight R (2010) Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Methods 7:668–669
Article PubMed Central CAS PubMed Google Scholar
Rooney AP, Ward TJ (2005) Evolution of a large ribosomal RNA multigene family in filamentous fungi: birth and death of a concerted evolution paradigm. Proc Natl Acad Sci USA 102:5084–5089
Article PubMed Central CAS PubMed Google Scholar
Rossman AY (2007) Report of the planning workshop for all fungi DNA Barcoding. Inoculum 58:1–5
Google Scholar
Rothberg JM, Leamon JH (2008) The development and impact of 454 pyrosequencing. Nat Biotechnol 26:1117–1124
Article CAS PubMed Google Scholar
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537–7541
Article PubMed Central CAS PubMed Google Scholar
Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Fungal Barcoding Consortium (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci USA 109:6241–6246
Article PubMed Central CAS PubMed Google Scholar
Seifert KA (2008) The all-fungi barcoding campaign (FunBOL). Persoonia 20:106
Google Scholar
Sikaroodi M, Gillevet PM (2012) Quality control in multi-tag pyrosequencing of microbial communities. Biotechniques 53:381–383
Article CAS PubMed Google Scholar
Simon UK, Weiss M (2008) Intragenomic variation of fungal ribosomal genes is higher than previously thought. Mol Biol Evol 25:2251–2254
Article CAS PubMed Google Scholar
Smith ME, Douhan GW, Rizzo DM (2007) Intra-specific and intra-sporocarp ITS variation of ectomycorrhizal fungi as assessed by rDNA sequencing of sporocarps and pooled ectomycorrhizal roots from a Quercus woodland. Mycorrhiza 18:15–22
Article CAS PubMed Google Scholar
Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM, Herndl GJ (2006) Microbial diversity in the deep sea and the underexplored ‘rare biosphere’. Proc Natl Acad Sci USA 103:12115–12120
Article PubMed Central CAS PubMed Google Scholar
Stamatakis A (2006) RAxML-VI-HPC: maximum-Likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690
Article CAS PubMed Google Scholar
Stamatakis A, Ludwig T, Meier H (2005) RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463
Article CAS PubMed Google Scholar
Stamatakis A, Hoover P, Rougemont J (2008) A fast bootstrapping algorithm for the RAxML web-servers. Syst Biol 57:758–771
Article PubMed Google Scholar
Suchard MA, Redelings BD (2006) BAli-Phy: simultaneous bayesian inference of alignment and phylogeny. Bioinformatics 22:2047–2048
Article CAS PubMed Google Scholar
Taylor DL, McCormick MK (2008) Internal transcribed spacer primers and sequences for improved characterization of basidiomycetous orchid mycorrhizas. New Phytol 177:1020–1033
Article CAS PubMed Google Scholar
Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C, Marr TG (2008) Increasing ecological infrence from high throughput sequencing of fungi in the environment through a tagging approach. Mol Ecol Resour 8:742–752
Article CAS PubMed Google Scholar
Tedersoo L, Nilsson RH, Abarenkov K, Jairus T, Sadam A, Saar I, Bahram M, Bechem E, Chuyong G, Kõljalg U (2010) 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. New Phytol 188:291–301
Article CAS PubMed Google Scholar
Unterseher M, Jumpponen A, Öpik M, Tedersoo L, Moora M, Dormann C, Schnittler M (2011) Species abundance distributions and richness estimations in fungal metagenomics—lessons learned from community ecology. Mol Ecol 20:275–285
Article PubMed Google Scholar
Vandenbroucke I, Van Marck H, Verhasselt P, Thys K, Mostmans W, Dumont S, Van Eygen V, Coen K, Tuefferd M, Aerssens J (2011) Minor variant detection in amplicons using 454 massive parallel pyrosequencing: experiences and considerations for successful applications. Biotechniques 51:167–177
Article CAS PubMed Google Scholar
Varón A, Vinh LS, Wheeler WC (2010) POY version 4: phylogenetic analysis using dynamic homologies. Cladistics 26:72–85
Article Google Scholar
Wallander H, Johansson U, Sterkenburg E, Durling MB, Lindahl B (2010) Production of ectomycorrhizal mycelium peaks during canopy closure in Norway spruce. New Phytol 187:1124–1134
Article CAS PubMed Google Scholar
Wang Z, Nilsson RH, Lopez-Giraldez F, Zhuang WY, Dai YC et al (2011) Tasting soil fungal diversity with earth tongues: phylogenetic test of SATe′ alignments for environmental ITS data. PLoS ONE 6(4):e19039
Article PubMed Central CAS PubMed Google Scholar
Won H, Renner SS (2005) The internal transcribed spacer of nuclear ribosomal DNA in the gymnosperm Gnetum. Mol Phylogen Evol 36:581–597
Article CAS Google Scholar
Zhou J, Wu L, Deng Y, Zhi X, Jiang YH, Tu Q, Xie J, Van Nostrand JD, He Z, Yang Y (2011) Reproducibility and quantitation of amplicon sequencing-based detection. ISME J 1:11
CAS Google Scholar

Download references

Acknowledgments

Financial support for this study was provided by grant DEB 0841405 from the National Science Foundation to George Mason University: “Phylogenetic Diversity of Mycobionts and Photobionts in the Cyanolichen Genus Dictyonema, with Empasis on the Neotropics and the Galapagos Islands” (PI J. Lawrey; Co-PIs: R. Lücking, P. Gillevet). The biological material was collected as part of grant DEB-0715660 to The Field Museum: “Neotropical Epiphytic Microlichens—An Innovative Inventory of a Highly Diverse yet Little Known Group of Symbiotic Organisms” (PI R. Lücking). Naga Betrapally assisted with the SRA and BioProject submission of the dataset.

Author information

Authors and Affiliations

Science & Education, Integrative Research Center, The Field Museum, 1400 South Lake Shore Drive, Chicago, IL, 60605-2496, USA
Robert Lücking
Department of Environmental Science and Policy, George Mason University, Fairfax, VA, 22030-4444, USA
James D. Lawrey, Patrick M. Gillevet, Masoumeh Sikaroodi & Manuela Dal-Forno
The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118, Heidelberg, Germany
Simon A. Berger

Authors

Robert Lücking
View author publications
You can also search for this author in PubMed Google Scholar
James D. Lawrey
View author publications
You can also search for this author in PubMed Google Scholar
Patrick M. Gillevet
View author publications
You can also search for this author in PubMed Google Scholar
Masoumeh Sikaroodi
View author publications
You can also search for this author in PubMed Google Scholar
Manuela Dal-Forno
View author publications
You can also search for this author in PubMed Google Scholar
Simon A. Berger
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Robert Lücking.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (TXT 4 kb)

Supplementary material 2 (TXT 4 kb)

Supplementary material 3 (TXT 10672 kb)

Supplementary material 4 (XLS 499 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lücking, R., Lawrey, J.D., Gillevet, P.M. et al. Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?. J Mol Evol 78, 148–162 (2014). https://doi.org/10.1007/s00239-013-9603-y

Download citation

Received: 05 September 2013
Accepted: 20 November 2013
Published: 17 December 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s00239-013-9603-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?

Abstract

Similar content being viewed by others

Unraveling intragenomic polymorphisms in the high-quality genome of Hypoxylaceae: a comprehensive study of the rDNA cistron

Intragenomic polymorphisms in the ITS region of high-quality genomes of the Hypoxylaceae (Xylariales, Ascomycota)

Using target enrichment sequencing to study the higher-level phylogeny of the largest lichen-forming fungi family: Parmeliaceae (Ascomycota)

Introduction

Materials and Methods

Results

Discussion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (TXT 4 kb)

Supplementary material 2 (TXT 4 kb)

Supplementary material 3 (TXT 10672 kb)

Supplementary material 4 (XLS 499 kb)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?

Abstract

Similar content being viewed by others

Unraveling intragenomic polymorphisms in the high-quality genome of Hypoxylaceae: a comprehensive study of the rDNA cistron

Intragenomic polymorphisms in the ITS region of high-quality genomes of the Hypoxylaceae (Xylariales, Ascomycota)

Using target enrichment sequencing to study the higher-level phylogeny of the largest lichen-forming fungi family: Parmeliaceae (Ascomycota)

Introduction

Materials and Methods

Results

Discussion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (TXT 4 kb)

Supplementary material 2 (TXT 4 kb)

Supplementary material 3 (TXT 10672 kb)

Supplementary material 4 (XLS 499 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation