Abstract
The internal transcribed spacer (ITS) is the locus of choice for characterizing fungal diversity in environmental samples. However, methods for analyzing ITS datasets have lagged behind the capacity to generate large amounts of sequence data. Here, we describe our bioinformatics pipeline for processing large fungal ITS sequence datasets, from raw chromatograms to a spreadsheet of operational taxonomic unit (OTU) abundances across samples. Steps include assembling reads that originate from one clone, identifying primer “barcodes” or “tags,” trimming vectors and primers, marking low-quality base calls and removing low-quality sequences, orienting sequences, extracting the ITS region from longer amplicons, and grouping sequences into OTUs. We expect the principles and tools presented here to remain relevant to datasets arising from new and still-evolving sequencing technologies.
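As a rough illustration of several of the steps listed above (marking low-quality base calls, orienting sequences, identifying tags, trimming primers, and tabulating abundances across samples), the following Python sketch may be helpful. It is not the pipeline described in this chapter: the function names, the primer and tag sequences, and the quality threshold are hypothetical, primer and tag matching is exact-string only, and grouping identical trimmed inserts stands in for the similarity-based OTU clustering that a real analysis would use.

```python
from collections import Counter, defaultdict

# Complement table for unambiguous bases plus N (assumption: no other IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mask_low_quality(seq, quals, min_q=20):
    """Replace base calls whose quality score is below min_q with N.

    min_q=20 is an illustrative threshold, not a value taken from the chapter.
    """
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


def orient(seq, forward_primer):
    """Return seq in forward orientation, or None if the primer is on neither strand."""
    if forward_primer in seq:
        return seq
    flipped = reverse_complement(seq)
    return flipped if forward_primer in flipped else None


def assign_tag_and_trim(seq, forward_primer, tags):
    """Find the sample tag immediately 5' of the primer.

    Returns (sample, insert) with everything up to and including the primer
    removed, or None if the primer or a known tag is missing.
    """
    start = seq.find(forward_primer)
    if start < 0:
        return None
    upstream = seq[:start]
    for tag, sample in tags.items():
        if upstream.endswith(tag):
            return sample, seq[start + len(forward_primer):]
    return None


def otu_table(reads, forward_primer, tags):
    """Tabulate counts per (insert, sample).

    Identical trimmed inserts are grouped together; this exact-match grouping
    is only a stand-in for similarity-based OTU clustering.
    """
    table = defaultdict(Counter)
    for read in reads:
        oriented = orient(read, forward_primer)
        if oriented is None:
            continue  # primer not found on either strand
        hit = assign_tag_and_trim(oriented, forward_primer, tags)
        if hit is None:
            continue  # no recognizable tag upstream of the primer
        sample, insert = hit
        table[insert][sample] += 1
    return table
```

Calling `otu_table(reads, primer, tags)` with a list of read strings, a primer string, and a dictionary mapping tag sequences to sample names returns a nested mapping that can be written out as a spreadsheet with one row per group and one column per sample. `mask_low_quality` is shown separately because, if masking is applied before the exact-match primer search, an N inside the primer would prevent the match.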
Acknowledgments
We thank James Long for writing several of the original pipeline scripts and Dan Cardin for writing the tag-finder script. Niall Lennon and Chad Nusbaum of the Broad Institute, MA, spearheaded high-throughput Sanger sequencing of our fungal clone libraries. Lab members Michael Booth, Robert Burgess, Ian Herriott, Jack McFarland, and Ina Timling have assisted with testing and improving our pipeline and also provided valuable comments on earlier drafts of the chapter. This work was supported in part by the National Science Foundation under grant numbers EF-0333308 and ARC-0632332. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation. This publication was also made possible by grant number 2P20RR016466 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Taylor, D.L., Houston, S. (2011). A Bioinformatics Pipeline for Sequence-Based Analyses of Fungal Biodiversity. In: Xu, JR., Bluhm, B. (eds) Fungal Genomics. Methods in Molecular Biology, vol 722. Humana Press. https://doi.org/10.1007/978-1-61779-040-9_10
DOI: https://doi.org/10.1007/978-1-61779-040-9_10
Publisher Name: Humana Press
Print ISBN: 978-1-61779-039-3
Online ISBN: 978-1-61779-040-9
eBook Packages: Springer Protocols