Abstract
The cost of DNA sequencing has decreased due to advancements in Next Generation Sequencing. Because the Illumina platform yields a large number of sequences, it can reduce costs more than the 454 pyrosequencer. However, the Illumina platform poses other challenges, including the bioinformatic analysis of large numbers of sequences and the need to reduce erroneous nucleotides generated at the 3′-ends of the sequences. These erroneous sequences can lead to errors in the analysis of microbial communities, so correcting them is necessary for accurate taxonomic identification. Several studies that used the Illumina platform for metagenomic analyses have proposed curation pipelines to increase accuracy. In this study, we evaluated the likelihood of obtaining an erroneous microbial composition using 250 bp paired-end sequencing on the MiSeq platform and improved the pipeline to reduce erroneous identifications. Using a mock community sample composed of known sequences, we compared sequencing conditions by varying the percentage of PhiX control added, the concentration of the sequencing library, and the 16S rRNA gene target region. Our recommended method corrected erroneous nucleotides and improved identification accuracy. Overall, 99.5% of the total reads shared 95% similarity with the corresponding template sequences and 93.6% of the total reads shared over 97% similarity. This indicates that the MiSeq platform can be used to analyze microbial communities at the genus level with high accuracy. The improved analysis method recommended in this study can be applied to amplicon studies in various environments using high-throughput reads generated on the MiSeq platform.
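The accuracy figures above are fractions of reads whose similarity to a known template meets a cutoff. The following Python sketch shows one way such a summary could be computed for a mock community. It is an illustration under stated assumptions, not the pipeline used in the study: the function names `percent_identity` and `fraction_at_or_above` are hypothetical, reads are assumed to be already aligned to their templates, and the cutoff is treated as inclusive (`>=`).

```python
from typing import Iterable


def percent_identity(aligned_read: str, aligned_template: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: '-' marks a gap. Columns where both sequences have a gap
    are skipped; a gap opposite a base counts as a mismatch.
    """
    if len(aligned_read) != len(aligned_template):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_read.upper(), aligned_template.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def fraction_at_or_above(similarities: Iterable[float], threshold: float) -> float:
    """Percentage of reads whose similarity is >= threshold (both in percent)."""
    values = list(similarities)
    if not values:
        raise ValueError("no similarity values given")
    passing = sum(1 for s in values if s >= threshold)
    return 100.0 * passing / len(values)


if __name__ == "__main__":
    # Toy similarity values for illustration only; not data from the study.
    toy_similarities = [99.0, 96.0, 94.0, 98.0]
    for cutoff in (95.0, 97.0):
        share = fraction_at_or_above(toy_similarities, cutoff)
        print(f"{share:.1f}% of reads at or above {cutoff:.0f}% similarity")
```

Applying `fraction_at_or_above` with cutoffs of 95 and 97 to per-read similarities from a mock community would give a summary of the same kind as the percentages reported in the abstract.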
Additional information
Supplemental material for this article may be found at http://www.springerlink.com/content/120956.
About this article
Cite this article
Jeon, YS., Park, SC., Lim, J. et al. Improved pipeline for reducing erroneous identification by 16S rRNA sequences using the Illumina MiSeq platform. J Microbiol. 53, 60–69 (2015). https://doi.org/10.1007/s12275-015-4601-y
DOI: https://doi.org/10.1007/s12275-015-4601-y