Abstract
Metagenomics is the study of microbial communities through genetic material obtained directly from environmental samples, without isolating and culturing individual organisms in the laboratory. A crucial task in metagenomic projects is the identification and taxonomic characterization of the DNA sequences in a sample. In this paper, we present MetaAB, an unsupervised method for binning metagenomic reads that classifies reads into groups of genomes using information about genome abundances. The method is based on a proposed reduced-dimension model that is theoretically proven to require less computation time. In addition, MetaAB automatically detects the number of genome abundances in the data using the Bayesian Information Criterion. Experimental results show that the proposed method achieves higher accuracy and runs faster than a recent abundance-based binning approach. The software implementing the algorithm can be downloaded at http://it.hcmute.edu.vn/bioinfo/metaab/index.htm
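As a rough illustration of how the Bayesian Information Criterion can choose the number of abundance groups, the sketch below fits mixtures of Poisson distributions to a list of counts with a basic EM loop and keeps the number of components with the lowest BIC. This is not the MetaAB implementation: the Poisson mixture, the quantile-based initialization, and the names `fit_poisson_mixture`, `bic`, and `select_num_groups` are all assumptions made for this example.

```python
import math


def poisson_log_pmf(x, rate):
    """Log-probability of count x under a Poisson distribution with the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def log_likelihood(counts, weights, rates):
    """Log-likelihood of the counts under a Poisson mixture (log-sum-exp for stability)."""
    total = 0.0
    for x in counts:
        logs = [math.log(w) + poisson_log_pmf(x, r) for w, r in zip(weights, rates)]
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total


def fit_poisson_mixture(counts, k, iters=200):
    """Fit a k-component Poisson mixture with EM (illustrative model, assumed here)."""
    n = len(counts)
    ordered = sorted(counts)
    # Assumption: start the rates at evenly spaced quantiles of the data.
    rates = [max(float(ordered[int((i + 0.5) * n / k)]), 1e-6) for i in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp_sums = [0.0] * k
        weighted_sums = [0.0] * k
        # E-step: accumulate each component's responsibility for every count.
        for x in counts:
            logs = [math.log(weights[j]) + poisson_log_pmf(x, rates[j]) for j in range(k)]
            m = max(logs)
            norm = m + math.log(sum(math.exp(v - m) for v in logs))
            for j in range(k):
                r = math.exp(logs[j] - norm)
                resp_sums[j] += r
                weighted_sums[j] += r * x
        # M-step: update mixing weights and rates; the floors avoid log(0).
        for j in range(k):
            weights[j] = max(resp_sums[j] / n, 1e-12)
            if resp_sums[j] > 1e-12:
                rates[j] = max(weighted_sums[j] / resp_sums[j], 1e-6)
    return weights, rates, log_likelihood(counts, weights, rates)


def bic(loglik, k, n):
    """BIC for a k-component mixture: k rates plus k - 1 free mixing weights."""
    return -2.0 * loglik + (2 * k - 1) * math.log(n)


def select_num_groups(counts, max_k):
    """Return the number of components in 1..max_k with the lowest BIC."""
    n = len(counts)
    scores = {k: bic(fit_poisson_mixture(counts, k)[2], k, n) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get)
```

Calling `select_num_groups(counts, max_k)` on a list of non-negative integer counts returns the selected number of groups. The penalty term grows with the number of components, so an extra group is kept only when it improves the likelihood enough to justify its additional parameters.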
Copyright information
© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Le, VV., Van Lang, T., Van Hoai, T. (2015). MetaAB - A Novel Abundance-Based Binning Approach for Metagenomic Sequences. In: Vinh, P., Vassev, E., Hinchey, M. (eds) Nature of Computation and Communication. ICTCC 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 144. Springer, Cham. https://doi.org/10.1007/978-3-319-15392-6_13
DOI: https://doi.org/10.1007/978-3-319-15392-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15391-9
Online ISBN: 978-3-319-15392-6
eBook Packages: Computer Science (R0)