Abstract
Bacteria and archaea, collectively known as prokaryotes, generally have genomes that are much smaller than those of eukaryotes. As a result, thousands of these genomes have been sequenced. Prokaryotic genes also lack, with occasional exceptions, the intron-exon structure of eukaryotic genes. Together, these two facts mean that data for prokaryotic genomes are abundant and that these genomes are easier to study than the more complex eukaryotic genomes. In this chapter, we provide an overview of genome comparison tools that have been developed primarily, and sometimes exclusively, for prokaryotic genomes. We cover methods that use only DNA sequences, methods that use only gene content, and methods that use both data types.
Acknowledgments
This work was supported in part by a CNPq researcher fellowship (J.C.S. and N.F.A.); by CAPES grant 3385/2013 (BIGA project) (J.C.S. and N.F.A.); by Fundect-MS grants TO141/2016 and TO007/2015 (N.F.A.); and by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract no. HHSN272201400027C (A.R.W.).
Copyright information
© 2018 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Setubal, J.C., Almeida, N.F., Wattam, A.R. (2018). Comparative Genomics for Prokaryotes. In: Setubal, J., Stoye, J., Stadler, P. (eds) Comparative Genomics. Methods in Molecular Biology, vol 1704. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7463-4_3
DOI: https://doi.org/10.1007/978-1-4939-7463-4_3
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7461-0
Online ISBN: 978-1-4939-7463-4
eBook Packages: Springer Protocols