Defining Orthologs and Pangenome Size Metrics

Bosi, Emanuele; Fani, Renato; Fondi, Marco

doi:10.1007/978-1-4939-1720-4_13

Part of the book series: Methods in Molecular Biology (MIMB, volume 1231)

Abstract

Since the advent of ultra-high-throughput sequencing techniques, the resulting drop in both cost and time has made it feasible to sequence an increasing number of genomes from microbes belonging to the same taxonomic unit. This led to the concept of the pangenome, that is, the entire set of genes present in a group of representatives of the same genus or species. The pangenome can in turn be divided into the core genome, defined as the set of genes present in all the genomes under study, and the dispensable genome, the set of genes possessed by only one or a subset of the organisms.
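The definitions above map directly onto set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers; the function name `partition_pangenome`, the strain names, and the family labels are all hypothetical and serve only as an illustration.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pangenome, core, dispensable) sets of gene families.

    `genomes` maps a genome name to the set of gene-family IDs it carries.
    """
    if not genomes:
        return set(), set(), set()
    gene_sets = list(genomes.values())
    pangenome = set().union(*gene_sets)      # families seen in any genome
    core = set.intersection(*gene_sets)      # families seen in every genome
    dispensable = pangenome - core           # families missing from at least one genome
    return pangenome, core, dispensable


# Toy data (hypothetical): three strains and five gene families.
toy = {
    "strainA": {"fam1", "fam2", "fam3"},
    "strainB": {"fam1", "fam2", "fam4"},
    "strainC": {"fam1", "fam5"},
}
pan, core, disp = partition_pangenome(toy)
print(len(pan), sorted(core), sorted(disp))
```

In this toy case only `fam1` is shared by all three strains, so it forms the core genome, and the remaining four families make up the dispensable genome.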

When analyzing a pangenome, an interesting question is its size, which provides an estimate of the gene repertoire of a given taxonomic group. This is usually done by counting the novel genes added to the overall pangenome as new genomes are sequenced and annotated. A pangenome can also be classified as open or closed: the size of an open pangenome increases indefinitely as new genomes are added, so sequencing additional strains will likely yield novel genes. Conversely, in a closed pangenome, adding new genomes will not lead to the discovery of new coding capabilities.
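The counting procedure described above can be sketched as a gene accumulation curve. This is an illustration under stated assumptions, not the chapter's own code: genomes are again represented as sets of gene-family identifiers, the helper names `accumulate` and `mean_new_genes` are hypothetical, and averaging over random genome orders is a common way to smooth out the effect of the order in which genomes are added.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def accumulate(
    genomes: Dict[str, Set[str]], order: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Add genomes in `order`; return (pangenome sizes, novel genes per step)."""
    seen: Set[str] = set()
    pan_sizes: List[int] = []
    new_counts: List[int] = []
    for name in order:
        novel = genomes[name] - seen   # families not seen in earlier genomes
        seen |= novel
        new_counts.append(len(novel))
        pan_sizes.append(len(seen))
    return pan_sizes, new_counts


def mean_new_genes(
    genomes: Dict[str, Set[str]], n_orders: int = 100, seed: int = 0
) -> List[float]:
    """Average the novel-gene count at each step over random genome orders."""
    rng = random.Random(seed)
    names = sorted(genomes)
    totals = [0.0] * len(names)
    for _ in range(n_orders):
        order = names[:]
        rng.shuffle(order)
        _, new_counts = accumulate(genomes, order)
        for i, count in enumerate(new_counts):
            totals[i] += count
    return [t / n_orders for t in totals]


# Toy data (hypothetical): three strains and five gene families.
toy = {
    "strainA": {"fam1", "fam2", "fam3"},
    "strainB": {"fam1", "fam2", "fam4"},
    "strainC": {"fam1", "fam5"},
}
pan_sizes, new_counts = accumulate(toy, ["strainA", "strainB", "strainC"])
# pan_sizes == [3, 4, 5]; new_counts == [3, 1, 1]
mean_new = mean_new_genes(toy, n_orders=200, seed=1)
```

If the averaged number of novel genes keeps falling toward zero as genomes are added, the data are consistent with a closed pangenome; if it levels off above zero, an open pangenome is more plausible. A formal classification would normally rest on fitting a model to this curve, which is not shown here.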

A central point in pangenomics is the definition of homology relationships between genes belonging to different genomes. In practice, this often amounts to searching for genes with similar sequences in different organisms, a set that includes both paralogous and orthologous genes.
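One widely used heuristic for separating putative orthologs from other similar sequences is the reciprocal best hit (RBH) criterion: two genes are paired when each is the other's top-scoring match in the opposite genome. The sketch below is an assumption-laden illustration and is not claimed to be the method used in this chapter. It assumes similarity search results are already available as `(query, subject, score)` tuples, for example bit scores from a sequence similarity search; the function names and gene labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, similarity score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy similarity scores (hypothetical) for genome A against genome B and back.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 60.0), ("a2", "b2", 180.0), ("a3", "b1", 75.0)]
b_vs_a = [("b1", "a1", 201.0), ("b1", "a3", 74.0), ("b2", "a2", 179.0), ("b2", "a1", 61.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here `a3` finds `b1` as its best match, but `b1` prefers `a1`, so `a3` is left unpaired. Such a gene could be a paralog or a strain-specific gene. RBH handles only pairs of genomes; grouping orthologs across many genomes requires an additional clustering step.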

In this chapter, methods for finding groups of orthologs between genomes and for estimating the pangenome size are discussed. Working code to address these tasks is also provided.

Author information

Authors and Affiliations

Department of Biology, University of Florence, via Madonna del Piano 6, Sesto Fiorentino, Florence, 50019, Italy
Emanuele Bosi, Renato Fani & Marco Fondi

Corresponding author

Correspondence to Emanuele Bosi.

Editor information

Editors and Affiliations

Department of Biology, University of Florence, Florence, Italy
Alessio Mengoni
EMBL-EBI, Cambridge, United Kingdom
Marco Galardini
Department of Biology, University of Florence, Florence, Italy
Marco Fondi

About this protocol

Cite this protocol

Bosi, E., Fani, R., Fondi, M. (2015). Defining Orthologs and Pangenome Size Metrics. In: Mengoni, A., Galardini, M., Fondi, M. (eds) Bacterial Pangenomics. Methods in Molecular Biology, vol 1231. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1720-4_13

DOI: https://doi.org/10.1007/978-1-4939-1720-4_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-1719-8
Online ISBN: 978-1-4939-1720-4
eBook Packages: Springer Protocols
