Key Points
-
The insight gained through comparative genomics is markedly increasing as novel sequencing technologies are developed and become more affordable. Different technologies, data types and quality can be used to ask biological questions within or across species.
-
Genome annotation, incorporating coding genes, non-coding RNAs, transcription factor-binding sites, chromatin marks and more, is key to understanding genome evolution and disease. Many of these features are shared across species.
-
Genotype–phenotype correlations can be used to understand the genetic basis of numerous traits and diseases. These can be studied at the individual, population or species level.
-
Tools and resources for genome annotation and trait mapping have made it possible to identify disease genes in many domestic animals. This information can also be used to guide the search for human disease-associated genes.
-
Low-pass population data have been used successfully to find natural genetic adaptations to salinity, high altitude and diet. Mutation types vary from coding to non-coding and from point mutations to structural variants.
-
Analysis of large numbers of mammals, birds or fish will enable the detection of constraint (a sign of function) as well as positive selection across clades. Convergent evolution, in which different mutations in the same genetic loci are responsible for specific phenotypic adaptation, is apparent across many lineages, including the red and giant pandas.
Abstract
With the generation of more than 100 sequenced vertebrate genomes in less than 25 years, the key question arises of how these resources can be used to inform new or ongoing projects. In the past, this diverse collection of sequences from human as well as model and non-model organisms has been used to annotate the human genome and to increase the understanding of human disease. In the future, comparative vertebrate genomics in conjunction with additional genomic resources will yield insights into the processes of genome function, evolution, speciation, selection and adaptation, as well as the quantification of species diversity. In this Review, we discuss how the genomics of non-human organisms can provide insights into vertebrate biology and how this can contribute to the understanding of human physiology and health.
Acknowledgements
J.R.S.M. was supported by the Swedish Research Council, FORMAS (221-2012-1531). K.L.-T. was supported by the Swedish Research Council, European Research Council (ERC) Starting Grant and Knut och Alice Wallenberg Foundation.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- Reference genome
-
A high-quality species genome onto which other information is projected, such as genes, polymorphisms and elements of gene regulation.
- Bacterial artificial chromosome
-
(BAC). Approximately 200,000 bp of sequence that has been cloned into a bacterial vector and can then be amplified and sequenced.
- Whole-genome shotgun sequencing
-
The genome is shattered into smaller pieces and sequenced, originally with Sanger technology.
- Sanger sequencing
-
An old standard type of sequencing in which the four bases are labelled with four fluorophores of different colours. It results in ~600 bp reads and was the methodology used for the human genome project.
- High-quality draft genome assemblies
-
A form of genome assembly that has both long contigs (stretches of uninterrupted sequence, many kb in length) and supercontigs (structures of sequence hanging together but including smaller gaps) in the Mb range.
- Haplotype blocks
-
Regions of the genome that are inherited together without recombination. These are characterized by high linkage disequilibrium.
- Vertebrate model species
-
Vertebrate species that are studied to understand the biology or phenotype in another species.
- Non-model organisms
-
Organisms that are examined to offer insight into themselves rather than principally studied to understand another trait, for example, human health.
- Short-read sequencing
-
(SRS). Short-read technologies, such as Illumina, generate continuous sequence lengths of 100–250 bp.
- Long-read sequencing
-
(LRS). Strategies such as PacBio's single-molecule real-time (SMRT) generate continuous sequence length in the order of many kilobases. Over time, these technologies will go down in price and will probably be the methods of choice.
- Haplotypes
-
Versions of a gene or part of a gene, including several variants that are inherited together.
- Chromosome interaction mapping
-
A methodology to analyse the 3D organization of chromatin. Looping can be functional, for example, bringing enhancers into contact with distal promoters.
- Single-nucleotide polymorphisms
-
(SNPs). Positions in the genome at which two or more alleles occur. Biallelic markers are used to test for association of one allele (gene version) with disease.
- RNA sequencing
-
(RNA-seq). The sequencing of all mRNA transcripts from a cell or tissue.
- Histone marks
-
Histones are proteins that package DNA into units. Histone marks indicate where chromatin is open or closed and provide insights into genome regulation; for example, histone 3 lysine 27 acetylation (H3K27ac) is associated with active enhancers.
- Adaptation
-
A trait that has changed to enable a species to function under certain circumstances or in a specific environment.
- Domestication
-
A complex process partly driven by human selection of standing natural variation.
- Convergent evolution
-
The independent evolution of similar features in multiple species of different lineages.
- Pooled genome sequence strategies
-
Several individuals and/or samples can be sequenced together as a group, either with or without barcode labelling to facilitate multiplexing.
- Representative genome assembly approaches
-
When multiple individual genomes are sequenced from a species, the best is selected as a reference genome for that organism.
- N50 contig
-
A statistic used to illustrate genome assembly quality. Genomes are constructed of multiple contigs (segments of the genome assembly that contain no gaps), each with a different length. When contigs are ordered from longest to shortest, the N50 is the length of the contig at which the cumulative length first reaches half of the total assembly length.
- Long-range contiguity
-
The linking of large, megabase-sized genomic regions in order to create large continuous lengths of sequence data.
- Conserved synteny
-
The similarity of gene order in large regions of related (and distant) species.
- Microchromosomes
-
Typical in some birds and lizards, these chromosomes are less than 20 Mb in size.
- Selective sweep
-
A region of the genome where there is little to no population-level variation, as one haplotype with favourable alleles has become more common than other variants.
- Hybridization
-
The mating of two different species or populations, resulting in offspring that carry equal proportions of genetic material from both parents.
- Introgression
-
Gene flow from one species into the gene pool of another by the repeated backcrossing of a hybrid with one of its parental species.
- Integrated haplotype homozygosity score
-
(iHS). A method to calculate the amount of genetic similarity across regions in a species or population. High homozygosity suggests selection to be active on that region.
- SNP genotyping arrays
-
A method to genotype predefined single-nucleotide variants distributed across the genome of the species under study. For humans, single-nucleotide polymorphism (SNP) arrays typically have millions of variants, whereas in dogs, hundreds of thousands of SNPs are used for genome-wide association mapping.
- Topologically associating domains
-
(TADs). Regions of the genome packaged together in 3D space, most often containing one or a few genes and their regulatory signals. Genomic interactions within TADs are more frequent than those across TAD borders.
- Microevolution
-
Changes in allele frequencies that happen within a population in a fairly short time span. This can be due to positive selection or drift.
- Macroevolution
-
Changes in allele frequencies that happen between species over a longer time period. This can be due to positive selection or drift.
- Silent changes
-
Changes in DNA sequence without biological consequence.
- Neutral sites
-
Positions in DNA that are not functional and hence are free to mutate randomly.
- Conserved non-coding elements
-
(CNEs). Regions of the genome that are not coding for proteins, but are similar in many species, suggesting a role for these elements in genome regulation.
- Transposable elements
-
Mobile DNA sequences, similar to viruses, that can 'jump' around in the genome and integrate in new locations. These sequences can affect gene expression or give rise to novel regulatory elements.
- Positive selection
-
The force that makes certain genetic positions change in a certain favourable direction.
- Accelerated regions
-
Regions of the genome that are typically conserved across species, but where novel changes have happened in one or more related species. This suggests that the region is under positive selection for the novel variant (or variants).
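The N50 contig statistic defined in the glossary can be made concrete with a short calculation. The sketch below is illustrative only and assumes the common convention that contigs are sorted from longest to shortest and N50 is the length of the contig at which the running total first reaches half of the summed contig length. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the article.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumed convention: sort contigs from longest to shortest and report
    the length of the contig at which the cumulative length first reaches
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: seven contigs totalling 300 bp.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

Because N50 depends on the total assembly length, values are only directly comparable between assemblies of similar total size.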
Cite this article
Meadows, J., Lindblad-Toh, K. Dissecting evolution and disease using comparative vertebrate genomics. Nat Rev Genet 18, 624–636 (2017). https://doi.org/10.1038/nrg.2017.51
DOI: https://doi.org/10.1038/nrg.2017.51
- Springer Nature Limited