Introduction

Just as politics usually does not need to care about history, medicine usually does not need to consider evolution. However, for a more complete understanding, history matters, and this is increasingly recognized in biomedicine [1, 2]. Perhaps best established are areas in which evolutionary processes can be observed directly. These include the evolution of pathogens [3], pathogen resistance [4], and the development of cancer [5]. But inferences about the genetic past have also become medically relevant in recent years. One reason is that genome-wide association studies use statistical tools rooted in population genetic theory, for example to control for population stratification [6]. Another is that genetic variants can reach high frequencies due to selection, and identifying such loci could be medically informative [7]. Genomic comparisons across species have also increasingly become possible, but have so far had a relatively limited, or at least a not well-recognized, impact on medicine. However, the use of model organisms such as the mouse inherently raises an evolutionary question, and genetic conservation across species is currently the central tool to interpret human genetic variants associated with diseases. With the prospect that 10,000 vertebrate genomes might be available soon [8], I review areas in which the comparative genomic approach has been directly relevant for medical questions. I focus on primates (Fig. 1), which are of special relevance due to their close relationship with humans, and try to give a perspective on how the incorporation of comprehensive phenotypic data across many species could be an important tool for medical research.

Fig. 1

Mammalian and primate genomes. a Phylogeny of mammalian genome sequences that are currently used to estimate conservation (genome.ucsc.edu). Note that most are assemblies built from twofold (2×) sequence coverage and contain many gaps. Scale is 0.1 substitutions at fourfold degenerate sites as given in genome.ucsc.edu. b Phylogeny of primates according to Perelman et al. [25]. Of the 186 species, I chose those for which genomic sequence is currently available from www.ensembl.org (in bold, with sequence coverage), those that are approved sequencing targets (mainly http://www.genome.gov/10002154, non-bold, with aimed coverage), or those that illustrate the available diversity in old world and new world monkeys. Approximate relative branch lengths are taken from Perelman et al. 2011 [25] and, for Denisovan and Neanderthal, from Reich et al. 2011 [29]. The scale bar is drawn to match branch lengths in (a).

Identifying constrained regions in the human genome

Probably the most common use of comparative sequence data is assessing conservation. Increased sequencing and genotyping capacities have led to an explosion in the number of genetic variants associated with human diseases. However, genetic information alone is often not sufficient to identify the single variants that are causally related to disease. Hence, additional information, i.e., different prior probabilities for causality, is required. Currently, the most informative single source is evolutionary conservation [9]. A sequence can be considered conserved, and hence functional, if fewer nucleotide substitutions are observed among species than expected: genetic variants that result in fewer offspring do not reach high frequencies in populations and are therefore less likely to be observed as differences among species (see e.g., Hurst for a general introduction [10]). A recent landmark study [11] analyzed the genome sequences of 29 mammals and estimated that at least 5.5 % of all 12-bp windows in the human genome have accumulated significantly fewer nucleotide differences among mammals than expected, i.e., can be considered functional, and that 76 % of these windows could also be reliably located in the human genome. Because the majority of disease-associated variants are located outside protein-coding transcripts, this is crucial information for identifying putative causal variants for functional follow-up studies. Disease-associated variants are clearly enriched within conserved regions [11, 12], but it is of course an open question how often this information will really be decisive to understand and eventually treat diseases, especially those that affect humans later in life or depend on human-specific gene–environment interactions.
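To make the logic of "fewer substitutions than expected" concrete, here is a minimal sketch. It assumes, purely for illustration, that neutral substitutions in a window follow a Poisson distribution whose mean is the window length times the expected number of neutral substitutions per base pair across the tree (4.5 for the 29-mammal tree, the figure given in the next paragraph). Published methods, including those used in [11], fit substitution models along the phylogeny rather than counting substitutions in this way, and the observed counts below are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), accumulated term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


def constraint_p_value(observed: int, window_bp: int, subs_per_bp: float) -> float:
    """One-sided p-value for seeing at most `observed` substitutions in a
    window when neutral evolution predicts window_bp * subs_per_bp of them."""
    return poisson_cdf(observed, window_bp * subs_per_bp)


if __name__ == "__main__":
    expected = 12 * 4.5  # 12-bp window on a tree with ~4.5 neutral subs/bp
    for observed in (54, 30, 10):  # hypothetical observed counts
        p = constraint_p_value(observed, 12, 4.5)
        print(f"observed {observed:2d} of {expected:.0f} expected: p = {p:.2e}")
```

Under these assumptions, a window showing all 54 expected substitutions gives a p-value above 0.5, whereas a window with only 10 would be extremely unlikely under neutrality and would be flagged as constrained.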

The power to detect conservation depends on the expected number of neutral, i.e., nonfunctional, substitutions across the included species [13]. Across the 29 analyzed mammals, one expects 4.5 substitutions per base pair; increasing this to 15–25 substitutions per base pair with 100–200 eutherian mammals would allow single-nucleotide resolution [11]. Importantly, these estimates assume that the sequence is constrained across all mammals, but functional non-coding sequence in particular might have a high turnover rate [14]. Hence, a proportion of functional elements would be restricted to particular lineages, which could add another 5 % of the genome that is functional in any particular species but currently undetected [15]. Comparative primate genomics will therefore be of particular importance for annotating the human genome. If the genomes of most primates were available (see Box 1 and Fig. 1), they would sum up to an expected 1.5 substitutions per base pair. This gives a lower resolution for detecting constrained elements, but there is no alternative for primate-specific functional elements. That functional elements can be identified from primate sequences alone has already been shown [16–19], and the need to do so is most obvious for genes that exist only in humans and primates.
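The dependence of resolution on total branch length can be illustrated under the same simplifying assumption, namely that neutral substitutions at a site are Poisson distributed with mean d, the expected substitutions per base pair across the tree. The chance that a neutral base shows no substitution at all is then exp(−d), and the sketch below asks how many consecutive, perfectly conserved bases are needed before that chance drops below a threshold. The threshold of 0.01 is an arbitrary illustrative choice, and real constraint is rarely complete.

```python
import math


def p_invariant(d: float, n_bases: int = 1) -> float:
    """Probability that n_bases neutral sites all show zero substitutions
    on a tree with d expected neutral substitutions per base pair."""
    return math.exp(-d * n_bases)


def min_invariant_bases(d: float, alpha: float = 0.01) -> int:
    """Smallest run of perfectly conserved bases whose probability under
    neutrality falls below alpha."""
    n = 1
    while p_invariant(d, n) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    # 1.5: most primates; 4.5: 29 mammals; 15 and 25: 100-200 eutherians
    for d in (1.5, 4.5, 15.0, 25.0):
        print(f"d = {d:4.1f}: P(no change at one neutral base) = "
              f"{p_invariant(d):.1e}, conserved bases needed = {min_invariant_bases(d)}")
```

With these assumptions, about four invariant bases are needed with primates alone, two with the 29 mammals, and a single base suffices at 15–25 substitutions per base pair, which is the sense in which deeper trees approach single-nucleotide resolution.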

LPA as an example for a primate-specific gene

Although the gene repertoire is clearly fairly similar across mammals, a significant proportion of genes is still specific to particular lineages [32]. It has been estimated that ~9 % of all human genes arose after the split from mouse [33]. Evolutionary conservation of many of those genes clearly suggests that they do have a function, and many seem to be involved in testis development [32, 33] and primate brain development [34]. One medically relevant example of a primate-specific gene is LPA, the gene encoding apolipoprotein(a) (apo(a)), the defining component of lipoprotein(a) (Lp(a)) [35]. Genetic variants in LPA that increase Lp(a) levels increase the risk of coronary disease in humans [36]. Lp(a) is restricted to old world monkeys, apes, and humans [37, 38] because LPA arose by duplication of the plasminogen gene in the common ancestor of old world monkeys and apes [37, 39, 40]. Curiously, a gene analogous to LPA evolved independently by a duplication of plasminogen in hedgehogs [37, 39, 40]. By comparing 18 sequences from old world monkeys, Boffelli et al. [16] identified and verified regulatory elements in the LPA promoter, providing a proof of principle that conserved regulatory elements can be identified in a restricted set of primates.

Despite its medical relevance and almost 50 years of research on Lp(a), relatively little is known about its physiological function [35], partly because neither humans nor baboons without Lp(a) have any apparent phenotype [41]. How can one be certain that there is any function at all? An easy and powerful approach is to estimate evolutionary constraint on reading frame disruptions. Insertions and deletions (indels) in particular readily disrupt the reading frame, and they are observed genome-wide at a rate of 1 per 0.5 billion base pairs per generation, or approximately a tenth of the nucleotide substitution rate [42]. The open reading frame of LPA is 13,644 bp, and one would expect on average ~16 indels to accumulate between a human and a chimpanzee, which are separated by 12 million years (6 million years on each lineage) or ~0.6 million generations. The chance of observing no indel can then be calculated using a binomial distribution and is 8 × 10⁻⁸. The chance of seeing no frameshift between a human and a baboon, which are separated by over 50 million years [25], is essentially zero. Hence, for LPA it is clear that some physiological function that conserved the open reading frame must exist. For genes that are restricted to narrower lineages, it will be important to develop more precise models that take into account, in particular, context-dependent indel rates among primates. It is important to keep in mind that a physiological function just needs to ensure that individuals with LPA have on average more offspring than individuals without LPA. How many more offspring are needed to overcome chance effects depends on the effective population size, i.e., the effective number of individuals (Ne) or of chromosomes (2Ne). Chance (also called genetic drift) and selection are equally strong when the selective advantage is 1/(2Ne). Selection therefore dominates when the advantage is ≫ 1/(2Ne) (one can think of it as how often one needs to toss a coin to detect a bias towards one side).
Estimates of effective population size from current variation range from ~10,000 to 30,000 in human and chimpanzee populations [43]. So, as a very rough estimate, physiological functions are conserved in primates when they ensure on average considerably more than 0.0017–0.005 % more offspring (see e.g., Hurst [10] for an introduction).
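The two calculations in this paragraph can be reproduced in a few lines. The sketch below uses the numbers given in the text (an indel rate of 1 per 0.5 billion base pairs per generation, a 13,644-bp open reading frame, ~0.6 million generations between human and chimpanzee, and Ne of 10,000 to 30,000). The ~2.5 million generations for human versus baboon is an assumption obtained by applying the same 20-year generation time to 50 million years, and, as the text notes, context-dependent indel rates are ignored.

```python
import math

INDEL_RATE = 1 / 0.5e9  # indels per bp per generation (from the text)
ORF_BP = 13_644  # length of the LPA open reading frame


def expected_indels(generations: float, orf_bp: int = ORF_BP,
                    rate: float = INDEL_RATE) -> float:
    """Expected number of indels in the ORF, summed over both lineages."""
    return orf_bp * rate * generations


def p_no_indel(generations: float, orf_bp: int = ORF_BP,
               rate: float = INDEL_RATE) -> float:
    """Binomial probability that none of orf_bp * generations opportunities
    produced an indel, i.e. (1 - rate) ** n, computed in log space."""
    n = orf_bp * generations
    return math.exp(n * math.log1p(-rate))  # log1p keeps precision for tiny rates


def drift_threshold(ne: float) -> float:
    """Selective advantage at which drift and selection are roughly equally
    strong: 1 / (2 Ne)."""
    return 1 / (2 * ne)


if __name__ == "__main__":
    human_chimp = 0.6e6  # generations, both lineages combined (from the text)
    human_baboon = 2.5e6  # assumption: 50 Myr at 20 years per generation
    print(f"expected indels, human-chimp: {expected_indels(human_chimp):.1f}")
    print(f"P(no indel), human-chimp: {p_no_indel(human_chimp):.1e}")
    print(f"P(no indel), human-baboon: {p_no_indel(human_baboon):.1e}")
    for ne in (10_000, 30_000):
        print(f"Ne = {ne}: 1/(2Ne) = {100 * drift_threshold(ne):.4f} %")
```

With these inputs the sketch gives ~16.4 expected indels and a probability of about 8 × 10⁻⁸ for human versus chimpanzee, and drift thresholds of 0.0050 % and 0.0017 %, matching the values in the text.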

Hence, there is no contradiction in humans or baboons without Lp(a) [41] having no apparent phenotype, since the evolutionary advantage of possessing Lp(a) could be generally small or matter only under particular environmental conditions. With this evolutionary background in mind, it might be worth reinvestigating human null alleles. It would also be informative to know whether LPA is present in all old world monkeys and whether its expression levels correlate with environmental variables, such as pathogen load or diet, across species. In this respect, it is remarkable that an analogous gene evolved independently from plasminogen in hedgehogs [37, 39, 40]. A priori, a good hypothesis for the physiological function of genes that change relatively rapidly across mammals is an involvement in the immune system. Genes annotated as part of the immune system show evidence for positive selection more often than most other categories [11], and the pathogenic environment has had a larger impact on genetic differences among human populations than diet or climatic conditions [44]. Since evidence is accumulating that lipoproteins in general are important components of the immune system [45] and that apo(a) regulates neutrophil recruitment [46], a physiological function of Lp(a) in this context is certainly compatible with the evolution of Lp(a). Alternative explanations, such as a role of Lp(a) in wound healing [47], would need to explain why Lp(a) evolved independently in old world monkeys and hedgehogs and is absent in other mammals and primates.

In summary, comparative primate genomics is essential to annotate primate-specific genes, and LPA is a medically relevant example of such a gene. As also argued below, additional information on genotype–phenotype correlations across primates might be informative for understanding the physiological functions of such genes.

APOE as an example for compensatory mutations

Whether a particular genetic variant causes a disease can depend strongly on other sites in the genome, a phenomenon called genetic interaction or epistasis (see e.g., [48] for a recent review on molecular mechanisms). This can be medically very relevant if, for example, a disease mutation in humans does not cause disease phenotypes in a model organism such as the mouse. Remarkably, this seems to occur frequently: analyzing vertebrate orthologues of 32 proteins with well-known disease mutations in humans, Kondrashov et al. [49] estimated that 10 % of all amino acid substitutions observed in vertebrates would be pathogenic in humans. Put differently, nucleotide substitutions that are known to be pathogenic in humans are only five times less likely to be observed in other species than substitutions for which no pathogenic association is known. Genome-wide studies, e.g., using the chimpanzee genome [50], the rhesus genome [51], or the Neanderthal genome [52], have confirmed such a high rate. In the vast majority of cases, the reason is probably that one or several other substitutions, often in the same protein, compensate for the effect [53]. Apolipoprotein E (APOE) is a medically relevant example of this phenomenon.

Apolipoprotein E is a ligand for lipoprotein receptors and is important for lipid metabolism and transport (see e.g., [54] for a recent review). Three isoforms (APOE2, APOE3, and APOE4) are frequent in humans and have been associated with the risk of cardiovascular disease and especially late-onset Alzheimer’s disease, for which APOE4 is a stronger risk predictor than any other common variant [55]. The isoforms derive from two polymorphisms that change the amino acids of processed APOE at positions 112 and 158 (see Fig. 2). Interestingly, the two arginines defining APOE4 at these two positions are the ancestral state, present in chimpanzees, all other primates, and most mammals (http://genome.ucsc.edu/). Without any further information, this would imply that most mammals have an APOE with properties similar to human APOE4. However, at the structural and functional level, most mammals including the mouse turn out to be more like APOE3 (reviewed in [56]). The reason is that human APOE has an arginine residue at position 61, whereas chimpanzees, all other primates, and most mammals have a threonine at this position. Without this arginine residue, there is no interaction between the N-terminal and C-terminal domains of APOE4, and knock-in mice in which residue 61 is “humanized,” i.e., changed to arginine, suggest that this domain interaction might be responsible for most of the APOE4-associated neuropathology [55, 57, 58]. Disrupting this domain interaction pharmacologically is a promising route to treat Alzheimer’s disease [59]. It would be interesting to investigate more systematically whether the different mammalian APOEs indeed show no domain interaction, i.e., whether its absence is a conserved feature of APOE structure. More generally, the example shows that a more systematic evolutionary approach might lead more quickly to appropriate mouse models and structure–function relationships for other disease proteins as well.
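The epistatic argument in this paragraph can be written down as a toy genotype-to-phenotype rule. The sketch below encodes only the three positions discussed here (61, 112, and 158 of processed human APOE) and assumes, as a deliberate simplification of the structural work cited above, that N- and C-terminal domain interaction requires arginine at both 61 and 112. The chimpanzee residues follow the text's statements; the "ancestral + T61R" entry is a hypothetical construct, and real APOE structure depends on further residues.

```python
from typing import Dict, NamedTuple


class ApoeResidues(NamedTuple):
    """One-letter amino acids at positions 61, 112 and 158 of processed APOE."""
    pos61: str
    pos112: str
    pos158: str


# Residues as described in the text; the last entry is a hypothetical construct
VARIANTS: Dict[str, ApoeResidues] = {
    "human APOE2": ApoeResidues("R", "C", "C"),
    "human APOE3": ApoeResidues("R", "C", "R"),
    "human APOE4": ApoeResidues("R", "R", "R"),
    "chimpanzee": ApoeResidues("T", "R", "R"),
    "ancestral + T61R (hypothetical)": ApoeResidues("R", "R", "R"),
}


def naive_isoform(v: ApoeResidues) -> str:
    """Label using only positions 112 and 158, ignoring residue 61."""
    labels = {("C", "C"): "E2", ("C", "R"): "E3", ("R", "R"): "E4"}
    return labels.get((v.pos112, v.pos158), "other")


def domain_interaction(v: ApoeResidues) -> bool:
    """Toy rule: N-/C-terminal domain interaction needs Arg61 and Arg112."""
    return v.pos61 == "R" and v.pos112 == "R"


if __name__ == "__main__":
    for name, v in VARIANTS.items():
        print(f"{name:32s} 112/158 label: {naive_isoform(v)}  "
              f"domain interaction: {domain_interaction(v)}")
```

Chimpanzee APOE receives the label "E4" from positions 112 and 158 alone but, under the toy rule, shows no domain interaction, mirroring the point that most non-human APOEs behave more like APOE3.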

Fig. 2

Evolution of domain interaction in APOE. The substitution of threonine to arginine at position 61 occurred after the split of humans and chimpanzees some 6 million years ago and before the split of modern humans and Denisovans some 800,000 years ago [29]. This enables an interaction between the N-terminal and C-terminal domains of APOE, which is disrupted by the change from arginine to cysteine at position 112. Note that there are additional amino acid substitutions on the tree (e.g., four more on the human lineage and three on the chimpanzee lineage). The Denisovan sequence at the three depicted positions was inferred from the available sequence at genome.ucsc.edu.

This functional insight also has consequences for evolutionary interpretations of the existing APOE isoforms (e.g., [60]). These have mainly tried to explain what selective advantage could have driven the spread of APOE3 and APOE2. But if the scenario described above is correct, the central question is what drove the Arg61 variant to fixation on the human lineage, i.e., why the APOE4 allele arose. Given the strong conservation at this position and the variety of mostly negative effects associated with the APOE4 allele, it seems likely that this change had some negative consequences. Such slightly deleterious mutations can nevertheless become fixed by chance, especially in small populations, where selection is less efficient (see e.g., [61]). Fixation could also occur due to positive selection on sites in linkage disequilibrium [62]. Just 20 kbp upstream of APOE lies the start of the gene poliovirus receptor-related 2 (PVRL2), which experienced strong positive selection throughout mammalian evolution [63], potentially related to its role as a viral receptor. The two amino acid changes leading to the APOE3 and APOE2 alleles could then be viewed as compensatory mutations, and sequence data [64] as well as simulation data [65] are compatible with such a scenario. Since compensatory mutations are frequent (see above), this scenario can be regarded as an appropriate null hypothesis for explaining the existence of APOE alleles. The alternative is that the negative impact of the T61R change was outweighed by some advantage that could be related to the immune system, reproduction, or cognitive functions (see Trotter et al. [66] for a well-balanced recent review). Plausible explanations would need to take into account that the threonine at position 61 is well conserved across mammals, i.e., that arginine has apparently not been favored by selection in other lineages.
It has been claimed, based on APOE sequences from human, mouse, rat, chimpanzee, and dog, that APOE shows a higher rate of protein evolution on the human lineage due to positive selection [67]. However, when the same method is applied to the now available sequences from human, chimpanzee, orangutan, macaque, baboon, marmoset, rat, mouse, dog, and cow, this result does not hold up (W. Enard, unpublished observation). So I think that one cannot currently reject the null hypothesis that APOE4 initially became fixed in humans by chance or due to linkage to a selected variant in PVRL2, and that APOE3 and then APOE2 rose to high frequency to compensate for this slightly deleterious change. Obviously, it will be important to use mouse models and human data to further explore functional differences among these alleles.
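The statement that slightly deleterious mutations fix more easily in small populations can be quantified with Kimura's diffusion approximation for the fixation probability of a new mutation, a standard population-genetic result that is not itself given in the text. The sketch assumes genic selection with coefficient s and a census size equal to Ne, so that a new mutation starts at frequency 1/(2Ne). The values of s and Ne are illustrative only and are not estimates for the T61R change.

```python
import math


def fixation_probability(s: float, ne: float) -> float:
    """Kimura's diffusion approximation for the fixation probability of a
    single new mutation with selection coefficient s, assuming genic
    selection and census size equal to ne (initial frequency 1 / (2 ne))."""
    if s == 0:
        return 1 / (2 * ne)  # neutral limit
    x = -4 * ne * s
    if x > 700:
        # Strongly deleterious: expm1(x) would overflow, and expm1(x) ~ e^x here
        return math.expm1(-2 * s) * math.exp(-x)
    # u = (1 - e^{-2s}) / (1 - e^{-4 Ne s}); expm1 keeps precision for small s
    return math.expm1(-2 * s) / math.expm1(x)


def relative_to_neutral(s: float, ne: float) -> float:
    """Fixation probability divided by the neutral value 1 / (2 Ne)."""
    return fixation_probability(s, ne) * 2 * ne


if __name__ == "__main__":
    s = -1e-4  # hypothetical, slightly deleterious
    for ne in (1_000, 10_000, 30_000):
        print(f"Ne = {ne:6d}: fixation relative to neutral = "
              f"{relative_to_neutral(s, ne):.3g}")
```

With these illustrative values, a mutation with s = −10⁻⁴ fixes at roughly 80 % of the neutral rate when Ne = 1,000 but at well under 10 % of it when Ne = 10,000.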

Identifying positively selected regions in the human genome

The power of comparative genomics lies in detecting constraints, because it uses information from multiple lineages in which the function of the analyzed genomic element is conserved. However, a genomic element could also acquire a new or altered function on one or a few lineages, either by chance or because the new function was adaptive on these lineages. Understanding these processes is of course highly relevant for understanding evolution, and human evolution in particular. One possible example is the transcription factor FOXP2, in which two amino acid changes occurred during human evolution that could have been relevant for adapting particular brain circuits to speech and language (reviewed in [68]). Recent adaptations caused by genetic variants that are still polymorphic in humans are medically even more relevant. Variants in hemoglobin causing sickle cell anemia and malaria resistance are a classical example; others are skin pigmentation, lactase persistence, and adaptations to low oxygen at high altitudes [69]. A positively selected variant leaves more offspring than a neutral variant, and this can lead to a “selective sweep” signature in the linked genomic region. This signature is erased over time and in humans becomes difficult to detect after roughly 10,000 generations (~250,000 years) [70]. If adaptive events occurred often enough in a particular gene or genomic element, positive selection can also be detected by comparing different species; for protein-coding genes, it appears as an elevated rate of non-synonymous substitutions, which change the encoded amino acid (often called Ka or dN), relative to the rate of synonymous substitutions, which do not (Ks or dS). These two principal ways of detecting positive selection (reviewed e.g. in [71–73]) have been applied genome-wide and have, e.g., identified more than 2,000 genes as potentially selected during recent human evolution [74].
Across species, the recent analysis of 29 mammals [11] is the most comprehensive to date; it finds evidence of strong purifying selection (dN/dS < 0.5) for 84 % of the 6.05 million codons in 12,871 gene trees and evidence for positive selection (dN/dS > 1.5) for 2.4 % of codons.
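As a minimal illustration of the dN/dS idea, the sketch below implements the pairwise counting method of Nei and Gojobori (1986) with a Jukes–Cantor correction. This is a simple pairwise approximation chosen for brevity, not the likelihood-based, codon-level models used in genome-wide scans such as [11]. It assumes the standard genetic code and aligned, in-frame, gap-free sequences without stop codons; it counts mutations to stop codons as nonsynonymous when computing sites (one common convention) and averages multi-hit codons over all mutational pathways that avoid stop codons. The cut-offs in `classify` simply repeat those quoted above, and the example sequences are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); "*" = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites of a codon (0-3); each single-base change counts 1/3."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if base != codon[pos] and CODE[mutant] == CODE[codon]:
                sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over stop-free paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_sum = non_sum = 0.0
    n_paths = 0
    for order in permutations(positions):  # identical codons: one empty path
        current, syn, non = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:  # loop finished without break: a valid pathway
            syn_sum += syn
            non_sum += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_sum / n_paths, non_sum / n_paths


def nei_gojobori(seq1, seq2):
    """Return synonymous sites S, nonsynonymous sites N, and differences Sd, Nd."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    s_sites = sd = nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("stop codons are not supported in this sketch")
        s_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn, non = codon_differences(c1, c2)
        sd += syn
        nd += non
    return s_sites, len(seq1) - s_sites, sd, nd


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Pairwise dN/dS from Jukes-Cantor-corrected proportions."""
    s, n, sd, nd = nei_gojobori(seq1, seq2)
    dn, ds = jukes_cantor(nd / n), jukes_cantor(sd / s)
    if ds == 0:
        return math.nan if dn == 0 else math.inf
    return dn / ds


def classify(omega):
    """Cut-offs quoted in the text for [11]; used here for illustration only."""
    if omega < 0.5:
        return "purifying"
    return "positive" if omega > 1.5 else "indeterminate"


if __name__ == "__main__":
    a = "GCTGCTGCTGCT"  # hypothetical: four alanine codons
    b = "GCCGCTGCTACT"  # one synonymous (GCT->GCC), one nonsynonymous (GCT->ACT)
    omega = dn_ds(a, b)
    print(f"dN/dS = {omega:.3f} ({classify(omega)})")
```

For the toy pair, S = 4 and N = 8 sites, so pS = 0.25 and pN = 0.125, giving dN/dS ≈ 0.45 after correction. Real analyses use many more codons, since such short sequences give very noisy estimates.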

It is beyond the scope of this article to review these approaches in detail, especially since their impact on medical genomics has recently been discussed [7]. However, I would like to make a few, rather cautionary, remarks:

Firstly, the false positive and false negative rates of scans for selective sweeps are probably high [75]. This is mainly due to the inherently stochastic nature of how individuals are related for a particular genomic region (reviewed e.g. by [76]). Hence, sequencing more human genomes will help, but only to a limited extent. What does help, however, is obtaining genomic information from extinct humans such as Denisovans and Neanderthals (see Box 1). Secondly, the signature of a selective sweep and that of background selection, i.e., the removal of haplotypes from the population because they are linked to deleterious variants, are in many respects similar and difficult to disentangle [77–79]. Hence, negative rather than positive selection could be responsible for many selective sweep candidates. Thirdly, for comparisons across species, i.e., when detecting repeated positive selection in a genomic element, the power is good if positive selection occurs in many species. Many immune-related genes that interact directly with pathogens fall into this category, but unexpected categories such as meiotic chromosome segregation also show a strong signal [11]. An example of medical relevance is the antiretroviral factor TRIM5α, which shows a strong signature of selection on several primate lineages including humans [80]. Whereas the human ortholog restricts replication of an extinct retrovirus [81], the rhesus ortholog restricts HIV-1 [82]. In contrast, identifying positive selection that is specific to a particular lineage and could be linked to species-specific adaptations is much more difficult. It would be very helpful for interpreting differences in selection across lineages if one could correlate them with phenotypes (Fig. 3), as pointed out in the next section.
Finally, it is important to keep in mind that adaptations can lead to signatures of positive selection, but that not all, and perhaps only a small minority, of the signatures of positive selection are adaptations to an ecological niche [83]. As laid out in the previous section, differences between species that compensate for slightly deleterious variants seem rather frequent. Although compensation is less likely to explain strong selective sweeps, which are probably rare [84], it could account for a substantial proportion of the substitutions fixed by positive selection, which in humans could amount to, for example, 10 % of all amino acid substitutions [85].

Fig. 3

Correlating changes in phenotypes and changes in genotype across species. a A primate phylogeny and a putative trait such as relative testis size. b A putative correlation between a measure of phenotypic change and a measure of genotypic change (e.g., Ka/Ks as a measure of protein evolution). Note that such a correlation has to take into account that the measures are correlated due to the phylogeny, as well as the other deadly sins of comparative analysis [108].

The main consequence of the issues pointed out above is that additional functional information is needed, since genetic information alone is not sufficient to reliably exclude false positives except in the most extreme cases [69, 86]. Furthermore, biological information is required to identify affected functions and potentially selected traits. If the selected variants in question are still segregating in the human population, many possibilities exist to test selective scenarios, for example in large human cohorts (see e.g. [87] for such an approach). However, if the variants are fixed in humans, it is much more difficult to investigate their phenotypic consequences, although the case of a mouse model for studying the human-specific effects of FOXP2 might allow for cautious optimism [68, 88]. In other words, patterns of positive selection can be medically informative, especially if combined with functional assays.

The perspective of evolutionary systems biology in primates

On the one hand, sequencing and genotyping technologies have already improved, or soon will, to an extent that they are no longer the major limiting factor. On the other hand, the link between genetic variation and human disease is much more complex than initially hoped. Hence, the next hope and next challenge in biomedicine is to collect and integrate phenotypic data, in particular molecular phenotypes such as gene expression, which can be measured with high throughput [89, 90]. This approach, whether called systems biology, functional genomics, or simply biology, should profit from comparative data in the same way as analyses at the DNA level have. It works in yeasts (see e.g., [91] for a recent review or [92] for a recent analysis in fission yeasts) and is beginning to be applied in mammals. One example is the modeling of sequence differences, expression, and transcription factor binding in preimplantation development using human, mouse, and cow stem cells, which allowed the identification of conserved and species-specific regulatory networks in these species [93]. Extending this approach to primates and to a variety of phenotypes, especially those of medical relevance, should be a worthwhile endeavor:

At a relatively simple level, it will be interesting to see how measures of protein evolution or positive selection on primate lineages correlate with phenotypic changes on these lineages (Fig. 3). So far, such correlations have been shown only for individual genes. For example, the rate of protein evolution of CDK5RAP2 and ASPM, two genes associated with primary microcephaly, was found to correlate with neonatal brain size in primates [94]. Other examples include a faster evolutionary rate of SEMG2 [95, 96] or of immunity genes [97] in more promiscuous primate species. It will be interesting to see how often such correlations are found genome-wide, also because they provide a unique way of obtaining functional information for genes. For many species, especially the well-studied primates, a lot of such phenotypic information is already available. In light of the coming genomic data, it will be valuable to collect additional, well comparable data across as many primate species as possible. In addition to ecologically relevant parameters such as mating systems, pathogens, or diet, it would be worthwhile to collect phenotypes of more direct medical relevance. Imagine, for example, that one could measure Lp(a) levels across a range of primates and correlate them with environmental and genetic changes across the phylogeny. This might reveal crucial information about the still unknown physiological functions of this lipoprotein (see above). A good entry point for such comparative data might be human cohort studies that measure, e.g., a large range of blood parameters for medical reasons (e.g., [98]).
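Because species share evolutionary history, genotype and phenotype measures cannot simply be correlated across the tips of a phylogeny. One standard remedy is Felsenstein's phylogenetically independent contrasts (1985), a technique not described in the text itself. The minimal sketch below computes contrasts for two traits (for example, a Ka/Ks-type rate and a phenotype such as relative testis size) on a fully bifurcating tree with known, positive branch lengths and correlates them through the origin. The tree, species names, and trait values are hypothetical, and a real analysis would also check branch-length adequacy and other model assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    length: float  # branch length to the parent node
    name: Optional[str] = None  # set for tips only
    children: List["Node"] = field(default_factory=list)


def independent_contrasts(node: Node, traits: Dict[str, Sequence[float]],
                          out: List[List[float]]) -> Tuple[List[float], float]:
    """Append standardized contrasts below `node` to `out`; return the node's
    estimated trait values and the extra branch length it contributes."""
    if not node.children:
        return list(traits[node.name]), 0.0
    if len(node.children) != 2:
        raise ValueError("this sketch assumes a fully bifurcating tree")
    left, right = node.children
    x_left, extra_left = independent_contrasts(left, traits, out)
    x_right, extra_right = independent_contrasts(right, traits, out)
    v_left, v_right = left.length + extra_left, right.length + extra_right
    out.append([(a - b) / math.sqrt(v_left + v_right)
                for a, b in zip(x_left, x_right)])
    # Weighted average of the two children, weights = inverse branch lengths
    x_node = [(a / v_left + b / v_right) / (1 / v_left + 1 / v_right)
              for a, b in zip(x_left, x_right)]
    return x_node, v_left * v_right / (v_left + v_right)


def contrast_correlation(contrasts: List[List[float]]) -> float:
    """Correlation of the two traits' contrasts, computed through the origin."""
    sxy = sum(c[0] * c[1] for c in contrasts)
    sxx = sum(c[0] ** 2 for c in contrasts)
    syy = sum(c[1] ** 2 for c in contrasts)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical four-species tree ((A:1,B:1):1,(C:1,D:1):1)
    tree = Node(0.0, children=[
        Node(1.0, children=[Node(1.0, "A"), Node(1.0, "B")]),
        Node(1.0, children=[Node(1.0, "C"), Node(1.0, "D")]),
    ])
    # Hypothetical (rate of protein evolution, phenotype) for each species
    traits = {"A": (0.2, 1.0), "B": (0.4, 1.9), "C": (0.3, 1.6), "D": (0.6, 3.1)}
    contrasts: List[List[float]] = []
    independent_contrasts(tree, traits, contrasts)
    print(f"{len(contrasts)} contrasts, r = {contrast_correlation(contrasts):.3f}")
```

A tree with n species yields n − 1 contrasts, which, under a Brownian-motion model of trait evolution, can be treated as independent data points in place of the n non-independent species values.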

A very powerful phenotyping method is assessing genome-wide expression patterns, especially since high-throughput sequencing allows transcript structure and expression levels to be assessed simultaneously, as recently done for six organs across nine mammals [99]. Among primates, mainly humans, chimpanzees, and rhesus macaques have been compared, and the field has matured from a few samples 10 years ago [100] to studies that integrate metabolic, miRNA, and proteomic data across postnatal development in dozens of samples [101, 102]. For example, a recent analysis revealed that synaptic development is extended in human childhood compared to chimpanzees and macaques, specifically in the prefrontal cortex [103]. If such approaches could be extended to more species, more tissues, and more developmental periods, one could expect tremendous insights into human biology and disease by identifying constraints and flexibility in such developmental systems. Unfortunately, the availability of suitable tissues is a huge limiting factor, in particular for developmental stages, akin to the limited access to tissue samples from human patients. Overcoming this limitation and modeling relevant phenotypes of human diseases in vitro is a major promise of induced pluripotent stem (iPS) cells [104]. It is likely that human protocols can be readily applied to generate iPS cells of many primates [105, 106]. One could imagine generating a panel of human, primate, and mammalian iPS cells to which disease-relevant assays are applied, so that variable and conserved phenotypes can be distinguished to interpret disease-related variation (Fig. 4). The prospect of combining this with targeted genetic modifications in these cell lines, using engineered nucleases such as zinc-finger nucleases or TALENs [107], could make this a decisive tool for leveraging the potential of comparative data for medical questions.

Fig. 4

Illustration of how comparative data could help to interpret disease-related phenotypes. Imagine one measured gene expression levels (y-axis) across time (x-axis) in differentiating iPS cells from patients and controls and identified genes or groups of genes that differ. To interpret disease-associated changes, one could collect the same data from primate and mammalian iPS cells and distinguish disease-associated patterns that are conserved from those that are more variable, similar to the approach of interpreting disease-associated variants at the DNA level.

Conclusions

Biomedicine cannot afford to ignore the unique information that can be obtained from comparative genomic data, especially those from humans’ closest relatives, the primates. Identifying constraints, including primate-specific constraints and epistatic constraints, is crucial in order to interpret disease-associated variants and to improve animal models for diseases. Just as functional studies are needed to interpret human genetic variation, functional studies are crucial to interpret evolutionary changes in particular genes. Hence, collecting comprehensive and comparable phenotypic data across many species is a necessary next step. High-throughput methods for molecular phenotypes will be particularly valuable, and iPS cell technology should allow measuring such phenotypes in a comparable way across a large number of species.