1 Introduction

There is growing interest in conducting genome scans of DNA polymorphism to identify loci that have contributed to high-altitude adaptation in humans [1, 2, 5–7, 25, 45, 55, 82–85]. This population genomics approach is premised on the idea that the locus-specific effects of positive selection can be detected against the genome-wide backdrop of stochastic variation. Consider a genome-wide survey of DNA polymorphism in individuals sampled from a pair of high- and low-altitude populations. If a given trait is subject to directional selection only at high altitude, then the underlying loci are expected to undergo shifts in allele frequency in the highland population relative to the lowland population. In principle, it is possible to identify chromosomal regions that harbor such loci by exploiting theoretical predictions about the effects of positive selection on patterns of DNA polymorphism at linked neutral sites. However, the effects of selection on patterns of variation at or near causative loci depend strongly on the genetic architecture of the selected trait and numerous other factors [13, 28, 31, 32, 46, 60, 68, 73, 74].

The population genomics approach can be used as a means of “outlier detection” to nominate candidate loci as putative targets of positive selection, or it can be used to assess evidence for selection on previously identified candidate loci by testing whether such loci emerge as outliers against a backdrop of genome-wide variation. Both approaches have been used successfully to identify loci involved in adaptation to high-altitude environments. In humans, for example, genome-wide surveys of DNA polymorphism in Tibetan highlanders revealed evidence for strong and recent positive selection on several genes that function as upstream regulators of the hypoxia-inducible factor (HIF) oxygen signaling pathway [5–7, 45, 55, 82–85]. The HIF family of transcription factors plays a key role in regulating oxygen homeostasis by coordinating the transcriptional response to hypoxia. One of the HIF genes that exhibited an especially clear signal of positive selection in the Tibetan population was the EPAS1 gene (endothelial PAS domain protein 1) (Fig. 8.1), also known as HIF2α, which encodes the oxygen-sensitive α subunit of the HIF-2 transcription factor. The product of EPAS1 is known to play an especially important role in regulating the erythropoietic response to hypoxia [47]. Another HIF-regulatory gene that exhibited strong evidence for selection in both Tibetan and Andean populations was EGLN1 (Egl Nine homolog 1) [1, 6, 7, 45, 55, 82, 84, 85], which encodes the prolyl hydroxylase isozyme (PHD2) that is responsible for hydroxylating the α subunit of the HIF-1 transcription factor. Results of these studies provide proof-of-principle that the genome scan approach can successfully identify targets of recent positive selection, and the integration of such analyses with functional studies can provide additional insights into possible phenotypic targets of selection [34].

Fig. 8.1

A genome-wide scan of allelic differentiation between population samples of Tibetans (resident at 3200–3500 m in Yunnan Province, China) and Han Chinese. The vertical axis of the graph shows the negative log of site-specific P-values for allele frequency differences between the Tibetan and Han Chinese population samples (low P-values denote allele frequency differences that are too large to explain by genetic drift). The horizontal axis of the graph shows the genomic positions of each assayed nucleotide site, arranged by chromosome number. The red line indicates the threshold for genome-wide statistical significance (P = 5 × 10⁻⁷). Values are shown after correction for background population stratification using an intragenomic control. Several noncoding variants flanking the EPAS1 gene are highly significant outliers. Reprinted from [5]
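The logic of an outlier scan of this kind can be illustrated with a minimal Python sketch. It is not the procedure used in [5]: it assumes a simple chi-square test (1 d.f.) on a 2 × 2 table of allele counts at each site, it omits the correction for population stratification mentioned in the caption, and the site names and allele counts are invented for illustration. Only the significance threshold (P = 5 × 10⁻⁷) is taken from the caption.

```python
import math

GENOME_WIDE_ALPHA = 5e-7  # threshold quoted in the Fig. 8.1 caption


def allele_freq_pvalue(high_alt, high_ref, low_alt, low_ref):
    """Chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Rows are the highland and lowland samples; columns are the
    alternate and reference alleles. This is an illustrative choice
    of test, not necessarily the one used in the cited study.
    """
    n = high_alt + high_ref + low_alt + low_ref
    row_hi, row_lo = high_alt + high_ref, low_alt + low_ref
    col_alt, col_ref = high_alt + low_alt, high_ref + low_ref
    if min(row_hi, row_lo, col_alt, col_ref) == 0:
        return 1.0  # monomorphic site or empty sample: no evidence
    chi2 = (n * (high_alt * low_ref - high_ref * low_alt) ** 2
            / (row_hi * row_lo * col_alt * col_ref))
    # Upper tail of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def scan(sites):
    """Return (site, -log10 P) for sites below the genome-wide threshold."""
    hits = []
    for name, counts in sites.items():
        p = allele_freq_pvalue(*counts)
        score = -math.log10(p) if p > 0 else float("inf")
        if p < GENOME_WIDE_ALPHA:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical allele counts: (high_alt, high_ref, low_alt, low_ref)
sites = {
    "snp_near_candidate": (170, 30, 20, 180),
    "snp_neutral": (98, 102, 101, 99),
}
outliers = scan(sites)
```

In this toy data set only the first site exceeds the threshold; in a real scan the same comparison would be repeated across hundreds of thousands of sites, which is why such a stringent threshold is needed.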

An example of how the population genomics approach can be combined with the functional analysis of individual candidate genes is provided by an integrative analysis of hemoglobin polymorphism in natural populations of North American deer mice (Peromyscus maniculatus). Multilocus surveys of nucleotide polymorphism in high- and low-altitude populations revealed evidence for a history of spatially varying selection at two α-globin gene duplicates and two β-globin gene duplicates [40, 62–66], and site-directed mutagenesis experiments involving recombinant hemoglobins quantified the additive and nonadditive effects of the causative mutations [41]. The population genetic evidence for spatially varying selection and the experimental measures of mutational effects corroborated previous research on wild-derived strains of deer mice, which had demonstrated that allelic variation in hemoglobin-oxygen affinity contributes to adaptive variation in whole-animal aerobic performance under hypoxia [11, 12, 59]. Similarly, genome-wide surveys of nucleotide variation in a number of Andean bird species have provided insights into the evolutionary forces that have shaped altitudinal patterns of hemoglobin polymorphism [17, 21, 42]. In each of these studies, population-genomic inferences about the adaptive significance of observed amino acid polymorphisms were tested by conducting functional analyses of native hemoglobin variants and engineered recombinant hemoglobin mutants that quantified the phenotypic effects of individual mutations.

In addition to identifying particular candidate loci for high-altitude adaptation, genome scans for signatures of positive selection can also be used to gain more general insights into the nature of adaptation to different environments. For example, in comparisons between high- and low-altitude populations of a given species, it is possible to assess whether certain classes of loci make disproportionate contributions to adaptive phenotypic evolution. We can assess whether genes that occupy particular positions in metabolic pathways or regulatory networks make disproportionate contributions to adaptation, and we can assess the relative importance of structural mutations (e.g., amino acid mutations that alter the catalytic efficiency of an enzyme) and regulatory mutations (e.g., cis- or trans-acting mutations that alter the expression of the enzyme-encoding gene).

2 The Relative Importance of Structural vs. Regulatory Changes in Physiological Adaptation

It has been suggested that cis-acting regulatory mutations may make a disproportionate contribution to adaptive evolution because such changes generally have fewer pleiotropic effects relative to changes in coding sequence [10, 57, 58]. This is because cis-regulatory elements (e.g., promoters, enhancers, and 5′ and 3′ untranslated regions [UTRs]) are often functionally modular—distinct sequence motifs control discrete temporal phases and/or spatial patterns of gene expression [10, 56, 81]. Each cis-regulatory module represents a collection of transcription factor binding sites that encodes a particular transcriptional output, and mutational changes in a single module will typically alter a small part of the gene’s total transcriptional pattern. For example, a cis-regulatory mutation may affect transcription in one particular tissue or cell type, and will therefore have minimal pleiotropic effects on the regulatory network as a whole. By contrast, structural changes in the coding sequence of a given gene would be manifest in every tissue or cell type in which the affected protein is expressed. Likewise, mutations in the coding sequence of a transcription factor that affect DNA binding affinity could potentially affect the transcriptional control of myriad downstream regulatory targets. For these reasons, coding mutations are generally expected to have more far-reaching pleiotropic effects than cis-regulatory mutations, and may therefore have smaller net fitness benefits.

A number of recent studies have documented evolutionary changes in phenotype that were caused by mutations in modular cis-regulatory elements [26, 48, 75]. In humans, persistence of lactase expression into adulthood has evolved independently in several different ethnic groups, and in all cases the ontogenetic changes in the expression of the Lct gene were attributable to point mutations in cis-regulatory elements [75]. Another good example involves the evolution of reduced abdominal pigmentation in Drosophila santomea, which is caused by several distinct inactivating mutations in an abdomen-specific cis-regulatory element of the tan gene [26].

There are good reasons to expect that a disproportionate number of the mutations that contribute to phenotypic evolution are concentrated in the cis-regulatory regions of transcription factor genes that serve as central control points in regulatory networks [57, 58]. This expectation is based on the observation that pleiotropic effects are determined by how regulatory networks shape the phenotype. For example, among species in the genus Drosophila, evolutionary changes in larval trichome patterning are mediated by cis-regulatory substitutions in a transcription factor called shavenbaby/ovo (svb) [38, 71, 72]. The svb transcription factor integrates a vast array of cellular signals and produces an on/off transcriptional output that determines cellular differentiation into trichomes or naked cuticle. It appears that anatomically localized changes in trichome patterning can be achieved most efficiently through mutations in specific cis-regulatory enhancers of svb because such mutations have minimal pleiotropic effects. By contrast, coding mutations in svb would produce changes in trichome patterning in every spatial domain of the cuticle in which the protein is expressed, and changes in upstream regulators of svb would alter the development of other epidermal structures besides trichomes. Thus, genes located at integrative control points in a regulatory network can accumulate mutations with specific, minimally pleiotropic effects, and these mutations are predicted to be especially common in cis-regulatory regions of “control point” genes [57, 58].

By analogy with the role of svb in the development of trichome patterning in Drosophila embryos, the EPAS1 gene may occupy a comparable position in the regulatory network that governs the transcriptional response to hypoxia. It may be that the physiological response to hypoxia is most efficiently accomplished by modulating the transcription of EPAS1, which then causes a coordinated change in the expression of all downstream target genes.

Future research should reveal whether cis-regulatory mutations in EPAS1 and other upstream regulators of the HIF signaling pathway have made disproportionate contributions to hypoxia adaptation in other animal species that are native to high-altitude environments. Beyond the important goal of identifying convergent mechanisms of hypoxia adaptation in different species, such research could also contribute to the more general goal of discovering whether the architecture of gene regulatory networks can be used to predict which genes are likely to be “hot spots” for adaptive physiological evolution.

3 Integrating the Analysis of Coding Sequence Variation and Transcriptional Variation

Gene expression profiles represent an important source of phenotypic data at the molecular level, and detailed studies of transcriptional variation may help to identify mechanisms of genetic adaptation and/or physiological acclimatization [61, 67, 79]. Plasticity in most physiological traits is probably mediated to a large extent by environment-specific changes in the transcriptional activity of multiple underlying genes. As stated by Hochachka and Somero [24]: “The evolution of phenotypic plasticity requires development of a complex set of tightly integrated environmental sensing and gene regulation mechanisms that allow the organism to sense and then respond appropriately to an environmental change”.

In principle, genomic technologies that permit the simultaneous analysis of sequence variation and expression profiles for a set of genes in the same pathway can be used to identify both structural and regulatory mechanisms of adaptation. For example, RNA-seq technology can be used to characterize sequence polymorphism and transcript abundance in thousands of expressed mRNAs from specific tissue types [27, 43, 78]. The integrated analysis of DNA sequence variation, genome-wide variation in transcriptional profiles, and variation in organismal phenotypes in a linkage or association mapping population can yield important insights into how regulatory networks shape variation in complex traits [3, 36, 49]. This approach is most powerful when implemented in a common garden or reciprocal-transplant experimental design to quantify the environmental and genetic components of gene expression differences between populations.

To appreciate the types of evolutionary inferences that can be drawn from a typical RNA-seq analysis, consider a common garden experiment involving individuals sampled from a pair of high- and low-altitude populations. Using animals that have experienced uniform acclimation histories to control for environmentally induced variation in gene expression, tissue-specific cDNA libraries are then constructed for transcriptome sequencing. For each protein-coding gene, it is possible to measure nucleotide differentiation between high- and low-altitude populations and the corresponding differentiation in expression levels that is attributable to additive genetic effects. If adaptive genetic differentiation in the expression of a given gene is attributable to changes in proximal cis-regulatory elements (i.e., mutations in the promoter region), then the gene in question may exhibit an elevated level of nucleotide differentiation due to divergent selection at the immediately adjacent noncoding sites. When adaptive differentiation in the expression of a given gene is attributable to trans-acting mutations, then we would not expect a correlated increase in nucleotide differentiation at or near the gene itself because flanking regulatory mutations would not have contributed to the selection response.

For the purpose of identifying candidate loci for environmental adaptation, one of the disadvantages of the RNA-seq approach is that the assayed variation is restricted to the transcripts of protein-coding genes. Thus, adaptive changes in noncoding DNA may go undetected if the causal mutations are distant from genic regions. Population genomic analyses of spatially varying selection in D. melanogaster provide suggestive evidence that noncoding DNA polymorphisms may make an unexpectedly large contribution to environmental adaptation. For example, in genomic comparisons between temperate and tropical Australian populations of D. melanogaster, chromosomal regions that exhibited the highest levels of differentiation contained an over-representation of unannotated noncoding sequence [30]. A second disadvantage is that RNA-seq analysis can only identify expression differences that are mediated by changes in the rate of transcription or mRNA stability; regulatory changes that are mediated by posttranslational modifications would remain undetected.

4 Dissecting the Genetic Architecture of Adaptive Regulatory Variation

It is possible to further assess the contributions of cis- and trans-acting regulatory mutations to variation in the expression of a given gene by quantifying the relative abundance of allele-specific transcripts (provided that the transcripts are distinguished by at least one diagnostic nucleotide change). In a diploid cell, trans-acting regulatory mutations affect the expression of both alleles of a given target gene, whereas cis-acting regulatory mutations exert effects that are allele-specific [19, 39, 80]. For this reason, it is possible to distinguish the effects of cis- and trans-acting regulatory mutations in modulating the expression of a given gene by comparing the relative expression of alternative alleles in heterozygotes. Assuming that the expression of each allele is independent of the other, a marked asymmetry in allele-specific transcript abundance is typically attributable to cis-regulatory changes because both alleles are transcribed in the same trans-regulatory cellular environment [19, 39]. RNA-seq allows for high-throughput assessment of allele-specific gene expression because it provides a means of quantifying transcript abundance on an allele-specific basis. Although analyses designed to characterize allele-specific expression patterns are most powerful when applied to controlled crosses between inbred strains [22, 29, 39, 70], powerful evolutionary inferences can still be made in RNA-seq analyses of samples from outbred, natural populations.
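The comparison of allele-specific transcript abundance in heterozygotes can be sketched as a simple statistical test. The Python example below is an illustrative assumption rather than a method taken from the cited studies: it applies an exact two-sided binomial test to the read counts of the two alleles at a single diagnostic site in one heterozygous individual, with equal expression (a 50:50 ratio) as the null hypothesis. The function names and read counts are hypothetical, and the sketch ignores complications such as read-mapping bias and overdispersion among replicates.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial P-value for k successes in n trials, p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Integer arithmetic keeps large read counts exact.
    """
    denom = 2 ** n
    probs = [comb(n, i) / denom for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that outcomes tied with the observed one are included
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


def allelic_imbalance(reads_allele_a, reads_allele_b, alpha=0.05):
    """Test whether two alleles in a heterozygote are expressed unequally."""
    n = reads_allele_a + reads_allele_b
    if n == 0:
        raise ValueError("no reads cover the diagnostic site")
    p_value = binomial_two_sided_p(reads_allele_a, n)
    return {
        "ratio_a": reads_allele_a / n,
        "p_value": p_value,
        # Asymmetry in a shared trans environment points to cis-acting effects
        "cis_candidate": p_value < alpha,
    }


# Hypothetical read counts at a diagnostic site in two genes
skewed = allelic_imbalance(80, 20)    # strong asymmetry
balanced = allelic_imbalance(52, 48)  # roughly equal expression
```

Under the independence assumption stated above, a gene like the first would be flagged as a candidate for cis-regulatory divergence, whereas expression differences between populations at a gene like the second would more plausibly reflect trans-acting effects.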

Ideally, inferences derived from genomic or transcriptomic analyses can help guide the design of detailed follow-up experiments to obtain insights into the mechanistic basis of adaptive trait variation [68, 69]. For example, at the biochemical level, variation in metabolic traits may stem from allelic variation in enzyme concentration, enzyme kinetics, or a combination of the two. The combined effects of enzyme concentration and enzyme kinetics are measured by the maximal velocity of the enzyme-catalyzed reaction, Vmax, which is defined as the product [E] × kcat, where [E] is the enzyme concentration and kcat is the catalytic rate constant (turnover number). In RNA-seq analyses of samples from high- and low-altitude populations, metabolic genes that exhibit extreme differences in expression or allele frequency can be targeted for functional studies of enzyme kinetics. For example, cases where Vmax and gene expression vary in the same direction suggest that differences in Vmax may be attributable to differences in enzyme concentration (regulatory mutations). Conversely, cases in which differences in Vmax are not accompanied by changes in gene expression would implicate mutations that alter enzyme catalytic efficiency (structural mutations). Similarly, traditional expression QTL (eQTL) mapping approaches can be used to confirm inferences about the genetic architecture of the observed regulatory variation. Thus, results of RNA-seq analyses can generate novel hypotheses that can be experimentally tested.
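Because Vmax is the product of enzyme concentration and the catalytic constant, a difference in Vmax between populations can be partitioned on a log scale into a concentration component and a kinetic component. The short Python sketch below illustrates this arithmetic. The function name and the numerical values are hypothetical, and the sketch assumes that a direct measure of enzyme concentration is available; transcript abundance is only an indirect proxy for it.

```python
import math


def decompose_vmax(vmax_high, vmax_low, enzyme_high, enzyme_low):
    """Split a highland/lowland difference in Vmax into two components.

    Since Vmax = [E] x kcat, it follows that
    log2(Vmax ratio) = log2([E] ratio) + log2(kcat ratio).
    Returns the three log2 ratios in that order.
    """
    log_vmax = math.log2(vmax_high / vmax_low)
    log_enzyme = math.log2(enzyme_high / enzyme_low)
    log_kcat = log_vmax - log_enzyme  # remainder attributed to kinetics
    return log_vmax, log_enzyme, log_kcat


# Hypothetical case 1: Vmax and enzyme concentration both double,
# so the kinetic component is zero (consistent with regulatory change).
regulatory_case = decompose_vmax(8.0, 4.0, 2.0, 1.0)

# Hypothetical case 2: Vmax doubles with no change in concentration,
# so the whole difference is kinetic (consistent with structural change).
structural_case = decompose_vmax(8.0, 4.0, 1.0, 1.0)
```

Intermediate outcomes, in which both components are nonzero, would suggest that regulatory and structural changes both contribute.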

5 Integrating Functional Genomics and Experimental Analyses of Whole-Animal Physiology

Integrating analyses of transcriptomic variation with measures of physiological traits can reveal how variation in regulatory networks gives rise to phenotypic variation at different hierarchical levels of organization. This is the goal of the “systems genetics” approach: to achieve a mechanistic understanding of the mapping function that relates genotype to phenotype by examining how genetic variation in intermediate molecular phenotypes such as transcript abundance is transduced into variation in higher-level traits [3, 18, 33, 36, 49, 54].

In analyses of tissue-specific transcriptome profiles, statistical associations between particular traits and levels of transcript abundance can reveal trait-specific modules of co-regulated genes. These transcriptional modules typically comprise sets of genes that interact in the same pathway or regulatory network. For this reason, correlations in transcript abundance among genes in the same module provide a means of constructing genetic interaction networks, and can suggest hypotheses about causal effects [49].
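The idea of grouping genes into modules on the basis of correlated transcript abundance can be illustrated with a deliberately simplified Python sketch. It is an assumption for illustration only and does not reproduce the network methods used in the cited studies: genes are linked when the absolute Pearson correlation of their expression across individuals meets an arbitrary cutoff, and modules are taken to be the connected components of the resulting graph. Gene names and expression values are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a gene with constant expression is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def coexpression_modules(expr, min_abs_r=0.9):
    """Group genes into modules of tightly correlated expression.

    expr maps gene name -> expression values across the same individuals.
    The cutoff min_abs_r is an arbitrary illustrative choice.
    """
    neighbours = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= min_abs_r:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)

    modules, seen = [], set()
    for gene in expr:
        if gene in seen:
            continue
        stack, module = [gene], set()
        while stack:  # depth-first walk over one connected component
            g = stack.pop()
            if g in module:
                continue
            module.add(g)
            stack.extend(neighbours[g] - module)
        seen |= module
        modules.append(sorted(module))
    return modules


# Hypothetical expression values for four genes in five individuals
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "gene_d": [4.9, 1.2, 4.1, 1.8, 3.1],
}
modules = coexpression_modules(expr)
```

With these toy data the genes fall into two modules of two genes each. A summary of each module's expression could then be tested for association with a physiological trait, although correlation alone cannot establish the direction of causation.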

Integrated analyses of transcriptomic variation and physiological phenotypes have been successfully applied in studies of high-altitude adaptation in North American deer mice. Relative to mice that are native to lowland environments, high-altitude deer mice have evolved enhanced capacities for aerobic performance under hypoxia [14–16, 23, 35], and these population differences in whole-organism performance stem from genetically based physiological changes at several hierarchical levels of biological organization. For example, the elevated thermogenic capacity of high-altitude deer mice is associated with an increased capacity to oxidize lipids as a primary fuel source during aerobic thermogenesis [14]. This difference in lipid catabolic capacities is associated with differences in the activities of enzymes that influence flux through fatty acid oxidation and oxidative phosphorylation pathways, and with concerted changes in gene expression in these same pathways [14, 16]. For example, high-altitude mice that were acclimated to low-altitude conditions for 6 weeks exhibited wholesale upregulation of genes in the β-oxidation and oxidative phosphorylation pathways compared to their lowland counterparts that were acclimated to the same laboratory conditions [14]. Follow-up experiments on mice with different acclimation histories revealed that the regulatory changes associated with enhanced performance are highly plastic. Specifically, four of five transcriptional modules that were significantly associated with whole-organism thermogenic performance comprised a set of genes that exhibited significant effects of rearing environment (native elevation vs. common garden) that were independent of population-of-origin (highland vs. lowland), suggesting that a large fraction of the transcriptional variation is environmentally induced [16]. These environmentally sensitive transcriptional modules were enriched for genes involved in hematopoiesis, angiogenesis, muscle development, and immune response (Fig. 8.2). Interestingly, the single transcriptional module that exhibited a significant population-of-origin effect (expression differences persisted in the F1 generation) was enriched for genes involved in lipid oxidation, and this module was expressed at higher levels in high-altitude natives. Again, there was generally good correspondence between these transcriptional patterns and enzyme activities that serve as biomarkers for the flux capacity of the β-oxidation pathway and other core metabolic pathways, lending further support to the hypothesis that the enhanced aerobic performance of high-altitude deer mice under hypoxia is partly attributable to their enhanced capacities for lipid oxidation.

Fig. 8.2

Functional enrichment of transcriptional modules that are associated with thermogenic performance in deer mice. Categorical enrichments are shown for five separate modules that are associated with thermogenic capacity (cold-induced VO2 max) (T21), thermogenic endurance (the length of time individuals can maintain >90 % of VO2 max) (T5 and T14), or both measures of performance (P4 and T16). Modules T5, T14, T16, and T21 consist of genes that differ in expression across rearing environments independent of population of origin, whereas module P4 consists of genes that differ in expression between highland and lowland mice across rearing environments. For each performance-associated transcriptional module, the proportional representation of genes in different gene ontology categories is compared between the transcriptional module (gray bars) and the genome as a whole (black bars). Asterisks denote gene ontology categories that are significantly enriched within the transcriptional module (*uncorrected p < 0.05, **FDR corrected q < 0.05). Reprinted from [16]
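The comparison between a module and the genome as a whole can be illustrated with a standard enrichment calculation. The Python sketch below assumes a one-sided hypergeometric test, which is a common choice for gene ontology enrichment but is not confirmed here as the test used in [16]. The function name and all gene counts are hypothetical, and no correction for multiple testing across categories is shown.

```python
from math import comb


def enrichment_p(module_hits, module_size, genome_hits, genome_size):
    """Upper-tail hypergeometric P-value for category enrichment.

    Probability of drawing at least module_hits genes from a category
    containing genome_hits genes when module_size genes are sampled
    without replacement from genome_size genes in total.
    """
    upper = min(module_size, genome_hits)
    total = comb(genome_size, module_size)
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, module_size - k)
        for k in range(module_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 of 10,000 genes carry the annotation (2 %),
# so a 50-gene module is expected to contain about one such gene.
p_enriched = enrichment_p(10, 50, 200, 10000)     # 10 annotated genes observed
p_unremarkable = enrichment_p(1, 50, 200, 10000)  # 1 annotated gene observed
```

Because one test is performed for each ontology category, the resulting P-values would normally be adjusted for multiple comparisons, as indicated by the FDR-corrected q-values in the caption.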

Taken together, these studies suggest that metabolic adaptation to high altitude in deer mice involves the maintenance of a highly aerobic phenotype in the face of reduced oxygen availability, and this aerobic phenotype is achieved through both genetically based, constitutive differences in gene expression and transcriptional plasticity. Elite endurance athletes and highly aerobic nonhuman mammals are also characterized by an enhanced capacity for fatty acid oxidation under normoxic conditions [4, 8, 37]. Under conditions of chronic oxygen deprivation at high altitude, a similar enhancement of fatty acid oxidation capacity could promote enhanced thermogenic performance, but it would require additional physiological changes to ensure adequate oxygen and fuel flux through oxidative pathways, suggesting that modifications of upstream steps in the oxygen-transport cascade may be necessary to support the increased lipid oxidation rate of high-altitude deer mice.

Indeed, the enhanced aerobic capacity of high-altitude deer mice under hypoxia is also partly attributable to increases in capillary density, oxidative fiber abundance, and oxidative enzyme activity in skeletal muscle [14, 16, 35]. These changes enhance the oxidative capacity and oxygen-diffusion capacity of working muscle, which could compensate for the diminished tissue oxygen supply under hypoxia [9, 51, 53, 77]. The transcriptional underpinnings of these phenotypic changes were identified in a common-garden experiment involving wild-caught high- and low-altitude mice as well as laboratory-reared F1 progeny of wild-caught mice [35, 52]. Expression analysis of a panel of genes that regulate angiogenesis and energy metabolism revealed that the increased capillarity and oxidative capacity of skeletal muscle in high-altitude mice was associated with an increased transcript abundance and protein abundance of peroxisome proliferator-activated receptor γ (PPARγ) (Fig. 8.3), a transcription factor that regulates mitochondrial biogenesis. PPARγ protein expression also increased during acclimation to chronic hypoxia [35]. Intriguingly, the underlying gene (pparg) exhibits a strong signature of recent positive selection in indigenous Tibetan and Mongolian highlanders [83].

Fig. 8.3

Relative to low-altitude deer mice, high-altitude mice have higher expression of PPARγ protein (a) and mRNA (b) in the gastrocnemius muscle. Population-of-origin (i.e., high- or low-altitude) had significant main effects on PPARγ protein abundance and transcript abundance, but acclimation environment only had a significant main effect on protein abundance. There was no significant interaction between population-of-origin and acclimation treatment in either case [35]. The inset image in panel a shows representative immunoreactive bands for a high-altitude mouse (left) and low-altitude mouse (right) after acclimation to hypoxia. Modified from [35]

Expression patterns of the deer mouse pparg gene proved to be somewhat anomalous, however, as most other genes involved in angiogenesis were actually downregulated during acclimation to hypoxia [35] and were either not differentially expressed between hypoxia-acclimated high- and low-altitude mice or were downregulated in the high-altitude mice [52]. Transcriptomic analysis of gastrocnemius muscle revealed a small set of transcripts with genetically based expression differences between high- and low-altitude mice (i.e., nonplastic differences that persisted in the F1 progeny of mice with different populations-of-origin), and these transcripts clustered into two discrete modules. Some genes involved in regulating angiogenesis (Cadherin-7 and Notch-4) were significantly upregulated in high-altitude mice, but most genes within these modules were actually expressed at a lower level in high-altitude mice and expression levels were negatively correlated with measures of muscle capillarity and oxidative capacity. A possible explanation for this seemingly paradoxical result is that particular genes like Notch-4 may be responsible for maintaining the high muscle capillarity of high-altitude mice, and the associated increase in cellular oxygen tension simply dampens the stimulus for the expression of other genes that respond to oxygen limitation [35, 52]. As stated by Scott et al. (Ref. [52]: 1972): “A negative association between muscle phenotypes and expression could thus result for genes that do not cause the differences in muscle phenotype but are sensitive to its effects.” This highlights the difficulty of distinguishing the effects of genes that cause a given phenotype vs. effects that are simply consequences of the induced changes.

These integrative studies of deer mouse physiology revealed that genetically based regulatory changes and acclimation responses both contribute to improvement in whole-animal aerobic performance under hypoxia via changes in oxygen-transport capacity and oxygen utilization [14–16, 35, 52, 76]. Thus, variation in whole-organism aerobic performance seems to stem from changes at several hierarchical levels of biological organization, and changes at each level appear to stem from concerted expression changes in co-regulated sets of genes.

6 Future Outlook

Whole-genome and transcriptome sequencing will continue to play a central role in studies of hypoxia adaptation in humans and other animals. Analyses of genomic/transcriptomic data and gene ontology databases can provide lists of candidate loci for hypoxia adaptation, but such analyses need to be integrated with experimental measures of whole-animal performance and other subordinate traits to obtain insights into actual mechanisms of physiological adaptation. Gene ontology analyses can be useful for generating hypotheses but, in the absence of experimental validation, in silico approaches lend themselves to overinterpretation and storytelling [44, 61]. Surveys of genomic/transcriptomic variation can suggest hypotheses about the adaptive significance of particular polymorphisms or expression changes, but functional experiments are required to test such hypotheses [17, 20, 50, 68].

In analyses of transcriptomic variation, it is also critical to use common-garden and/or reciprocal transplant experimental designs to disentangle genetic and environmental components of variation in gene expression; otherwise, it is not possible to distinguish between evolved changes and plastic changes. Studies of high-altitude physiology in humans face obvious constraints with regard to experimental manipulations. However, recent work by Lorenzo et al. [34] provides an example of how reverse-genetics approaches and in vitro experiments can be used to follow up on results of population genomic studies to gain insights into mechanisms of hypoxia adaptation.