Introduction

“The ancients believed that the dark skin of some peoples is explained by the heat of the sun. Indeed, it is observed from experience that with increased proximity to the hot countries of the south, people become darker; on the other hand, if we travel up north, we find that men become increasingly fair-skinned, like the French, the Germans, the English and others. However, we can be certain that along the equator, some men are born almost white, as […] on the island of São Tomé. This island was colonized by the Portuguese, and after one hundred years, their children are still white […]. For these reasons, my lord Duarte thinks that the black color is not due to the action of the sun, but stems from the nature of sperm” (translated from Pigafetta and Lopes 2002).

Such were the views about skin color variation of the Portuguese trader Duarte Lopes, who narrated his adventures in the Kingdom of Congo to the Italian writer Filippo Pigafetta in 1591. Today, more than 400 years after Lopes’s and many other travelers’ accounts, the major questions about pigmentation variation remain basically the same: what is the genetic basis of pigmentation diversity, and how can the striking correlation between skin color and latitude be explained? Fortunately, while the questions have not changed, our means to answer them have substantially improved. In this article, I will review the implications of recent findings about the genes involved in normal skin color variation for understanding the evolutionary history of skin pigmentation. For a more comprehensive overview of the biology and evolution of different pigmentary traits, including eye and hair color, the reader is referred, among others, to the excellent works of Rees (2003), Jablonski (2004), McEvoy et al. (2006), Parra (2007), Sturm (2009), Sturm and Duffy (2012), Liu et al. (2013), Deng and Xu (2018), and Quillen et al. (2019).

The Distribution of Human Skin Pigmentation: Patterns and Evolutionary Explanations

Skin color is a highly heritable complex trait that is determined by the type and total amount of melanin produced within melanocyte cells located in the basal layer of the epidermis (Byard 1981; Barsh 2003; Martin et al. 2017). Although the number of melanocytes is basically constant, individuals of varying skin color may differ in size and distribution of the melanin-producing melanosome organelles that transport melanin from melanocytes into surrounding keratinocyte cells (Barsh 2003; Parra 2007). These differences are the major cause for variation in the melanin content (Barsh 2003), which can be objectively quantified by measuring the amount of light reflected by the skin for different wavelengths, or by estimating a melanin index from reflectance of narrowband light in the red spectrum (Shriver and Parra 2000; Parra 2007).
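The melanin index mentioned above can be illustrated with a short sketch. It assumes one commonly cited form of the index, M = 100 × log10(1 / red reflectance), with reflectance expressed as a fraction between 0 and 1; the function name and the example values are hypothetical, and individual instruments may apply their own calibration.

```python
import math


def melanin_index(red_reflectance: float) -> float:
    """Melanin index from the fraction of red light reflected by the skin.

    Assumed form: M = 100 * log10(1 / R), with R in (0, 1].
    Lower reflectance (more absorbed light) gives a higher index.
    """
    if not 0.0 < red_reflectance <= 1.0:
        raise ValueError("red_reflectance must be a fraction in (0, 1]")
    return 100.0 * math.log10(1.0 / red_reflectance)


# Hypothetical reflectance readings, for illustration only.
for reflectance in (0.5, 0.3, 0.2):
    print(f"R = {reflectance:.2f} -> M = {melanin_index(reflectance):.1f}")
```

Under this form, skin that reflects less red light has a higher index, which is why darker pigmentation corresponds to larger values of M.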

Beyond the impressionistic observations of early travelers, several studies using objective measures of skin pigmentation have shown that skin color varies continuously across different geographic regions, being darker in tropical areas and becoming gradually lighter with increasing distance to the equator (Relethford 1997; Jablonski and Chaplin 2000) (Fig. 1). Due to this striking association, the proportion of skin color variation between geographic regions is unusually high when compared with most genetic markers and other quantitative traits (0.88 vs. 0.15) (Relethford 2002), suggesting that skin color variation was shaped by natural selection in response to environmental factors that are correlated with latitude.

Fig. 1

Relationship between distance to the equator (absolute latitude) and skin color measured by the Melanin Index (r = 0.69; p < 0.0001). Data were retrieved from Martin et al. (2017)
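The kind of correlation reported in this caption can be computed as a Pearson coefficient between absolute latitude and the melanin index. The sketch below is only illustrative: the population values are made up and are not the data of Martin et al. (2017), and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical population averages (not real data).
abs_latitude = [2, 10, 25, 40, 55]      # degrees from the equator
melanin_index = [75, 68, 50, 38, 30]    # average melanin index

r = pearson_r(abs_latitude, melanin_index)
print(f"r = {r:.2f}")
```

With these made-up values the coefficient is negative, because the melanin index falls as absolute latitude rises; the strength of the association is given by its magnitude.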

Ultraviolet radiation (UVR) has often been considered to be the most important environmental selective pressure for skin color variation, due to its high correlation with latitude and because of observable differences in the risks and benefits of UVR exposure between individuals with lightly and darkly pigmented skin (Jablonski and Chaplin 2010, 2012). Skin pigmentation is more strongly correlated with UVR measured by remote sensing technology than with other environmental factors like precipitation, snow, frost-free days, or average temperature (Chaplin 2004). Moreover, UVR levels in autumn were found to be even more strongly correlated with melanin reflectance than latitude is, and UVR is strongly predictive of skin reflectance values on a global scale (Jablonski and Chaplin 2000; Chaplin 2004).

The most common harmful effects of UVR that can be counteracted by high concentrations of melanin include sunburn, skin cancer, and photodegradation of micronutrients (Branda and Eaton 1978; Rees and Flanagan 1999; Jablonski and Chaplin 2000; Borradale and Kimlin 2012; Greaves 2014). While it is not clear whether sunburn and skin cancer cause enough differential mortality before reproductive age, photodegradation of folate, a B vitamin, might have a noticeable impact on reproductive ability (Blum 1961; Rees and Flanagan 1999; Jablonski and Chaplin 2000; but see Greaves 2014 for an alternative view on the importance of skin cancer as a selective pressure). This hypothesis seems to be supported by studies showing that folate can be subjected to photolysis at high levels of UVR, and that folate deficiency is associated with neural tube defects and increased incidence of male infertility (Branda and Eaton 1978; Jablonski and Chaplin 2000; Borradale and Kimlin 2012). As UVR declines with increasing distance from the equator, its damaging effects become less important, and the major selective pressure influencing skin color is thought to be the lack of sufficient radiation to stimulate the cutaneous synthesis of vitamin D. In regions far away from the equator, dark skin would be disadvantageous since melanin can block UVR stimulation, increasing the risk of conditions like rickets and osteomalacia that are caused by lack of adequate levels of vitamin D (Loomis 1967; Holick 1995; Jablonski and Chaplin 2000, 2010; Chaplin and Jablonski 2009; but see Robins 2009 and Elias and Williams 2016 for alternative views on the importance of vitamin D for the evolution of light skin color).

According to the framework outlined above, the first UVR-driven shift in skin pigmentation in the human lineage occurred when the white, chimpanzee-like skin of our early hominin ancestors became dark to protect the naked skin from the damaging effects of excess radiation after body hair was lost to enhance thermoregulation in tropical savannas (Wheeler 1985; Jablonski 2004; Jablonski and Chaplin 2017). Consideration of the paleo-ecological and anatomical evidence during the early stages of human evolution has led to the proposal that most thermoregulatory changes, including loss of thick body hair, might have occurred about 2 million years ago, as the earliest members of the genus Homo became mostly savanna dwellers, increasing their levels of activity and exposure to daylight (Klein 1999; Rogers et al. 2004; Jablonski 2006). If this hypothesis is accepted, a date of 2 million years can be set as the upper limit for dark skin becoming the ancestral state of later Homo species. However, other estimates based on the divergence times of human and primate lice species suggest that the loss of body hair may be as old as 3.3 million years, raising the possibility of an even earlier change from light to dark pigmentation in the human lineage (Reed et al. 2007; Stoneking 2017). As later Homo groups occupied Eurasia, expanding their range to regions with low UVR, natural selection could have favored genetic variants associated with skin lightening to facilitate vitamin D synthesis (Lalueza-Fox et al. 2007).

In modern humans, the available phenotype information can be combined with current knowledge about major migration routes to derive general models of skin color change (Nielsen et al. 2017). In the simplest scenario, populations migrating out of Africa around 55–65 kya resembled their African ancestors and were dark-skinned. As the ancestors of Europeans and north Asians migrated northwards, they became exposed to the risks of low vitamin D synthesis and light skin was favored by selection. Populations migrating along southern Asia, eventually reaching Australia and Melanesia, remained close to the equator and preserved the ancestral dark-skinned condition or developed novel genetic adaptations as a protection against the damaging effects of UVR (Norton et al. 2007).

Exceptions to these trends can be caused by a number of factors, like insufficient time for selective pressures to act due to recent migration, differences in cultural practices influencing exposure to the sun, availability of vitamin D in the diet, and barriers to gene flow limiting the spread of favorable variants (Quillen et al. 2019). For example, the lighter skin of indigenous populations of the New World compared to peoples from the Old World living at the same latitude is likely to reflect both cultural adaptations and the relatively recent occupation of the Americas by peoples of Asian descent (Jablonski 2006; Jablonski and Chaplin 2017). Cultural practices may also have influenced the evolution of skin pigmentation in the Tibetan plateau, where local populations have lighter pigmentation than predicted by the UVR levels of the region, probably because of the need to wear clothes (Jablonski and Chaplin 2000). Moreover, mate choice based on skin color might have shaped local patterns of skin color variation, altering the expected correlation between melanin content and intensity of UVR (Iliescu et al. 2018).

In spite of these complications, the general framework linking human migrations with natural selection has the advantage of providing broad expectations that can be assessed with the increasingly available information about genes contributing to skin color variation.

The Genetic Basis of Skin Pigmentation

Several candidate genes with a role in human skin color have been identified through studies of model organisms or in humans affected by pigmentation disorders, like albinism. Since not all genes involved in pigmentation pathways determine normal pigmentation variability, some studies additionally used molecular signatures of positive selection as a complementary tool to narrow down the search for genomic regions of interest (McEvoy et al. 2006; Lao et al. 2007; Myles et al. 2007). This approach mostly relies on the assumption that selective events could be modeled as hard sweeps in which new favored alleles rise rapidly in frequency, reducing haplotype diversity, and creating extreme allele frequency differences between populations (Pritchard and Di Rienzo 2010). However, since these methods are phenotype-independent, further analyses are necessary to effectively link candidate genes to phenotypic variation (Jeong and Di Rienzo 2014).

The most common way to show that a gene influences normal skin color diversity is to perform association studies to assess how melanin levels vary between individuals with different genotypes, using other markers to control for the effects of population structure. Association studies can be performed with candidate genes pre-selected on the basis of their known involvement in pigmentation pathways, or with hundreds of thousands of polymorphic markers dispersed across the genome (genome-wide association studies, GWAS) (Quillen et al. 2019). Some of the genes identified by these studies are listed in Table 1. Interestingly, several genes identified by GWAS do not play an obvious role in pigmentation variation and would have remained undetected in candidate gene approaches (e.g., DDB1). These genes raise the possibility that skin color variation may be partially driven by pleiotropic effects. Moreover, many causal variants are located in non-coding regions and are likely to act through regulation of gene expression (e.g., rs12913832 in HERC2 and rs12203592 in IRF4).
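The logic of an association test that controls for population structure can be sketched as a regression of the melanin index on genotype, adjusted for individual ancestry. The example below is a minimal illustration under strong assumptions: a single covariate (a hypothetical ancestry proportion) stands in for the ancestry-informative markers used in real studies, the effect is assumed to be additive, and no significance test is performed. All names and values are hypothetical.

```python
def _residuals(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return [(yi - mean_y) - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def adjusted_genotype_effect(melanin, genotype, ancestry):
    """Per-allele effect on melanin after removing the ancestry covariate.

    Both phenotype and genotype are residualized on ancestry, and the
    residuals are regressed on each other (equivalent to the genotype
    coefficient in a two-predictor linear regression).
    """
    res_y = _residuals(melanin, ancestry)
    res_g = _residuals(genotype, ancestry)
    return sum(a * b for a, b in zip(res_y, res_g)) / sum(b * b for b in res_g)


# Hypothetical individuals: allele counts (0, 1, 2) and ancestry proportions.
genotype = [0, 1, 2, 0, 1, 2]
ancestry = [0.1, 0.2, 0.9, 0.8, 0.5, 0.3]
# Simulated phenotype: each allele copy lowers the melanin index by 5 units.
melanin = [50 - 5 * g + 20 * a for g, a in zip(genotype, ancestry)]

effect = adjusted_genotype_effect(melanin, genotype, ancestry)
print(f"estimated per-allele effect: {effect:.2f}")
```

Without the adjustment, an allele that is simply more common in one parental population could appear to be associated with skin color even if it has no effect of its own.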

Table 1 Genes and polymorphisms associated with skin color variation

The discovery of the role of the SLC24A5 gene in human skin color variation provides a good example of an association approach based on a candidate gene. The gene encodes a cation exchanger with an important function in melanosome biogenesis and melanin synthesis, and was initially found to be involved in pigment variation in zebrafish. Subsequent comparative analyses found that the human SLC24A5 homologue harbored several single-nucleotide polymorphisms (SNPs) with very different allele frequencies between Africans and Europeans, including a non-synonymous polymorphism at rs1426654 (Ala111Thr). Finally, the involvement of SLC24A5 in human skin color variation was validated by association studies showing that rs1426654 genotypes were significantly correlated with melanin levels in admixed African Americans and African Caribbeans (Lamason et al. 2005). Several other association studies, including GWAS, have firmly established the role of SLC24A5 as one of the main loci influencing human skin color variation.

The KITLG gene, which encodes a signaling molecule important for melanocyte migration, illustrates a less straightforward situation. Like SLC24A5, the involvement of KITLG in pigmentation diversity was first suggested by studies in a model organism (the stickleback fish) (Miller et al. 2007). Moreover, allelic variation in potentially regulatory regions of the human homologue gene was found to be significantly associated with skin color in African Americans (Miller et al. 2007). However, this association could not be replicated in a GWAS of the African European admixed population of Cape Verde (Beleza et al. 2013a). More recently, an association between skin color and KITLG was found to be marginally significant in the southern African Nama and ǂKhomani Khoisan, who have lighter skin than their Bantu-speaking neighbors, although no genome-wide significance could be replicated (Martin et al. 2017).

Discrepancies between studies may be due to false-positive associations caused by inefficient correction of population structure, differences in power to detect small effects on the phenotype, or differences in patterns and levels of linkage disequilibrium when the marker SNP is not causal. In addition, epistatic interactions among variants from different loci can modify the effect size of a causal variant in different populations (Quillen et al. 2019). These complications emphasize the need for developing and using standardized methods to validate discovered associations through functional assays (Visser et al. 2012, 2014; Tsetskhladze et al. 2012; Praetorius et al. 2013; Crawford et al. 2017).

Until recently, most genetic variants associated with pigmentation diversity were identified among Europeans or African European admixed populations (Quillen et al. 2019). Studies in admixed populations typically identify genes of high effect displaying elevated allele frequency differences between parental populations, while studies on Europeans focus on the lighter portion of the human skin pigmentation range and detect genes with smaller effects. However, more recent studies have started to correct these biases by focusing on understudied regions like Africa, where the greatest range of pigmentation is observed (Crawford et al. 2017; Martin et al. 2017). These studies are providing an increasingly sharp picture of the genetic basis for skin pigmentation diversity.

In a GWAS performed in the admixed population of Cape Verde, Beleza et al. (2013a, b) estimated that four loci (SLC24A5, SLC45A2, GRM5-TYR, and APBA2) explain 35% of the skin color variation, suggesting a genetic architecture in which the moderate effects of a few genes are combined with small effects of many genes. A recent study focusing on skin color variation across eastern and southern African populations (Crawford et al. 2017) showed that 29% of pigmentation diversity in these populations can be attributed to variation in four genomic regions (SLC24A5, MFSD12, DDB1/TMEM138, OCA2/HERC2). Still in Africa, Martin et al. (2017) found that only 23% of the phenotypic variance in the Nama and ǂKhomani Khoisan could be explained by the 50 variants most significantly associated with skin pigmentation in the two populations, showing that the complexity of skin color itself may vary between populations. Together, these studies indicate that the genetic architecture of skin color, in spite of being simpler than traits like body height (Wood et al. 2014), is more polygenic than other pigmentary traits like eye color, where a single locus (HERC2/OCA2) may explain at least 30% of the observed variation (Sturm et al. 2008; Eiberg et al. 2008; Lloyd-Jones et al. 2017).
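The notion of a locus "explaining" a share of phenotypic variance can be made concrete with the standard additive approximation, in which a biallelic variant with allele frequency p and per-allele effect β contributes 2p(1 − p)β² to the variance. The sketch below assumes additive effects, Hardy–Weinberg proportions, and independent loci; it is not the estimation procedure used in the studies cited above, and all frequencies, effect sizes, and names are hypothetical.

```python
def variance_explained(p: float, beta: float, phenotypic_variance: float) -> float:
    """Fraction of phenotypic variance attributable to one additive locus.

    Uses 2 * p * (1 - p) * beta**2 / Vp, assuming Hardy-Weinberg
    proportions and a purely additive per-allele effect.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if phenotypic_variance <= 0.0:
        raise ValueError("phenotypic variance must be positive")
    return 2.0 * p * (1.0 - p) * beta ** 2 / phenotypic_variance


# Hypothetical loci: (allele frequency, per-allele effect on melanin index).
loci = {"locus_A": (0.5, 4.0), "locus_B": (0.3, 3.0), "locus_C": (0.1, 2.0)}
total_variance = 100.0  # hypothetical phenotypic variance

# Summing across loci assumes they are independent (no linkage disequilibrium).
total = sum(variance_explained(p, b, total_variance) for p, b in loci.values())
print(f"combined fraction explained: {total:.3f}")
```

The formula shows why a variant of moderate effect at intermediate frequency can account for more variance than a large-effect variant that is rare or nearly fixed.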

The complexity of skin color is also illustrated by the observation that similar phenotypes can be determined by different sets of variants and/or genes in different populations (convergence). For example, the genes SLC24A5 and SLC45A2 have derived alleles associated with lighter skin that are almost fixed in Europeans but are virtually absent in East Asians, who instead share high frequencies of the ancestral allele with dark-skinned Africans and Australo-Melanesians (McEvoy et al. 2006; Norton et al. 2007). On the other hand, the derived allele of a non-synonymous polymorphism (His615Arg; rs1800414) located in the OCA2 gene is significantly associated with skin lightening in East Asians but is absent in Europeans (Edwards et al. 2010; Yang et al. 2016). More recently, Adhikari et al. (2019) have shown that the derived allele in a non-synonymous SNP (Tyr182His; rs2240751) in the MFSD12 gene is associated with lighter skin pigmentation in a large sample of Latin Americans with high Native American ancestry. Although other variants of MFSD12 had been previously associated with skin color variation in Africans (see above; Table 1), the derived allele at rs2240751 was found to be common only in East Asians and Latin Americans, suggesting that it started to rise in frequency in East Asia and was carried into the Americas by migration (Adhikari et al. 2019). These findings suggest that, at least in part, different genes and variants are responsible for decreased melanin content in Europe and East Asia, emphasizing the importance of studying non-European populations to understand the full complexity of skin color variation.

The complexity of skin color also has implications for the accuracy of models predicting phenotypes using a small number of DNA variants in forensic contexts or in ancient DNA studies (Lazaridis et al. 2014; Liu et al. 2015; Walsh et al. 2017; Brace et al. 2019). In fact, while several predictive tests have been developed to assign categorical pigmentation phenotypes with increasing accuracy, SNPs identified in one particular continental region may have no power to predict phenotypes in other regions (Martin et al. 2017; Walsh et al. 2017; Quillen et al. 2019).

Evolution of Skin Pigmentation

The identification of key genes underlying pigmentation diversity made it possible to assess the evolutionary history of skin pigmentation in modern humans by studying the geographical distribution, frequency and age of different variants, and by investigating molecular signatures of natural selection. In this context, it is important to distinguish ancestral and derived skin color phenotypes from ancestral and derived alleles in skin pigmentation loci. As mentioned above, early Homo ancestors are believed to have become more darkly pigmented than the chimpanzee as a protection against the damaging effects of UVR after body hair was lost. According to this hypothesis, dark skin is a derived trait and derived alleles are expected to be associated with dark skin at loci that were involved in increasing melanin content relative to the chimpanzee. However, derived alleles can also be associated with lower melanin content at loci that were not involved in human/chimpanzee skin color differentiation. This is the case of several alleles associated with lighter skin in modern human populations that migrated out of Africa.

One of the first genes used to infer the evolutionary history of skin pigmentation was MC1R (Table 1). Based on the contrast between lack of sequence variation in Africans and abundance of non-synonymous variants in Europeans, Harding et al. (2000) concluded that MC1R was under strong purifying selection for maintaining dark skin in Africa. This observation has often led to the view that most genes involved in skin color diversity would be under functional constraint in dark-skinned populations (Quillen et al. 2019). However, recent studies focusing on skin color variation in African populations provided evidence against this generalization (Crawford et al. 2017; Martin et al. 2017). By analyzing skin color diversity in populations from eastern and southern Africa, Crawford et al. (2017) found that none of the eight SNPs most strongly associated with pigmentation variation displayed allelic fixation. Even among eastern African Nilo-Saharan speakers, who display the darkest pigmentation, the average frequency of alleles associated with light skin is higher than 25% (Fig. 2a). Moreover, melanin content is positively correlated (r = 0.96; p = 0.003) with average heterozygosity at those eight SNPs in a small sample of populations encompassing regions with very different levels of UVR, presented here for illustration purposes (Fig. 2b–d). These findings are consistent with the observation that dark-skinned populations have wider distributions of melanin content than light-skinned populations (Martin et al. 2017), and suggest that selection for dark skin involved small shifts in allele frequency from standing variation as expected in polygenic adaptation (Pritchard and Di Rienzo 2010; Pritchard et al. 2010; Berg and Coop 2014; Field et al. 2016). A clear indication of the role of standing variation is provided by estimated allele ages. According to Crawford et al. (2017), seven out of the eight SNPs most involved in skin color diversity in their sample have derived allele ages ranging from 250,000 to 1,200,000 years, showing that allelic variation in these polymorphisms originated long before the emergence of modern humans.

Fig. 2

a Average frequencies of alleles associated with light skin color in regions with different levels of UVR, calculated from eight SNPs most strongly associated with pigmentation variation in African populations: rs1800404, rs4932620, rs7948623, rs10424065, rs6510760, rs6497271, rs1426654, rs11230664. b Average heterozygosities calculated from the same SNPs and geographical regions displayed in a. c Average Melanin Indices for the regions displayed in a and b. d Correlation between average heterozygosity and Melanin Index (r = 0.96; p = 0.003). Data were retrieved from Martin et al. (2017) and Crawford et al. (2017)
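Average heterozygosity of the kind shown in panel b can be obtained from allele frequencies using the expected heterozygosity of a biallelic SNP, H = 2p(1 − p), averaged over loci. The sketch below assumes Hardy–Weinberg proportions; the frequencies are hypothetical and are not those of Crawford et al. (2017) or Martin et al. (2017), and the function names are illustrative.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


def mean_heterozygosity(frequencies) -> float:
    """Average expected heterozygosity across a set of SNPs."""
    values = [expected_heterozygosity(p) for p in frequencies]
    return sum(values) / len(values)


# Hypothetical light-skin allele frequencies at eight SNPs in two regions.
region_near_fixation = [0.95, 0.90, 0.98, 0.85, 0.99, 0.92, 1.00, 0.97]
region_intermediate = [0.30, 0.45, 0.25, 0.50, 0.35, 0.40, 0.10, 0.55]

print(f"near fixation: {mean_heterozygosity(region_near_fixation):.3f}")
print(f"intermediate:  {mean_heterozygosity(region_intermediate):.3f}")
```

Heterozygosity peaks at intermediate frequencies and drops to zero at fixation, so a population in which alleles of both kinds still segregate at these loci has a higher average than one in which a single allele has nearly replaced the other.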

The study of African populations has also important implications for a better understanding of global variation in skin color. So far, several genes involved in skin color lightening in Eurasians were found to bear strong signals of recent positive selection acting on new alleles (hard sweeps) (Voight et al. 2005; McEvoy et al. 2006; Lao et al. 2007; Myles et al. 2007; Pickrell et al. 2009). In genes like KITLG (rs642742), the selective sweep probably started around 30,000 years ago (Beleza et al. 2013b; Chen et al. 2015; Smith et al. 2018; Yang et al. 2018), before the divergence of Europeans and East Asians and after modern humans left Africa (Nielsen et al. 2017). Consequently, high allele frequency differences between Africans and non-Africans are now observed (Miller et al. 2007; Pickrell et al. 2009). Date estimates for the onset of selection on other genes, like OCA2 (rs1800414), SLC24A5 (rs1426654), SLC45A2 (rs16891982), and MFSD12 (rs2240751) fall within the past 11,000–19,000 years, long after the first migrations into Eurasia (Beleza et al. 2013b; Yang et al. 2016), though older coalescent ages of around 28,000–31,000 years for SLC24A5 have been recently calculated (Basu Mallick et al. 2013; Crawford et al. 2017). Moreover, as alleles associated with light pigmentation in these genes have non-overlapping distributions in Eurasia (see above), it is likely that evolution of light skin pigmentation in the two regions was partially independent and still ongoing until recently (McEvoy et al. 2006; Norton et al. 2007). This scenario is further supported by genotype data from ancient genomes showing that light skin variants in SLC24A5 and SLC45A2 genes were not fixed in western Europe as recently as 8000 years ago, and that dark and intermediate skin colors might have been common by that time (Wilde et al. 2014; Mathieson et al. 2015; Brace et al. 2019).

Although these findings suggest that skin lightening in Eurasia mostly evolved through strong selection of relatively recent mutations, it is now clear that old variants associated with lighter skin in Africa have also played an important role in non-African populations. These variants reach high frequencies among Europeans and East Asians, indicating that migration out of Africa was accompanied by allele frequency shifts from African standing variation (Fig. 1 from Crawford et al. 2017; Fig. 2a). Furthermore, several old alleles associated with light skin have very similar frequencies in Europeans and southern African Khoisan, suggesting that, as expected in polygenic adaptation (Hancock et al. 2010; Pritchard and Di Rienzo 2010), parallel frequency shifts from standing variation occurred in both populations to respond to selective pressures associated with absolute distance from the equator (Fig. 3a). A similar, though less pronounced, parallelism is found in non-Khoisan Africans and Australo-Melanesians, who share higher frequencies of variants associated with darker skin, probably because these variants were not lost by populations that remained close to the equator after leaving Africa (Crawford et al. 2017).

Fig. 3

Pairwise absolute allele frequency differences in eight SNPs most strongly associated with pigmentation variation in African populations (as in Fig. 2). a Differences between Khoisan and Europeans. b Differences between East Africans and Europeans. The dashed lines show the average allele frequency differences. Data were retrieved from Crawford et al. (2017)

Taken together, these results suggest that the shift to light skin in Eurasia involved a combination of polygenic adaptation with hard selective sweeps. Since adaptation through hard sweeps is expected to predominate when genetic variance is reduced and populations are still far from the phenotypic optimum (Chevin and Hospital 2008), it is conceivable that hard selective sweeps occurred because standing variation in skin color genes was lost in Eurasian populations. Moreover, it is possible that this relative simplification of genetic architecture was due to the joint effects of selection favoring “African” alleles associated with light skin and bottlenecks occurring during the out of Africa migration, as expected from the connection between selection and demographic processes that is typically observed in human populations (Coop et al. 2009).

A further example of the combination of migration and selection is provided by the distribution of the SLC24A5 allele associated with light skin pigmentation. Besides being virtually fixed in Europe, this variant is also associated with lighter skin color in Africans, reaching high frequencies (28–50%) in Afro-Asiatic speakers from East Africa, as well as in several Khoisan peoples from southern Africa, especially among the Nama and the ǂKhomani (33–53%) (Pagani et al. 2012; Crawford et al. 2017; Martin et al. 2017; Lin et al. 2018). Taking into account the population history of these regions, it is likely that the SLC24A5 variant entered East Africa from the Middle East around 3000–9000 years ago (Pagani et al. 2012; Crawford et al. 2017) and was then carried by migrating eastern African pastoralists into southern Africa around 2000 years ago, where it was favored by natural selection (Lin et al. 2018). The latter dispersal is remarkably similar to the spread of the −14010*C lactase-persistence allele from eastern into southern Africa (Breton et al. 2014; Macholdt et al. 2014; Pinto et al. 2016; Lin et al. 2018).

Conclusion

The remarkable progress in the understanding of normal skin color variation includes not only the identification of an increasing number of genes associated with pigmentation diversity, but also a better perception of the complexity of this trait. It now seems clear that the genetic basis of skin color is less simple than previously thought and that the geographic variation in skin pigmentation is not exclusively driven by hard selective sweeps in a few key genes. The recent increase in the number of populations studied for pigmentation variation, including African groups from a wide range of geographic origins, has revealed that the complexity of skin color can vary across populations and that the evolutionary history of pigmentation involved adaptations achieved by the concerted action of different types of selection (Fig. 4). While this increasing complexity does not challenge the general view that human migration towards regions of lower UVR levels favored lighter skin colors, future studies on fitness differences between individuals with different pigmentation will be needed to investigate the additional impact of epistasis, pleiotropic effects, and cultural practices in shaping the geographical patterning of skin color variation.

Fig. 4

Allele frequencies at 5 genes/SNPs associated with skin color variation exemplifying the variety of selection patterns underlying skin pigmentation differences across populations: parallel shifts in allele frequencies from standing variation (HERC2/rs6497271); selective sweeps in proto-Eurasians (KITLG/rs642742); selection on standing variation in proto-Eurasians (DDB1/rs11230664); selective sweeps in Europeans (SLC24A5/rs1426654), and selective sweeps in Asians (OCA2/rs1800414). a = Botswana Khoisan; b = West Africans; c = East Africans; d = Australo-Melanesians; e = East Asians; f = Europeans. Alleles associated with darker skin are shown in black