
Introduction

Prevalence of type 2 diabetes varies among human continental ancestry groups, as does obesity, which is a strong risk factor for diabetes. In the USA, prevalence of diabetes and obesity is particularly high in American Indians, whereas prevalence is low in European Americans and intermediate in African Americans [1, 2]. Both type 2 diabetes and obesity are highly heritable [3, 4], and several hypotheses have invoked differences in natural selection across ancestry groups to explain differences in prevalence [5,6,7,8,9,10,11,12,13]. Recent genome-wide association studies have identified many variants reproducibly associated with both type 2 diabetes and obesity [14, 15]. Several investigators have analysed these established susceptibility loci for evidence of natural selection. Such studies have generally involved assessment of genetic signatures of recent selection or comparison of allele frequencies among ancestry groups [16,17,18,19,20,21,22]. Results of these studies are largely equivocal; however, both approaches are limited in their ability to detect selection on polygenic traits. An alternative approach involves comparison of genetic components of variance for the trait, among and within ancestry groups, with corresponding genotypic variance components across representative genomic markers [23, 24]. These variance components methods are well suited for detection of polygenic selection that differs in magnitude across groups, and they do not require knowledge of specific susceptibility variants. They do require comparably measured phenotypic data, along with genotypic data, across diverse ancestry groups. In the present study, we compare phenotypic divergence for type 2 diabetes and related traits with genotypic divergence in a cohort that includes African American, American Indian and European American individuals who had undergone whole exome sequencing.

Methods

Participants and measures

Participants were derived from a multiethnic study, conducted in urban Phoenix, Arizona, designed to identify determinants of diabetes and related traits; the methods have been previously described [25]. In brief, individuals were ≥18 years old, of any ethnicity, and participants were not recruited based on diabetes or other conditions. A large proportion of participants were American Indian, primarily from tribes of the Southwestern United States. The study was approved by the institutional review boards of the National Institute of Diabetes and Digestive and Kidney Diseases and the Phoenix Area Indian Health Service, and all participants gave informed consent. The present cross-sectional sample was derived from 1389 participants, examined in 2011–2016, who had relevant phenotypic data available, and data from whole exome sequencing. After exclusion of ten individuals who did not cluster with their primary self-reported ancestry group in principal components analyses, there were 88 individuals who were full-heritage African American by self-report, 761 individuals who were full-heritage American Indian and 129 individuals who were full-heritage European American. Since some analyses may be influenced by presence of closely related individuals, genetic relatedness was calculated between pairs of individuals using PREST (version 3.02) [26], and a set of ‘unrelated’ individuals was selected by randomly excluding one member of each pair in whom the observed proportion of alleles shared identical by descent was >0.14. (This excludes individuals who are second-degree relatives or closer to another individual in the sample.) This resulted in 83 African Americans, 523 American Indians and 128 European Americans. Characteristics of individuals are shown in electronic supplementary material (ESM) Table 1, and a principal components plot is shown in ESM Fig. 1.
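The selection of 'unrelated' individuals described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `prune_related`, the dictionary layout for pairwise sharing estimates and the fixed random seed are all assumptions made for the example, and the pairwise identity-by-descent proportions are taken as already computed (e.g., by PREST).

```python
import random

def prune_related(ids, ibd_pairs, threshold=0.14, seed=0):
    """Randomly drop one member of each pair whose IBD sharing exceeds threshold.

    ids:       iterable of individual identifiers (hypothetical layout).
    ibd_pairs: dict mapping (id_a, id_b) -> proportion of alleles shared IBD.
    Returns the set of identifiers that are retained.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    kept = set(ids)
    for (a, b), ibd in ibd_pairs.items():
        # A pair needs attention only if both members are still retained.
        if ibd > threshold and a in kept and b in kept:
            kept.discard(rng.choice([a, b]))
    return kept
```

Because the excluded member of each pair is chosen at random, the retained set (and its size) can differ between runs with different seeds; the only guarantee is that no retained pair exceeds the threshold.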

Fasting plasma glucose and HbA1c were measured, and a 75 g oral glucose tolerance test was administered to those without a previous diagnosis of diabetes, with glucose concentrations measured 30 min and 2 h after the oral glucose load. Individuals were classified as having diabetes if they had a previous diagnosis by self-report, fasting plasma glucose ≥7.0 mmol/l, 2 h plasma glucose ≥11.1 mmol/l or HbA1c ≥ 6.5% (48 mmol/mol) [27]. Serum insulin concentrations were measured by immunoassay (Tosoh Bioscience, Tokyo, Japan); fasting serum insulin and 30 min serum insulin, adjusted for 30 min glucose level, were taken as measures of insulin resistance and insulin secretion, respectively. Analyses of insulin measures were restricted to those with normal glucose tolerance (nondiabetic, and 2 h glucose <7.8 mmol/l), constituting 59 African Americans, 281 American Indians and 94 European Americans. Height and weight were measured for calculation of BMI. The maximum weight and contemporaneous height were also obtained by self-report. Analyses of BMI are generally shown based on self-reported maximum weight, as this was more strongly associated with diabetes.
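The classification rules above can be written as a short sketch. The function and argument names are hypothetical; glucose values are assumed to be in mmol/l and HbA1c in percent, and a missing measurement is represented by `None`.

```python
def has_diabetes(previous_diagnosis, fasting_glucose=None,
                 two_hour_glucose=None, hba1c_percent=None):
    """Apply the diagnostic criteria listed in the text (any one suffices)."""
    if previous_diagnosis:
        return True
    checks = [
        (fasting_glucose, 7.0),     # fasting plasma glucose, mmol/l
        (two_hour_glucose, 11.1),   # 2 h plasma glucose, mmol/l
        (hba1c_percent, 6.5),       # HbA1c, %
    ]
    # Missing measurements (None) never trigger a diagnosis on their own.
    return any(value is not None and value >= cutoff for value, cutoff in checks)

def has_normal_glucose_tolerance(previous_diagnosis, fasting_glucose,
                                 two_hour_glucose, hba1c_percent):
    """Nondiabetic and 2 h glucose below 7.8 mmol/l, as used for insulin analyses."""
    if has_diabetes(previous_diagnosis, fasting_glucose,
                    two_hour_glucose, hba1c_percent):
        return False
    return two_hour_glucose is not None and two_hour_glucose < 7.8
```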

Genotypes

Whole exome sequencing in DNA derived from peripheral blood was conducted at Regeneron Genetics Center, as previously described [28, 29]. Sequencing was conducted using a Hi-Seq 2500 sequencer (Illumina, San Diego, CA, USA). Sequencing was part of a larger project involving 8137 individuals, 43 of whom were excluded for low-quality sequence data. In 98% of samples, at least 90% of the exome achieved at least 20x coverage. Analysis was restricted to variants with <10% missing genotype calls, that were within Hardy–Weinberg equilibrium (p > 0.0001 in full-heritage American Indians), that had concordance rates >97.5% in 100 duplicate samples and for which average minor allele frequency across ancestry groups was ≥5%. This resulted in 97,388 autosomal markers. Missing genotypes were imputed from phased haplotypic data in each ancestry group using BEAGLE (version 3.2.2) [30].

Measures of divergence

The coancestry coefficient (FST) was calculated as a measure of genotypic divergence among ancestry groups. FST represents the proportion of variance in allele frequency in the total population that is explained by group membership. For each marker, we calculated FST by the method of moments [31, 32]. Across r ancestry groups, the mean squares among and within groups for a given allele u are, respectively:

$$ MSA=\frac{1}{r-1}{\sum}_{i=1}^r2{n}_i{\left({p}_{iu}-{\overline{p}}_u\right)}^2 $$
$$ MSW=\frac{1}{\sum_{i=1}^r\left(2{n}_i-1\right)}{\sum}_{i=1}^r2{n}_i{p}_{iu}\left(1-{p}_{iu}\right) $$

where \( 2{n}_i \) is the total number of alleles measured in the ith group (twice the number of individuals), \( {p}_{iu} \) is the frequency of allele u in the ith group and \( {\overline{p}}_u \) is the mean frequency of the u allele across groups. FST for a single marker with m alleles is:

$$ {F}_{ST-M}=\frac{\sum_{u=1}^m\left({MSA}_u-{MSW}_u\right)}{\sum_{u=1}^m\left[{MSA}_u+\left(2{n}_c-1\right){MSW}_u\right]} $$

where \( 2{n}_c=\frac{1}{\left(r-1\right)}{\sum}_{i=1}^r2{n}_{ic} \), and \( 2{n}_{ic}=2{n}_i-4{n}_i^2/{\sum}_{i=1}^r2{n}_i \). The mean value of FST-M over all markers was taken as the overall FST. This mean marker-wise FST is comparable to the phenotypic divergence measures described below and, thus, represents the expected value under neutrality [24, 33]. However, it tends to modestly underestimate the evolutionary distance, so we also report FST calculated by the ‘ratio of averages’ method, which provides a better estimate of this distance [34]. FST calculated from exome sequence data is generally comparable to that calculated from whole genome sequence data [35].
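As an illustration of the method-of-moments calculation for a single marker, the formulas above can be transcribed directly. This is a sketch under stated assumptions: the function name `fst_marker` and its input layout are invented for the example, and the mean allele frequency is taken here as the sample-size-weighted mean across groups, which the text does not specify.

```python
def fst_marker(n, p):
    """Single-marker FST by the method of moments.

    n: list of the number of individuals in each ancestry group.
    p: list (one entry per group) of allele-frequency lists; p[i][u] is the
       frequency of allele u in group i.
    """
    r = len(n)
    two_n = [2 * ni for ni in n]                 # alleles sampled per group
    total = sum(two_n)
    # 2n_c = (1 / (r - 1)) * sum_i [2n_i - (2n_i)^2 / sum_j 2n_j]
    two_nc = sum(a - a * a / total for a in two_n) / (r - 1)
    within_df = sum(a - 1 for a in two_n)

    num = den = 0.0
    for u in range(len(p[0])):
        # Assumption: weighted mean frequency of allele u across groups.
        p_bar = sum(a * pi[u] for a, pi in zip(two_n, p)) / total
        msa = sum(a * (pi[u] - p_bar) ** 2 for a, pi in zip(two_n, p)) / (r - 1)
        msw = sum(a * pi[u] * (1 - pi[u]) for a, pi in zip(two_n, p)) / within_df
        num += msa - msw
        den += msa + (two_nc - 1) * msw
    return num / den
```

The overall FST described in the text would then be the mean of this quantity over markers. Note that the estimator can be slightly negative when group frequencies are nearly identical, and it is undefined for a marker that is monomorphic in every group (the denominator is zero).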

The quantitative genetic divergence index (QST) is a measure of phenotypic divergence that is analogous to FST [23, 24]. For diploid organisms QST is calculated as:

$$ {Q}_{ST}=\frac{\sigma_{Ga}^2}{\sigma_{Ga}^2+2{\sigma}_{Gw}^2} $$

where \( {\upsigma}_{\mathrm{Ga}}^2 \) is the variance among ancestry groups attributable to additive genetic effects and \( {\upsigma}_{\mathrm{Gw}}^2 \) is the genetic variance within groups. Under evolutionary neutrality, the expectation is that QST = FST, whereas with diversifying selection (when differences in direction or magnitude of natural selection across groups drive phenotypic divergence), the expectation is QST > FST [24, 33]. With stabilising selection (when selection is of similar direction and magnitude across groups), then QST < FST. Variance components for calculation of QST are typically estimated by ‘common garden’ controlled breeding experiments.

In humans and other natural populations where controlled breeding experiments are not feasible, QST can be approximated by the phenotypic divergence index (PST). This uses the total phenotypic components of variance among and within ancestry groups, \( {\upsigma}_{Pa}^2 \) and \( {\upsigma}_{Pw}^2 \), respectively, rather than the genetic variance components. A general formula for PST is:

$$ {P}_{ST}=\frac{c{\upsigma}_{Pa}^2}{c{\upsigma}_{Pa}^2+2{h}^2{\upsigma}_{Pw}^2} $$
(1)

where h2 represents the proportion of the within-group phenotypic variance due to additive genetic effects (i.e., heritability) and c represents the proportion of the among-group variance due to genetic factors [36]. When h2 and c are known from representative populations, then PST, calculated from eq. 1, is an unbiased estimate of QST.

In the present study we estimate h2 in genetically related individuals (i.e., in pedigree data without exclusion of close relatives), but c is unknown, as is often the case. In this situation, there are two widely used formulae for PST, which make different assumptions about c. The formula of Leinonen et al. is [37]:

$$ {P}_{ST}=\frac{\upsigma_{Pa}^2}{\upsigma_{Pa}^2+2{h}^2{\upsigma}_{Pw}^2} $$
(2)

This assumes that c = 1, i.e., that all phenotypic differences among ancestry groups are due to genetic factors, and this estimate represents the maximum possible value of QST for a given h2. This can be justified by the notion that PST is a screen for identifying traits potentially under differential natural selection. Other investigators, however, consider it more prudent to assume that c = h2 [38], and this leads to:

$$ {P}_{ST}=\frac{\upsigma_{Pa}^2}{\upsigma_{Pa}^2+2{\upsigma}_{Pw}^2} $$
(3)

This is more stringent in that it gives lower values of PST than eq. 2 (unless h2 = 1). We calculate PST under both equations, and we present analyses under eq. 2 as the primary results with the recognition that these represent maximal estimates of PST. Results calculated under eq. 3 are presented in ESM Tables 5 and 8, and we conduct sensitivity analyses across a range of values for h2 and c (including situations with c < h2) to evaluate effects on the conclusions.
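The relationship among eqs 1 to 3 can be made concrete with a small sketch. The function name `pst` and its argument names are assumptions for illustration; the variance components are taken as already estimated.

```python
def pst(var_among, var_within, h2, c=1.0):
    """General PST (eq. 1): c*s2_Pa / (c*s2_Pa + 2*h2*s2_Pw).

    var_among:  phenotypic variance among ancestry groups (s2_Pa).
    var_within: phenotypic variance within groups (s2_Pw).
    h2:         heritability within groups.
    c:          proportion of among-group variance due to genetic factors.
    """
    return c * var_among / (c * var_among + 2.0 * h2 * var_within)

def pst_eq2(var_among, var_within, h2):
    """Eq. 2: assumes c = 1."""
    return pst(var_among, var_within, h2, c=1.0)

def pst_eq3(var_among, var_within, h2):
    """Eq. 3: assumes c = h2, so h2 cancels from the ratio."""
    return pst(var_among, var_within, h2, c=h2)
```

For any h2 below 1, `pst_eq3` returns a smaller value than `pst_eq2` for the same variance components, which is the sense in which eq. 3 is the more stringent choice.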

Statistical analyses

Analyses were conducted in SAS (version 9.4; SAS Institute, Cary, NC, USA). Kernel density estimation (PROC KDE in SAS) was used to estimate nonparametric density functions. Phenotypic differences for continuous traits among ancestry groups were assessed using linear regression models with control for age and sex (and 30 min glucose, for analyses of 30 min insulin). A logistic regression model was used for analyses of diabetes. Heritability was assessed in the 761 American Indian participants (without exclusion of close relatives) using a linear mixed model. The total phenotypic variance was modelled as:

$$ {\upsigma}_P^2=\Phi {\upsigma}_G^2+\mathrm{I}{\upsigma}_E^2 $$

where \( {\upsigma}_G^2 \) is the variance potentially attributable to genetic factors, \( {\upsigma}_E^2 \) is the variance attributable to individual-level environmental factors, Φ is a matrix of the proportion of alleles shared identical by descent between pairs of individuals (estimated by PREST [26]) and I is an identity matrix. Heritability was calculated as \( {h}^2={\upsigma}_G^2/{\upsigma}_P^2 \). These analyses were conducted in SOLAR (version 8.1.1) with adjustment for age, sex and the first genetic principal component in American Indians (to account for potential population stratification), and a probit model was used to analyse liability to diabetes [39]. Confidence intervals were calculated with a likelihood-based method [40].

For continuous traits, variance components for calculation of PST were taken from the mean squares among and within ancestry groups, derived from the regression model with ancestry group as a fixed effect [38, 41]. Thus,

$$ {\upsigma}_{Pw}^2= MSW= SSE/\left[{\sum}_{i=1}^r{n}_i-\left(r-1\right)-1\right] $$
$$ {\upsigma}_{Pa}^2=\frac{MSA- MSW}{n_0} $$

where SSE is the error sum of squares from the regression, and \( {n}_i \) is the number of individuals in the ith ancestry group; if \( {\upmu}_i \) is the trait mean in the ith ancestry group, predicted by the regression model, and \( \overline{\upmu} \) is the total sample mean:

$$ {n}_0=\frac{1}{\left(r-1\right)}\ \left({\sum}_{i=1}^r{n}_i-\frac{\sum_{i=1}^r{n}_i^2}{\sum_{i=1}^r{n}_i}\right) $$
$$ MSA=\frac{1}{\left(r-1\right)}{\sum}_{i=1}^r{n}_i{\left({\upmu}_i-\overline{\upmu}\right)}^2 $$
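A minimal sketch of this calculation for a continuous trait is given below. The function name and inputs are hypothetical, and the total sample mean is computed here as the sample-size-weighted mean of the model-predicted group means, which is an assumption of the sketch.

```python
def variance_components(n, mu, sse):
    """Among- and within-group phenotypic variance components.

    n:   list of group sample sizes (n_i).
    mu:  list of model-predicted trait means per group (mu_i).
    sse: error sum of squares from the regression model.
    Returns (var_among, var_within), i.e. (s2_Pa, s2_Pw).
    """
    r = len(n)
    total = sum(n)
    # MSW = SSE / [sum(n_i) - (r - 1) - 1]
    msw = sse / (total - (r - 1) - 1)
    # Assumption: total sample mean as the weighted mean of group means.
    grand_mean = sum(ni * mi for ni, mi in zip(n, mu)) / total
    msa = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, mu)) / (r - 1)
    # n0 corrects for unequal group sizes; it equals n_i when groups are equal.
    n0 = (total - sum(ni * ni for ni in n) / total) / (r - 1)
    var_among = (msa - msw) / n0
    return var_among, msw
```

With equal group sizes, as in the bootstrap samples described in the text, n0 reduces to the common group size.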

For liability to diabetes, parameters were inferred from estimates of variance components derived from a probit mixed model in which ancestry group was a random effect (fit with PROC GLIMMIX in SAS). If \( {s}_{grp}^2 \) is the variance attributed to ancestry group and \( {s}_{rsd}^2 \) is the residual variance (from the Pearson χ2 fit of the model), then \( SSE={s}_{rsd}^2{\sum}_{i=1}^r{n}_i \) and \( MSA=\frac{s_{grp}^2}{\left(r-1\right)}{\sum}_{i=1}^r{n}_i \). Similar approaches have been used elsewhere [42].

Parameters were estimated using a bootstrap procedure with 1000 iterations; 90% CIs were generated from centiles of the bootstrap distribution. Estimates of FST and PST depend on sample sizes, and their interpretation as measures of population divergence is most straightforward if sample sizes are equal in each ancestry group [32]. Therefore, in each bootstrap iteration, a sample size equal to that of the ancestry group with the smallest sample size was selected for each group. Although the mean marker-wise FST represents the expectation of PST under neutrality, there is substantial biological variability; despite polygenicity, the distribution is approximately that of FST for individual markers [24, 33]. Thus, comparison of PST with FST is most appropriately made across the single-marker distribution. Following Guo et al. [43], we generated this comparison from the distribution of FST-M shown in ESM Fig. 2. The proportion of markers for which the difference with the mean (FST-M − FST) was greater than the observed value of PST − FST was taken as the empirical one-sided p value for the null hypothesis PST = FST against the alternative PST > FST. The distribution of FST-M had a thick lower tail, with ~5% of markers having a value <0.01; thus, we did not test for the alternative PST < FST. Diversifying selection may act primarily on one ancestry group, and, in this situation, it may be most powerful to consider the maximum divergence of PST from FST across all pairwise comparisons; we report this value as PST-MAX (and assess its p value with correction for three pairwise comparisons). We also conduct a multivariable test of the null hypothesis of PST = FST across any of the five major traits (diabetes, BMI, height, fasting insulin, 30 min insulin) using a method that combines p values with allowance for the correlations between traits [44].
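Two steps of this procedure, the equal-size resampling and the empirical one-sided p value, can be sketched as follows. The function names and data layout are assumptions for illustration, and resampling with replacement is assumed because the text does not state it explicitly.

```python
import random

def equal_size_resample(groups, rng):
    """Draw one bootstrap sample in which every group has the smallest group's size.

    groups: dict mapping ancestry-group label -> list of observations.
    rng:    a random.Random instance.
    """
    size = min(len(members) for members in groups.values())
    # Assumption: sampling is with replacement within each group.
    return {label: [rng.choice(members) for _ in range(size)]
            for label, members in groups.items()}

def empirical_p(fst_markers, pst):
    """Proportion of markers with (FST-M - mean FST) greater than (PST - mean FST)."""
    mean_fst = sum(fst_markers) / len(fst_markers)
    observed = pst - mean_fst
    exceed = sum(1 for f in fst_markers if f - mean_fst > observed)
    return exceed / len(fst_markers)
```

Since the mean FST is subtracted from both sides, the p value is equivalently the proportion of markers whose single-marker FST exceeds the observed PST.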

FST outlier analyses

For primary analyses, we used the ‘robust’ approach of comparing PST with FST taken across all available markers [45]. However, some of the markers themselves may have been subject to natural selection, and, to examine their potential influence, we conducted Bayesian outlier analysis of the FST-M distribution using BAYESCAN2 (version 2.01) [46]. This models the allele frequency differences among groups as a function of a population-specific FST component and a locus-specific FST component, subject to selection. Markers for which the locus-specific component is necessary are candidates for being under natural selection. As results depend on the specified prior odds of selection vs neutrality, we varied this parameter over a range of values and designated markers for which posterior odds were >1:1 as potentially under selection. Then, p values were calculated using the remaining, putatively neutral, markers.

Genetic admixture estimates

We used ADMIXTURE (version 1.3.0) to obtain estimates of genetic admixture proportions for each individual, assuming three ancestral populations [47]. In these analyses we included data from individuals in the 1000 Genomes project to improve resolution (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/); data from the HapMap Yoruba in Ibadan, Nigeria, population and the Centre d’Etude du Polymorphism Humain Utah population were used as representative of African and European ancestry groups, respectively. To reduce the influence of linkage disequilibrium, we selected markers ~100 kb apart, after exclusion of markers that did not have consistent reference and alternative alleles between our exome sequence data and the 1000 Genomes data (and excluding A/T and C/G polymorphisms); this resulted in 14,672 markers.

Additional population data

To assess applicability of our results in more general population data, we obtained data for individuals ≥18 years of age from a population-based study from a high-risk southwest American Indian (SWAI) population (4032 full-heritage American Indians) [48], and from the oral glucose tolerance subset of the 2005–2010 National Health and Nutrition Examination Survey (NHANES) (https://wwwn.cdc.gov/nchs/nhanes/Default.aspx), which is representative of the general US population. This included 1271 individuals of non-Hispanic black ancestry, taken as representative of African Americans, and 2905 individuals of non-Hispanic white ancestry, taken as representative of European Americans. We calculated PST across these three populations for diabetes and BMI. In these analyses, diabetes was diagnosed based on self-report, fasting plasma glucose or 2 h plasma glucose (as HbA1c was not available in all participants). Exome sequencing data were available from 3435 SWAI participants, but, as such data were not available for NHANES participants, we could not directly compare PST with FST. For an indirect comparison, we used genotypic data from the 1000 Genomes project, including the HapMap African Americans from the American Southwest population, as representative of African Americans, and the Centre d’Etude du Polymorphism Humain Utah population, as representative of European Americans. FST was calculated across these populations for 81,700 markers that had alleles called consistently between the exome sequence and the 1000 Genomes data.

Results

Phenotypic differences

Phenotypic differences among ancestry groups are shown in Fig. 1. Age- and sex-adjusted prevalence of diabetes was highest in American Indians (34.0%) and lower in African Americans (12.4%) and European Americans (10.4%, p = 2.9 × 10−10 for difference among groups). Similarly, mean age- and sex-adjusted maximum BMI was 36.3 kg/m2 in American Indians, 33.4 kg/m2 in African Americans and 33.0 kg/m2 in European Americans (p = 1.9 × 10−7). Height was also significantly different among ancestry groups, with American Indians being shorter than African Americans and European Americans (p = 1.9 × 10−18, ESM Fig. 3). Among those with normal glucose tolerance, fasting serum insulin was higher in American Indians (geometric mean = 63.8 pmol/l adjusted for age and sex) than in African Americans (48.4 pmol/l) or European Americans (45.2 pmol/l, p = 9.2 × 10−5). The 30 min insulin, adjusted for age, sex and 30 min glucose, was lower in European Americans (358.8 pmol/l) than in African Americans (553.5 pmol/l) and American Indians (559.8 pmol/l, p = 5.7 × 10−8). With additional adjustment for BMI, differences in fasting insulin were largely attenuated (p = 0.25), while with additional adjustment for BMI and fasting insulin, differences in 30 min insulin remained statistically significant (p = 2.3 × 10−5).

Fig. 1

(a) Age- and sex-adjusted prevalence of diabetes by ancestry group. Prevalence is adjusted to the mean age and sex distribution of the total sample. Prevalence was 12.4% in African Americans (AA), 34.0% in American Indians (AI) and 10.4% in European Americans (EA) (p = 2.9 × 10−10). (b) ‘Bean’ plot of the distribution of age- and sex-adjusted BMI by ancestry group. Symmetrical lines represent a nonparametric local density function, estimated with PROC KDE in SAS. Thick black horizontal lines represent the mean value. Grey horizontal lines represent individual data points, with the length of the line indicating the number of observations at each level. Mean BMI was 33.4 kg/m2 in AA, 36.3 kg/m2 in AI and 33.0 kg/m2 in EA (p = 1.9 × 10−7). (c) ‘Bean’ plot of the distribution of age- and sex-adjusted fasting insulin level in individuals with normal glucose tolerance by ancestry group. Geometric mean fasting insulin was 48.4 pmol/l in AA, 63.8 pmol/l in AI and 45.2 pmol/l in EA (p = 9.2 × 10−5). (d) ‘Bean’ plot of the distribution of 30 min post-load insulin levels, adjusted for age, sex and 30 min glucose, in individuals with normal glucose tolerance by ancestry group. Geometric mean 30 min insulin was 553.5 pmol/l in AA, 559.8 pmol/l in AI and 358.8 pmol/l in EA (p = 5.7 × 10−8)

Heritability

In 761 American Indian participants, type 2 diabetes was highly familial; 77% of the liability was potentially due to genetic factors (h2 = 0.77; 90% CI 0.34, 1.00; p = 0.0017). Similarly, significant familial aggregation was observed for maximum BMI (h2 = 0.36; 0.19, 0.53; p = 2.5 × 10−4) and height (h2 = 0.71; 0.54, 0.85; p = 4.4 × 10−11). In 434 American Indian participants with normal glucose tolerance, significant heritability was observed for fasting insulin (h2 = 0.35; 0.04, 0.65; p = 0.035) and 30 min insulin adjusted for 30 min glucose (h2 = 0.31; 0.03, 0.66; p = 0.035). These estimates were made with adjustment for the first genetic principal component, which captures the major source of stratification in this population; additional sources of population stratification may be captured with additional principal components, but, given the small number of relative pairs, at the risk of model overspecification. To assess the robustness of the h2 estimates, we repeated the analysis with adjustment for the first five genetic principal components. For most traits, the h2 estimates were only modestly attenuated (ESM Table 2). The exception was 30 min insulin, for which h2 approached 0 (where PST is of questionable meaning as a measure of selection), so the h2 estimate for this trait is not robust.

Comparison of genotypic and phenotypic divergence

Mean FST-M among all three ancestry groups across all 97,388 markers was 0.130 (Table 1). Estimates of PST for each phenotype are shown in Table 1; none of these were significantly higher than FST. For type 2 diabetes PST = 0.149 (90% CI 0.038, 0.272; p = 0.35 for comparison with FST). For maximum BMI, PST = 0.094 (0.017, 0.184; p = 0.54), and for height PST = 0.116 (0.053, 0.189; p = 0.46). Among those with normal glucose tolerance, PST for fasting insulin was 0.095 (0.001, 0.214; p = 0.54), while for 30 min insulin PST = 0.216 (0.082, 0.358; p = 0.18). The multivariable test across all five traits was not significant (p = 0.46). The largest departures from neutral expectations for pairs of ancestry groups generally occurred between American Indians and European Americans, but were not statistically significant; for diabetes PST-MAX = 0.22 (p = 0.37), while for BMI PST-MAX = 0.14 (p = 0.61). Similar results were obtained with directly measured BMI, with fasting insulin adjusted for BMI and with 30 min insulin adjusted for BMI and fasting insulin (ESM Table 3). Similar results were also obtained when men and women were analysed separately (ESM Fig. 4, ESM Table 4). When PST was calculated according to eq. 3, PST values tended to be lower than FST (ESM Table 5). A summary of the primary analyses is shown in Fig. 2.

Table 1 Genotypic divergence (FST) and phenotypic divergence (PST) for type 2 diabetes and related traits among ancestry groups
Fig. 2

Summary of results of primary analyses. The extent of divergence among ancestry groups (FST for genotypes, PST for phenotypes) is shown on the x-axis. The height of the bars on the y-axis represents the frequency (%) among 97,388 markers at each divergence level. The values of PST for each phenotype are indicated by arrows. Divergence levels above the critical value of 0.340 (indicated by the vertical line) are considered indicative of diversifying selection, while those below this value are more consistent with neutrality

In Bayesian outlier analyses, the number of markers potentially under selection ranged from 416 with prior odds for selection vs neutrality of 1:10, to 61,195 with prior odds of 4:3. Statistical significance levels were similar when restricted to putatively neutral markers regardless of prior odds (ESM Table 6). By analysis of individual admixture proportions, we estimated that, on average, 80% of the ancestry of African American participants derived from African sources, 96% of the ancestry of American Indian participants derived from Amerindian sources and 99% of the ancestry of European American participants derived from European sources. We repeated the divergence analyses with restriction to those whose genetic ancestry derived ≥85% from the continent corresponding to their stated ancestry group, constituting 31 African Americans, 475 American Indians and 126 European Americans. Similar results were obtained (ESM Table 7); for diabetes PST = 0.143 (p = 0.41), and, for BMI, PST = 0.098 (p = 0.57), while FST = 0.144.

Sensitivity analyses

Results of sensitivity analyses, which calculate PST for different values of h2 and c, are shown in Fig. 3. For diabetes, PST was generally less than the 95th centile of marker-wise FST (0.340), except when h2 was low and c was high (e.g., h2 = 0.20 and c > 0.80). For BMI, fasting insulin and 30 min insulin, PST did not exceed the critical value of 0.340 for any value of c, for any h2 ≥ 0.20.
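As a rough check on where this threshold falls for diabetes, the reported point estimates can be inserted into eq. 1. This is only a sketch: it uses the Table 1 values (PST = 0.149 under eq. 2, with h2 = 0.77) rather than the bootstrap distribution, and R is introduced here simply as shorthand for the ratio of the among-group to the within-group phenotypic variance.

$$ R=\frac{{\upsigma}_{Pa}^2}{{\upsigma}_{Pw}^2}=\frac{2{h}^2{P}_{ST}}{1-{P}_{ST}}=\frac{2\times 0.77\times 0.149}{0.851}\approx 0.270 $$
$$ \frac{cR}{cR+2\times 0.20}>0.340\kern1em \iff \kern1em c>\frac{0.40\times 0.340}{0.660\times 0.270}\approx 0.76 $$

Under these assumptions, PST for diabetes at h2 = 0.20 would exceed 0.340 only once c is above roughly 0.76, in line with the pattern described above.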

Fig. 3

(a) Sensitivity analyses showing values of PST for maximum BMI for various levels of h2 and c. (b) Sensitivity analyses showing values of PST for type 2 diabetes for various levels of h2 and c. (c) Sensitivity analyses showing values of PST for fasting insulin level for various levels of h2 and c. (d) Sensitivity analyses showing values of PST for 30 min post-load insulin level for various levels of h2 and c. Results are shown for h2 = 0.20, the estimated value of h2, its upper confidence limit and its lower confidence limit (if >0.20). Points shown as triangles are calculated under eq. 2, and are the same as those shown in Table 1, while points shown as circles are calculated under eq. 3 (ESM Table 5)

Analyses of additional population data

Results of analyses comparing African Americans and European Americans from NHANES, and SWAI, are shown in Fig. 4. Age- and sex-adjusted prevalence of diabetes was highest in SWAI (40.6%), and lower in African Americans (14.2%) and European Americans (8.0%, p = 1.1 × 10−214); similarly, mean BMI was highest in SWAI (35.1 kg/m2), and lower in African Americans (30.5 kg/m2) and European Americans (29.0 kg/m2, p = 2.1 × 10−286). FST comparing SWAI with African Americans and European Americans from the HapMap populations was 0.134; the 95th centile of marker-wise FST was 0.354. Although PST values were modestly higher among these three populations than among the Phoenix cohort, they were well within the expected distribution of FST-M; for diabetes PST = 0.195 (90% CI 0.166, 0.224), while for BMI PST = 0.252 (0.223, 0.282) (ESM Table 8). When calculated by eq. 3, PST = 0.157 (0.133, 0.182) for diabetes and PST = 0.109 (0.094, 0.124) for BMI.
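The eq. 2 and eq. 3 values quoted here can be related to one another by a short calculation, shown as an approximate consistency check. It assumes the heritability estimates from the Phoenix cohort (h2 = 0.77 for diabetes, h2 = 0.36 for maximum BMI), treats the reported bootstrap estimates as point values, and uses R as shorthand for the ratio of among-group to within-group phenotypic variance; the subscripts (2) and (3) denote the equation used.

$$ R=\frac{2{h}^2{P}_{ST(2)}}{1-{P}_{ST(2)}},\kern1em {P}_{ST(3)}=\frac{R}{R+2} $$
$$ \mathrm{Diabetes}:\kern0.5em R=\frac{2\times 0.77\times 0.195}{0.805}\approx 0.373,\kern1em {P}_{ST(3)}\approx \frac{0.373}{2.373}\approx 0.157 $$
$$ \mathrm{BMI}:\kern0.5em R=\frac{2\times 0.36\times 0.252}{0.748}\approx 0.243,\kern1em {P}_{ST(3)}\approx \frac{0.243}{2.243}\approx 0.108 $$

These agree with the reported eq. 3 values (0.157 and 0.109) to within rounding.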

Fig. 4

(a) Age- and sex-adjusted prevalence of diabetes in African American (AA) and European American (EA) individuals from NHANES, and the SWAI cohort. Prevalence is adjusted to a mean age of 42.3 years and a percentage of men of 54% for comparability with Fig. 1. For comparability with the NHANES data, the diabetes diagnosis for the SWAI cohort was based on data obtained at the last examination and included self-reported current or previous use of diabetes medication, as well as concurrent measures of glucose. Prevalence was 14.2% in AA, 40.6% in SWAI and 8.0% in EA (p = 1.1 × 10−214). (b) ‘Bean’ plot of the distribution of age- and sex-adjusted BMI in AA and EA individuals from NHANES, and the SWAI cohort. Mean BMI was 30.5 kg/m2 in AA, 35.1 kg/m2 in SWAI and 29.0 kg/m2 in EA (p = 2.1 × 10−286). (c) Genetic divergence, measured by FST, among AA and EA from the HapMap samples in the 1000 Genomes project and the SWAI cohort. Values for each pairwise comparison between groups are shown as bars, while the overall divergence is indicated by the horizontal line. (d) Divergence for diabetes prevalence, measured by PST, among AA and EA from NHANES and the SWAI cohort. (e) Divergence in BMI, measured by PST, among AA and EA from NHANES and the SWAI cohort. Heritability estimates for PST calculations are the same as those used for the Phoenix cohort in Table 1

Discussion

Prevalence of type 2 diabetes and obesity differs across human continental ancestry groups, and there has been considerable speculation about the role of natural selection in these differences. The ‘thrifty genotype’ hypothesis posits that greater efficiency in using energy from food conferred a selective advantage in time of famine but that this predisposes to diabetes and obesity in modern environments [49, 50]. Differences in exposure and response to famine could, thus, have resulted in differences in prevalence of diabetes and obesity across ancestry groups [5,6,7]. Alternatively, it has been proposed that release from predation freed humans from selective pressure against obesity, and that high prevalence of obesity in humans is the result of neutral genetic drift [51]. Others have proposed that agriculture introduced a high load of carbohydrate into human diets, and that populations that adopted agriculture early, such as Europeans, have experienced greater selection for carbohydrate tolerance than other populations, resulting in protection from diabetes [8, 9]. Others have hypothesised that differences in diabetes and obesity across ancestry groups reflect adaptation to different climates or to different infectious diseases [10,11,12,13]. Such hypotheses have sometimes been discussed on the basis of the differences among ancestry groups, without consideration of whether these differences could arise neutrally.

There are few empirical genetic data supporting hypotheses that natural selection across ancestry groups contributes to risk of diabetes or obesity. Some studies have analysed established type 2 diabetes and obesity variants for molecular signatures of recent natural selection, such as extended haplotypic homozygosity. These studies have generally not found greater evidence for selection at established variants in comparison with suitably matched genomic variants [16,17,18], although one study did find a modest excess of evidence for selection at protective alleles for diabetes [22]. These methods are most powerful for detecting classic ‘sweeps’ where a previously rare allele at a single locus rapidly increases in frequency, and they are not well suited for detecting selection on polygenic traits. Others have used established variants to estimate a polygenic QST analogue, and they found little evidence for differential selection for BMI or type 2 diabetes variants [52]. Other studies have compared allele frequencies for established variants across ancestry groups. Although these have found considerable differences in allele frequencies, the pattern of differences has not corresponded to epidemiologic risk [2, 19,20,21]; type 2 diabetes risk allele frequencies for established variants tend to be high in Africans but low in American Indians. However, the causal variants that contribute to diabetes and obesity are incompletely known and linkage disequilibrium patterns vary across populations, and this can introduce unpredictable biases into these comparisons [53]. The variance components methods used in the present study are designed to detect polygenic selection which differs across ancestry groups, and they do not require knowledge of specific causal variants. 
Our analyses show that phenotypic differences for type 2 diabetes and related traits among African Americans, American Indians and European Americans are consistent with expectations based on heritability and genetic divergence. Thus, strong diversifying selection is not necessary to explain the phenotypic differences.

Recent genetic admixture among groups could attenuate the phenotypic differences. However, we obtained similar results when analyses were restricted to those whose genetic ancestry derived ≥85% from the continent corresponding to their stated ancestry, and this suggests that our results are not unduly influenced by admixture. Phenotypic measures were made identically across the ancestry groups in our cohort. Although the Phoenix cohort is a relatively small convenience sample, we observed significant differences in diabetes and obesity risk among groups, which replicate known epidemiologic associations [1, 2]. Furthermore, PST estimates were only modestly higher when calculated among NHANES samples representative of the US African American and European American populations and an SWAI population with high prevalence of obesity and diabetes. Thus, results from these larger population samples are generally consistent with those from the Phoenix cohort.

Comparisons of phenotypic with genotypic divergence are optimally made using genetic components of variance for the trait (i.e., QST), whereas we have used the total phenotypic components of variance (PST). The extent to which PST approximates QST depends on heritability (h2) and the proportion of the among-population phenotypic variance explained by genetic factors (c). Our estimates of h2 are based on a relatively small sample of related individuals, but they are comparable to those reported in large meta-analyses of twin studies for BMI and diabetes [3, 4]. Although our primary analyses were conducted under the assumption that c = 1, environmental differences in determinants of these traits among populations would result in lower values of c, and thus in lower values of PST. Our sensitivity analyses suggest that our findings are consistent over a large range of values of h2 and c. For BMI and diabetes, the largest pairwise differences between phenotypic and genotypic divergence were observed between American Indians and European Americans, and these differences suggest that a relatively modest 14% and 22% of the genetic variance in BMI and diabetes, respectively, is potentially attributable to differential selection. These figures assume no effect of neutral genetic drift; the observed genotypic divergence, however, suggests that these values are well within neutral expectations.
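The dependence of PST on heritability and c can be illustrated with a minimal sketch. It assumes the widely used formulation PST = (c/h2)·VB / ((c/h2)·VB + 2·VW), where VB and VW are the among-group and within-group phenotypic variance components; this formula, the function names and the example variance values are illustrative assumptions, and the estimator used in the present analyses may differ in detail.

```python
def pst(var_between, var_within, h2=1.0, c=1.0):
    """Approximate QST from phenotypic variance components.

    Assumed formulation (not confirmed as the study's estimator):
        PST = (c/h2) * VB / ((c/h2) * VB + 2 * VW)
    var_between: among-group phenotypic variance (VB)
    var_within:  within-group phenotypic variance (VW)
    h2:          heritability, in (0, 1]
    c:           genetic share of the among-group variance, in [0, 1]
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must be in [0, 1]")
    if var_between < 0 or var_within <= 0:
        raise ValueError("variance components must be non-negative (VW > 0)")
    ratio = c / h2
    return ratio * var_between / (ratio * var_between + 2.0 * var_within)


def sensitivity_grid(var_between, var_within, h2_values, c_values):
    """Evaluate PST over every (h2, c) combination, as in a sensitivity analysis."""
    return {
        (h2, c): pst(var_between, var_within, h2, c)
        for h2 in h2_values
        for c in c_values
    }


if __name__ == "__main__":
    # Hypothetical variance components chosen only for illustration.
    grid = sensitivity_grid(0.2, 1.0, h2_values=[0.4, 0.6, 0.8], c_values=[0.5, 1.0])
    for (h2, c), value in sorted(grid.items()):
        print(f"h2={h2:.1f} c={c:.1f} PST={value:.3f}")
```

Under this formulation, lowering c with h2 held fixed lowers PST, which matches the direction of the effect described above, and PST depends on c and h2 only through their ratio.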

With comparisons over only three ancestry groups representing a large portion of modern human genetic history, the present approach has limited power to detect modest degrees of diversifying selection; fairly strong or sustained selection is required for the phenotypic divergence of a single trait to exceed expectations based on genotypic divergence. We estimate by simulation that PST ≈ 0.40 is required to detect diversifying selection at p < 0.05 with 80% power (ESM Fig. 5; this corresponds to an average difference in phenotypic mean of ~0.9 SD between pairs of ancestry groups). This degree of divergence is somewhat less than that observed for traits under established diversifying selection, such as skin pigmentation or craniofacial morphometry [38, 43]. We did not directly measure skin pigmentation in the present study, but, for skin pigmentation predicted genetically [54], PST = [QST] = 0.831, p = 1.0 × 10⁻⁵ (ESM Fig. 6). The power of these variance components methods to detect diversifying selection depends on the number of ancestry groups included, as well as the number of individuals in each ancestry group [45]. Although the PST estimates from the larger population cohorts were similar to those from the Phoenix cohort for the same three ancestry groups, inclusion of additional ancestry groups may be required to detect more subtle selection. The present results do not exclude more complex models of selection, such as selection on a suite of complex traits, including some diabetes-related traits, with the overall differentiation constrained by pleiotropy. Nor do they exclude modest diversifying selection on diabetes-related traits, primarily affecting the American Indian group, that is too weak to be detected by the present methods. However, our results were obtained in major continental ancestry groups at diverse risk for diabetes and obesity, and they suggest that differences in natural selection across these groups are not necessary to explain the phenotypic differences.
Investigations of the causes of differences in diabetes risk across these groups would do well to consider alternative explanations.