Abstract
Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use1. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels2, heart disease remains the leading cause of death worldwide3. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups, which can differ in allele frequencies, effect sizes and linkage-disequilibrium patterns24. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand their emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine25, we anticipate that increased diversity of participants will lead to more accurate and equitable26 application of polygenic scores in clinical practice.
Main
The Global Lipids Genetics Consortium aggregated GWAS results from 1,654,960 individuals from 201 primary studies representing the following five genetic ancestry groups: admixed African or African (N = 99,432, 6.0% of the sample); East Asian (N = 146,492, 8.9%); European (N = 1,320,016, 79.8%); Hispanic (N = 48,057, 2.9%); and South Asian (N = 40,963, 2.5%) (Table 1, Supplementary Table 1, Supplementary Fig. 1). We performed GWAS for the following five blood lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides (TGs), total cholesterol (TC) and non-high-density lipoprotein cholesterol (nonHDL-C). Of the 91 million variants imputed from the Haplotype Reference Consortium or 1000 Genomes Phase 3 that successfully passed variant-level quality control, 52 million variants were present in at least 2 cohorts and had sufficient minor allele counts (>30 in the meta-analysis) to be evaluated as a potential index variant.
Ancestry-specific genetic discovery
We first quantified the number of genome-wide significant loci identified in at least one of the five ancestry-specific meta-analyses. We found 773 lipid-associated genomic regions that contained 1,765 distinct index variants that reached genome-wide significance (P < 5 × 10−8, ±500 kb) (Supplementary Tables 2 and 3, Supplementary Figs. 2 and 3) for at least 1 ancestry group and lipid trait. Of these regions, 237 were deemed new because the most-significant index variant in each region was >500 kb from variants that have been previously reported as associated with any of the five lipid traits4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,27. Of these loci, 76% were identified only in the European ancestry-specific analyses (N = ~1.3 million, 80% of the sample). Of the non-European ancestries, the African ancestry GWAS (N = ~99,000, primarily African American) identified more ancestry-specific loci (15 unique to admixed African or African) than any other non-European ancestry group (6 loci unique to East Asian, 6 to Hispanic, 1 to South Asian). This difference is probably because allele frequencies between African and European ancestry populations show the largest variation (Fig. 1a–d) and because African populations have greater genetic diversity than other populations28.
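The novelty rule in this paragraph reduces to a distance check between each region's most significant index variant and all previously reported lipid variants. The following Python sketch is only an illustration: it assumes both sets of variants are given as hypothetical `(chromosome, position)` tuples on the same genome build, and `is_new_region` is an illustrative name rather than part of the consortium's pipeline.

```python
# Positions are hypothetical (chromosome, base-pair position) tuples on one genome build.
NOVELTY_WINDOW_BP = 500_000  # ±500 kb, as stated in the text


def is_new_region(lead_variant, known_variants, window_bp=NOVELTY_WINDOW_BP):
    """Return True if the region's most significant index variant lies more than
    window_bp from every previously reported lipid variant on the same chromosome."""
    chrom, pos = lead_variant
    return all(
        known_chrom != chrom or abs(known_pos - pos) > window_bp
        for known_chrom, known_pos in known_variants
    )


known = [("1", 1_000_000), ("2", 5_000_000)]
print(is_new_region(("1", 1_400_000), known))  # 400 kb from a known variant -> False
print(is_new_region(("1", 1_600_000), known))  # 600 kb away -> True
```

Because the rule looks only at the lead variant, a region can be classified as new under this check even if one of its secondary index variants lies near a known signal.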
Multi-ancestry genetic discovery
We next performed multi-ancestry meta-analyses using the meta-regression approach implemented in MR-MEGA29,30 to account for heterogeneity in variant effect sizes on lipids between ancestry groups. A total of 1,750 index variants at 923 loci (±500 kb regions) reached genome-wide significance for at least 1 lipid trait. These included 168 regions not identified by ancestry-specific analysis, 120 (71%) of which are new (Supplementary Tables 4 and 5, Supplementary Fig. 4, Extended Data Fig. 1). Almost all (98%) of the index variants from the ancestry-specific analysis remained significant (P < 5 × 10−8) after meta-analysis across all ancestry groups. However, 15 admixed African or African, 9 East Asian, 3 Hispanic and 1 South Asian index variants from the ancestry-specific analysis did not remain significant (multi-ancestry P values of 7.7 × 10−6 to 5.9 × 10−8) (Supplementary Fig. 5, Supplementary Note). In total, we identified 941 lipid-associated loci including 355 new loci from either single- or multi-ancestry analyses.
Next, we compared the number of loci identified per 100,000 participants in each ancestry group and the combined dataset (Fig. 1e). Admixed African and Hispanic ancestry-specific analyses identified the most loci per genotyped individual, which may reflect the African ancestry and/or greater genetic diversity of these groups. European and multi-ancestry analyses identified slightly fewer loci per 100,000 individuals, which probably reflects a slight reduction in benefit from the addition of new samples to extremely large sample sizes (>1 million). For the genome-wide significant variants discovered in each ancestry, we estimated the proportion of ancestry-enriched variants by enumerating the number of other ancestries with sufficient power to detect an association (range of 0–4). We estimated the power for discovery of each variant by assuming an equivalent discovery sample size in the other ancestries, fixed effect size and observed allele frequencies from the other ancestries (Fig. 1f). To enable comparisons at similar sample sizes across ancestry groups, we selected European ancestry index variants identified from a meta-analysis of approximately 100,000 individuals subsampled from the current study. African ancestry index variants were the most ancestry-enriched, with only 61% of index variants demonstrating sufficient power in at least 1 other ancestry group (equal N, power of >80% to reach α = 5 × 10−8). This result is probably due to population-enriched allele frequencies. By comparison, 88% of South Asian index variants had an estimated power of >80% in at least 1 other ancestry.
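The ancestry-enrichment count can be sketched with a standard power approximation for a 1-df additive test. The paragraph does not spell out the power formula, so the sketch below assumes an inverse-normalized (unit-variance) trait, effect sizes in SD units, and a non-centrality parameter of N·q²/(1 − q²), where q² = 2β²f(1 − f) is the variance explained; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_1df(beta, freq, n, alpha=5e-8):
    """Approximate power of a two-sided 1-df additive test for a unit-variance trait
    (beta in SD units per allele, freq the allele frequency, n the sample size)."""
    q2 = 2 * beta ** 2 * freq * (1 - freq)      # variance explained by the variant
    ncp = n * q2 / (1 - q2)                     # non-centrality of the chi-squared statistic
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)     # two-sided critical value (~5.45 for 5e-8)
    root = ncp ** 0.5
    return STD_NORMAL.cdf(root - z_crit) + STD_NORMAL.cdf(-root - z_crit)


def n_powered_ancestries(beta, n_discovery, freqs_other, threshold=0.8, alpha=5e-8):
    """Count other ancestries with power > threshold, keeping the discovery effect size
    and sample size fixed but using each ancestry's own allele frequency. A frequency of
    None (variant not imputed there), 0 or 1 counts as zero power."""
    return sum(
        1
        for f in freqs_other
        if f is not None and 0 < f < 1
        and power_1df(beta, f, n_discovery, alpha) > threshold
    )


# Hypothetical variant: effect of 0.1 SD, discovered in ~100,000 individuals.
print(n_powered_ancestries(0.1, 100_000, [0.3, 0.001, None, 0.2]))  # 2
```

The rare-in-other-ancestries case (frequency 0.001) is what drives a variant towards being counted as ancestry-enriched: with the same N and β, its variance explained, and hence its power, collapses.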
Finally, we found that both the number of identified variants and the mean observed chi-squared values from genome-wide lipid-association tests were approximately linearly related to the meta-analysis sample size across ancestries (Supplementary Table 6, Extended Data Fig. 2). However, in the European ancestry group, the incremental increase in either the number of loci or the chi-squared value was slightly attenuated at the largest sample sizes. Taken together, these results suggest that once sufficiently well-powered GWAS sample sizes are reached within a given ancestry group, the assembly of large sample sizes of other under-represented groups will only modestly enhance variant discovery relative to increasing the sample size of the predominant ancestry.
Comparison of effects across ancestries
Differences in association signals across ancestries despite similar sample sizes could be due to variations in allele frequencies and/or effect sizes. Effect-size differences could in turn reflect different patterns of linkage disequilibrium (LD) with the underlying causal variant or an interaction with an environmental risk factor for which prevalence varies by ancestry and/or geography. We found that effect size estimates of individual variants were similar based on pairwise comparison between ancestries (R2 = 0.93 for variants with P < 5 × 10−8) (Extended Data Fig. 3, Supplementary Table 7, Supplementary Fig. 6). We also tested for genome-level differences in effect-size correlations for East Asian, European and South Asian ancestry groups using Popcorn31, and the estimated correlations were not significantly different from 1 (P > 0.05; Supplementary Figs. 7 and 8). We tested for differences in genetic correlations between admixed African and European ancestries in the UK Biobank and the Million Veteran Program (MVP) using the bivariate genome-based restricted maximum likelihood (GREML) method30,32, as the Popcorn method does not account for long-range LD in admixed populations. The genetic correlation between admixed African and European ancestries for HDL-C (r = 0.84) was not significantly different from 1 in the UK Biobank dataset, although this may reflect the small number of African ancestry individuals in this database. By contrast, correlations for the other traits ranged from 0.52 to 0.60 in UK Biobank and from 0.47 to 0.69 in the MVP (Supplementary Table 8). These results indicate that there is a moderately high correlation in lipid effect sizes across ancestry groups when considering all genome-wide variants.
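Assuming the reported R2 is the squared Pearson correlation between per-variant effect estimates from two ancestry groups (equivalently, the R2 of a simple linear regression of one set on the other), the pairwise comparison can be sketched as below. The estimates must first be aligned to the same effect allele; the function name is illustrative.

```python
from statistics import fmean


def effect_size_r2(betas_a, betas_b):
    """Squared Pearson correlation between per-variant effect estimates from two ancestry
    groups. Both lists must cover the same variants, in the same order, with effects
    aligned to the same effect allele."""
    if len(betas_a) != len(betas_b) or len(betas_a) < 2:
        raise ValueError("need aligned effect estimates for at least two variants")
    mean_a, mean_b = fmean(betas_a), fmean(betas_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(betas_a, betas_b))
    var_a = sum((a - mean_a) ** 2 for a in betas_a)
    var_b = sum((b - mean_b) ** 2 for b in betas_b)
    return cov ** 2 / (var_a * var_b)
```

A squared correlation does not distinguish concordant from discordant directions of effect, so in practice one would also inspect the sign of the correlation or the regression slope.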
Of the 2,286 index variants that reached genome-wide significance in the multi-ancestry meta-analysis for any of the five lipid traits, 159 (7%) showed significant heterogeneity of effect size due to ancestry (P < 2.2 × 10−5; Bonferroni-corrected for 2,286 variants) (Supplementary Table 5). Of these 159 variants, 31 showed the largest effect in African ancestry analyses, 24 in East Asian, 67 in European, 20 in Hispanic and 17 in South Asian. Only 49 (2%) of these variants from the multi-ancestry meta-analysis showed significant residual heterogeneity that was not due to ancestry, which may be attributable to differences in ascertainment or analysis strategy between cohorts (Supplementary Table 5). This result suggests that cohort-related factors are a less important driver of heterogeneity than genetic ancestry.
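The heterogeneity threshold quoted above is a Bonferroni correction. Assuming a family-wise α of 0.05 (not stated explicitly, but consistent with the reported value), 0.05/2,286 ≈ 2.19 × 10−5, which rounds to the 2.2 × 10−5 in the text. A minimal sketch with hypothetical helper names:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def heterogeneous(p_values, n_tests, alpha=0.05):
    """Flag variants whose ancestry-heterogeneity P value passes the Bonferroni threshold."""
    cutoff = bonferroni_threshold(n_tests, alpha)
    return [p < cutoff for p in p_values]


print(bonferroni_threshold(2286))          # about 2.19e-05
print(heterogeneous([1e-6, 1e-4], 2286))   # [True, False]
```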
Multi-ancestry analyses aid fine-mapping
We next assessed whether multi-ancestry fine-mapping narrowed the set of probable causal variants at each of the independent multi-ancestry association signals (LD R2 < 0.7), assuming one shared causal variant per ±500 kb region (Supplementary Table 9). A total of 19% of the association signals had only one variant in the 99% credible set and 55% (816 out of 1,486) had ≤10. By contrast, 5% (73 out of 1,486) had >100. Of the 407 variants with >90% posterior probability of being the causal variant at a locus in the multi-ancestry meta-analysis, 56 (14%) were missense variants, 7 (2%) were splice-region variants and 4 (1%) were stop-gain variants (CD36, HBB, ANGPTL8 and PDE3B) (Supplementary Tables 10–12).
The median number of variants in 99% credible sets from the European ancestry analysis was 13, but this was reduced to 8 in the multi-ancestry analysis. Of 1,486 independent association signals, 825 (56%) had reduced credible set size in the multi-ancestry analysis. At these 825 loci, the number of variants in the multi-ancestry credible sets was reduced by 40% relative to the minimum credible set size in either admixed African (the most genetically diverse group) or European ancestry analyses (Extended Data Fig. 4). We estimated that increasing the European ancestry sample size to that of the multi-ancestry analysis would yield a 20% reduction in the credible set size, which is approximately half of the 40% reduction observed in the multi-ancestry analysis. This suggests that sample size differences alone do not explain the reduction. Instead, differences in LD patterns and effect sizes across ancestries probably contribute to the improved fine-mapping (Supplementary Note). For example, rs900776, an intronic variant in the DMTN region with many variants in high LD, has a posterior probability of 0.51 of being causal in the European ancestry analysis; this increases to 0.86 in the African-ancestry-derived credible sets and to >0.99 in the multi-ancestry analysis (Fig. 2).
Multi-ancestry polygenic risk scores are most predictive
We evaluated the potential of polygenic risk scores (PRS; sometimes also called polygenic scores (PGS)) to predict increased LDL-C levels, which are a major causal risk factor of coronary artery disease, in diverse ancestry groups. We created three non-overlapping datasets for the following discrete steps: (1) perform ancestry-specific or multi-ancestry GWAS to estimate variant effect sizes; (2) optimize risk score parameters; and (3) evaluate the utility of the resulting scores. For each ancestry-specific or multi-ancestry GWAS, we created multiple PRS weights, either genome-wide with PRS-CS33 or using pruning and thresholding to select independent variants. We tested each score in the optimizing dataset, which was matched for ancestry to the GWAS (admixed African or African, East Asian, European, South Asian, and all ancestries from the UK Biobank; and Hispanic from the Michigan Genomics Initiative (MGI); Extended Data Figs. 5 and 6, Supplementary Tables 13–15). The top-performing score from each GWAS was selected: PRS-CS for the East Asian ancestry and European ancestry scores and for the European ancestry scores from a previous GLGC GWAS from 2010 (ref. 4); and an optimized pruning and threshold-based score for all others. We then evaluated the optimal PRS in 8 cohorts (N = 295,577, Supplementary Table 16) of individuals from 7 ancestral groupings who were not included in the discovery GWAS: East Asian (146,477), European American (85,571), African American (21,730), African (2,452 East Africa, 4,972 South Africa and 7,309 West Africa), South Asian (15,242), Hispanic American (7,669), and Asian American (4,155).
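Whatever method produces the weights (PRS-CS or pruning and thresholding), applying a score to an individual amounts to a weighted sum of effect-allele dosages. The sketch below assumes dosages and weights are keyed by the same hypothetical variant identifiers and aligned to the same effect allele; variants absent from the weight set are simply skipped.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual.
    Both arguments are dicts keyed by (hypothetical) variant IDs; variants without a
    weight are skipped, and dosages are assumed aligned to each weight's effect allele."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score_weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_d": 0.30}
print(polygenic_score(person, score_weights))  # 2*0.10 + 1*(-0.05), about 0.15
```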
The PRS developed from the multi-ancestry meta-analysis consistently showed the best or near-best performance in each group tested, with improved or comparable predictions relative to ancestry-matched scores (adjusted R2 = 0.10–0.16; Fig. 3, Supplementary Table 17, Extended Data Fig. 7). This observation was particularly evident for ancestries with smaller GWAS sample sizes, such as Hispanic and South Asian. For African Americans in the MGI and the MVP datasets, polygenic prediction was similar for individuals with different levels of African ancestry admixture (Extended Data Fig. 8) and reached the level of prediction observed for European ancestry individuals from the same dataset. The increase in LDL-C per standard deviation increase in the PRS was also similar between ancestry groups in the MVP (effect size ± standard error): 13.2 ± 0.22 mg dl−1 for African American, 8.9 ± 0.47 mg dl−1 for Asian (East Asian/South Asian), 10.5 ± 0.10 mg dl−1 for European and 10.6 ± 0.32 mg dl−1 for Hispanic. We repeated the evaluation of multi-ancestry versus single-ancestry PRS using GWAS with a sample size of approximately 100,000 individuals and a fixed methodology, and the results were consistent with those from the full dataset (Fig. 3b, Supplementary Fig. 9). Thus, polygenic prediction for LDL-C in all ancestries appears to benefit the most from adding samples of diverse ancestries, given a scenario where large numbers of European ancestry individuals have already been included. Additional studies are needed to determine whether this applies to other phenotypes with different genetic architectures and heritabilities.
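The per-SD effect and adjusted R2 reported here can be illustrated with a single-predictor regression of LDL-C on a standardized PRS. This is a simplified sketch: it assumes no covariates, which may differ from the cohort-level evaluation models, so its adjusted R2 accounts for one predictor only. The function name is hypothetical.

```python
from statistics import fmean, pstdev


def per_sd_effect_and_r2(prs, ldl):
    """Regress LDL-C on a standardized PRS with no covariates (an assumption).
    Returns (LDL-C change per 1 SD of PRS, R2, adjusted R2 for one predictor)."""
    n = len(prs)
    if n != len(ldl) or n < 3:
        raise ValueError("need at least three paired observations")
    mu, sd = fmean(prs), pstdev(prs)
    z = [(x - mu) / sd for x in prs]          # standardized score: mean 0, SD 1
    y_bar = fmean(ldl)
    # With a mean-zero predictor, the OLS intercept is y_bar and the slope is:
    slope = sum(zi * (yi - y_bar) for zi, yi in zip(z, ldl)) / sum(zi * zi for zi in z)
    ss_res = sum((yi - y_bar - slope * zi) ** 2 for zi, yi in zip(z, ldl))
    ss_tot = sum((yi - y_bar) ** 2 for yi in ldl)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, r2, adj_r2
```

If LDL-C is in mg dl−1, the returned slope is directly comparable in units to the per-SD MVP estimates quoted above.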
Discussion
Genome-wide discovery for blood-lipid traits based on more than 1.65 million individuals from 5 ancestry groups confirmed that the contributions of common genetic variations to blood lipids are similar across diverse populations. First, we found that the number of significant loci relative to sample size was similar within each ancestry group and approximately linearly related to sample size, with a small increase in ancestry-specific variants observed in African ancestry cohorts relative to the others. Second, we demonstrated that inclusion of additional ancestries through multi-ancestry fine-mapping reduces the set of candidate causal variants in credible sets and does so more rapidly than in single-ancestry analysis. Multi-ancestry GWAS should therefore facilitate the identification of effector genes at GWAS loci and enable accelerated biological insight and identification of potential drug targets. Third, we found that a PRS derived from approximately 88,000 African ancestry and about 830,000 European ancestry individuals was as strongly correlated with observed lipid levels among individuals with admixed African ancestry as among individuals with European ancestry. We hypothesize that the inclusion of African ancestry individuals in the GWAS improved polygenic prediction performance through better fine-mapping of loci and improved prioritization of multi-ancestry causal variants. Fourth, and perhaps most important, the multi-ancestry score was generally the most informative score across all the major population groups examined. This provides useful information for other genetic discovery efforts and investigations of the utility of PRS in diverse populations.
The generalizability of these findings—regarding the portability of PRS from the multi-ancestry meta-analysis—to other traits may depend on the heritability, the degree of polygenicity, the level of genetic correlation, the allele frequencies of causal variants across ancestry groups, gene–environment interactions, and the representation of diverse populations in the GWAS34,35. Although many traits show a high degree of shared genetic correlation across ancestries32,36,37, others have distinct genetic variants with large effects that are more common in specific ancestry groups34, which may limit the utility of multi-ancestry PRS for particular phenotypes in some ancestries.
The benefits for genetic discovery efforts as GWAS sample sizes increase will probably not be measured just by the number of loci discovered. Rather, the focus will increasingly turn to improving our understanding of the biology at established loci, identifying potential therapeutic targets and efficiently identifying individuals at high risk of adverse health outcomes across population groups without exacerbating existing health disparities. Considering the results presented here, and those of related studies38,39,40, we expect that future genetic studies will substantially benefit from meta-analyses across participants of diverse ancestries. Further gains in the depth and number of sequenced individuals of diverse ancestries41,42 may also improve the discovery of new variants and loci in diverse cohorts, in particular variants that are absent at present from arrays and imputation reference panels. Our results suggest that diversifying the populations under study, rather than simply increasing the sample size, is now the single most efficient approach to achieving these goals, at least for blood lipids and probably for related downstream adverse health outcomes such as cardiovascular disease. However, if the costs of recruiting diverse populations are higher than those of recruiting individuals from previously studied ancestry groups, and the total number of genome-wide significant index variants is the goal, then continued low-cost recruitment of any ancestry group is expected to still provide genetic insight. Taken together, our results strongly support ongoing and future large-scale recruitment efforts targeted at the enrolment and DNA collection of non-European ancestry participants.
Geneticists and those responsible for cohort development should continue to diversify genetic discovery datasets, while increasing sample size in a cost-effective manner, to ensure that genetic studies reduce rather than exacerbate existing health inequities across race, ancestry, geographical region and nationality.
Methods
Cohort-level analysis
Each cohort contributed GWAS summary statistics for HDL-C, LDL-C, nonHDL-C, TC and TGs, imputation quality statistics, and analysis metrics for quality control (QC) following a detailed analysis plan. The GWAS protocol is deposited in Protocol Exchange (doi: 10.21203/rs.3.pex-1687/v1). In brief, we requested that each cohort perform imputation to 1000 Genomes Phase 3 v5 (1KGP3), with European ancestry cohorts additionally imputing with the Haplotype Reference Consortium (HRC) panel using the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html#!), which uses Minimac software43. Detailed pre-imputation QC guidelines were provided, and these included removing samples with call rate <95%, samples with heterozygosity > median + 3 × interquartile range, ancestry outliers from principal component (PC) analysis within each ancestry group and variants deviating from Hardy–Weinberg equilibrium (HWE; P < 1 × 10−6) or with variant call rate <98%. Analyses were carried out separately by ancestry group and were also stratified by cases and controls where appropriate (that is, for cohorts ascertained on a disease such as coronary artery disease). Residuals were generated separately in males and females adjusting for age, age², PCs of ancestry and any necessary study-specific covariates. TG levels were natural log-transformed before generating residuals. The residuals were then inverse-normalized. Individuals on cholesterol-lowering medication had their pre-medication levels44 approximated by dividing the LDL-C value by 0.7 and the TC value by 0.8. Association analysis of the residuals for the majority of cohorts was carried out using a linear mixed-model approach in rvtests or with other similar software, including BOLT-LMM45, SAIGE46 or deCode association software.
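Three of the phenotype steps above can be sketched directly: the medication adjustment (LDL-C/0.7, TC/0.8), the natural-log transform of TGs and the rank-based inverse normal transformation of residuals. The residualization itself (sex-specific regression on age, age², PCs and study covariates) is omitted. The (rank − 0.5)/n offset and the use of average ranks for ties are assumptions, because the protocol summary does not specify them; function names are hypothetical.

```python
import math
from statistics import NormalDist

LDL_DIVISOR = 0.7  # from the text: treated LDL-C is divided by 0.7
TC_DIVISOR = 0.8   # from the text: treated TC is divided by 0.8


def adjust_for_medication(ldl, tc, on_lipid_lowering):
    """Approximate pre-medication LDL-C and TC for individuals on cholesterol-lowering drugs."""
    if on_lipid_lowering:
        return ldl / LDL_DIVISOR, tc / TC_DIVISOR
    return ldl, tc


def log_tg(tg):
    """Natural log of triglycerides, applied before residuals are generated."""
    return math.log(tg)


def inverse_normal_transform(values):
    """Rank-based inverse normal transform; tied values share their average rank.
    The (rank - 0.5) / n offset is an assumption, not taken from the analysis plan."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tied block
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```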
QC analysis
Each input file was assessed for QC using the EasyQC software47 (www.genepi-regensburg.de/easyqc). We generated quantile–quantile plots using minor allele frequency (MAF) bins, assessed trends in standard errors relative to the sample size for each cohort and checked MAF values of submitted variants relative to their expected value based on the imputation reference panel. In addition, we checked that each cohort reproduced the expected direction of effect at most known loci, given the cohort sample size. Cohorts identified to have issues with the submitted files were contacted, and corrected files were submitted or the cohort was excluded from the meta-analysis. Results from either sex-stratified analysis or sex-combined analysis with sex as a covariate were used. During the QC process, within each cohort we removed poorly imputed variants (info score or R2 < 0.3), variants deviating from the HWE (P < 1 × 10−8, except for the MVP, which used HWE P < 1 × 10−20) and variants with minor allele count <3. An imputation info score threshold of 0.3 was selected to balance the inclusion of variants across diverse studies while removing poorly imputed variants. Summary statistics were then corrected by genomic control (GC) using the λGC value calculated from the median P value of variants with MAF > 0.5%. To capture as many variants as possible, summary statistics from cohorts that had submitted both HRC and 1KGP3 imputed files were combined, selecting the HRC-imputed version whenever a variant was available from both panels. For variants imputed by both panels, the HRC-imputed version had a higher imputation info score than the 1KGP3-imputed version for 94% of variants.
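The cohort-level variant filters and the GC correction can be sketched as follows. Here λGC is taken as the median 1-df chi-squared statistic among variants with MAF > 0.5%, divided by its expected null median (about 0.455), and every statistic is then divided by λGC. Function names are hypothetical, and the sketch assumes two-sided P values from 1-df tests.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()
MEDIAN_CHISQ_1DF = STD_NORMAL.inv_cdf(0.75) ** 2  # null median of a 1-df chi-squared, ~0.455


def passes_qc(info, hwe_p, minor_allele_count, hwe_threshold=1e-8):
    """Cohort-level variant filters from the text; pass hwe_threshold=1e-20 for the MVP."""
    return info >= 0.3 and hwe_p >= hwe_threshold and minor_allele_count >= 3


def p_to_chisq(p):
    """Two-sided P value -> 1-df chi-squared statistic (stable for very small P)."""
    z = STD_NORMAL.inv_cdf(p / 2)
    return z * z


def chisq_to_p(chisq):
    """1-df chi-squared statistic -> two-sided P value."""
    return 2 * STD_NORMAL.cdf(-(chisq ** 0.5))


def genomic_control(p_values, mafs, maf_min=0.005):
    """Estimate lambda_GC from variants with MAF > maf_min, then rescale all statistics."""
    common = [p_to_chisq(p) for p, f in zip(p_values, mafs) if f > maf_min]
    lam = median(common) / MEDIAN_CHISQ_1DF
    return lam, [chisq_to_p(p_to_chisq(p) / lam) for p in p_values]
```

Converting through `inv_cdf(p / 2)` rather than `inv_cdf(1 - p / 2)` avoids losing very small P values to floating-point rounding near 1.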
Meta-analysis
Ancestry-specific meta-analysis was performed using Raremetal48 (https://github.com/SailajaVeda/raremetal). The multi-ancestry meta-analysis (also referred to as trans-ancestry meta-analysis) was performed using MR-MEGA29 with five PCs of ancestry. The choice of five PCs was made after comparing the λGC values across MAF bins from meta-analysis of HDL-C with MR-MEGA using from two up to ten PCs. In addition, fixed-effects meta-analysis was carried out with METAL49 to calculate effect sizes for use in the creation of PRS. Study-level PCs were plotted for each cohort by ancestry group to verify that the reported ancestry for each cohort was as expected. Following the meta-analysis, we identified loci based on a genome-wide significance threshold of 5 × 10−8 after a second round of GC correction using the λGC value calculated from the median P value of variants with MAF > 0.5%. We applied GC correction both at the cohort level and after meta-analysis as the most conservative option, to minimize potential false-positive findings. Observed λGC values were within the expected range for similarly sized studies and are included in Supplementary Tables 2 and 4. Variants with a cumulative minor allele count of ≤30 and those found in a single study were excluded from index variant selection. Index variants were identified following an iterative procedure starting with the most significant variant and grouping the surrounding region into a locus based on the larger of either ±500 kb or ±0.25 cM. cM positions were interpolated using the genetic map distributed with Eagle v.2.3.2 (genetic_map_hg19_withX.txt)50. Variants were annotated using WGSA51, including the summary of each variant from SnpEff52 and the closest genes for intergenic variants from ANNOVAR53. Variants were classified as known or new on the basis of manual review of previously published variants and of variants reported in the GWAS catalogue27 for any of the studied lipid traits (accessed May 2020, provided as Supplementary Table 18).
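The iterative locus definition can be sketched as a greedy procedure. This illustration assumes a single trait and ancestry and hypothetical `Variant` records carrying both base-pair and centimorgan positions, and it applies the eligibility filters described above (cumulative minor allele count >30 and presence in at least two studies). It returns one lead variant per locus.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos_bp: int
    pos_cm: float    # interpolated genetic-map position
    p: float
    mac: int         # cumulative minor allele count in the meta-analysis
    n_studies: int   # number of studies contributing the variant


def select_lead_variants(variants, alpha=5e-8, window_bp=500_000, window_cm=0.25):
    """Greedy locus definition for one trait and ancestry: take the most significant
    remaining eligible variant, absorb everything within the larger of ±window_bp or
    ±window_cm of it, and repeat."""
    eligible = [v for v in variants if v.p < alpha and v.mac > 30 and v.n_studies >= 2]
    remaining = sorted(eligible, key=lambda v: v.p)
    leads = []
    while remaining:
        lead = remaining.pop(0)
        leads.append(lead)
        # Keep only variants outside BOTH windows (i.e. outside the larger region).
        remaining = [
            v for v in remaining
            if v.chrom != lead.chrom
            or (abs(v.pos_bp - lead.pos_bp) > window_bp
                and abs(v.pos_cm - lead.pos_cm) > window_cm)
        ]
    return leads
```

Taking the union of the two windows is what "the larger of ±500 kb or ±0.25 cM" implies: in regions of high recombination the cM window can extend past 500 kb.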
For comparison between ancestries and lipid traits, index variants were grouped into genomic regions starting with the most significantly associated variant and grouping all surrounding index variants within ±500 kb into a single region.
Power to detect association within each ancestry was determined using the effect size and sample size of the variant within the original discovery ancestry group and the observed allele frequency from the other ancestry groups with α set to 5 × 10−8. We excluded variants that were only successfully imputed in a single ancestry group to account for imputation panel differences between groups (for example, HRC for European ancestry individuals and 1KGP3 for other ancestries). Variants that were successfully imputed in two or more ancestries were assumed to have zero power in any other ancestry for which the variant was not successfully imputed. The proportion of variance explained by each variant was estimated as 2β²f(1 − f), where β is the effect size from METAL and f is the effect allele frequency (Supplementary Table 19). The proportion of variance explained within each ancestry was estimated using the multi-ancestry effect size from METAL with the ancestry-specific allele frequency. Coverage of the genome by associated genetic regions was calculated using BEDTools54 for the regions defined by the minimum and maximum position within each locus with P < 5 × 10−8.
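The variance-explained formula translates directly into code. Because 2β²f(1 − f) is symmetric in f and 1 − f, it gives the same value whichever allele's frequency is supplied. The sketch assumes β is in trait SD units per allele; the ancestry labels and function names are hypothetical.

```python
def variance_explained(beta, freq):
    """Proportion of trait variance explained by one biallelic variant: 2 * beta^2 * f * (1 - f)."""
    return 2 * beta ** 2 * freq * (1 - freq)


def variance_explained_by_ancestry(beta_multi, freqs_by_ancestry):
    """Apply one multi-ancestry effect size to each ancestry's own allele frequency."""
    return {anc: variance_explained(beta_multi, f) for anc, f in freqs_by_ancestry.items()}


print(variance_explained_by_ancestry(0.1, {"EUR": 0.5, "AFR": 0.1}))
```

With a shared effect size, differences in variance explained between ancestries come entirely from allele frequency, which is the quantity the power comparison above also varies.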
Conditional analysis
Approximate conditional analysis was performed using rareGWAMA55 to identify index variants that were shadows of nearby, more significant associations. LD reference populations were taken from UK Biobank specific to admixed African, European (subset of 40,000) or South Asian ancestry individuals or from 1KGP3 for East Asian or Hispanic ancestry individuals. Conditional analysis was carried out using the individual cohort-level summary statistics as was done for the meta-analysis with Raremetal. rareGWAMA requires imputation quality scores; these were set to 1 for all variants because all variants had already passed QC (pre-filtered at imputation info/R2 > 0.3). Because the multi-ancestry meta-analysis was approximately 80% European ancestry, the European ancestry subset of UK Biobank was used as the reference population for its conditional analysis. Stepwise conditional analysis was performed sequentially for the index variants within each chromosome ranked by most to least significant. Index variants were then flagged as not independent from other more significant variants if the absolute value of the ratio of the original effect size to the effect size after conditional analysis was greater than the 95th percentile of all values (Supplementary Fig. 10). This threshold was selected to remove variants for which the effects were driven by nearby, more strongly associated variants in LD. This corresponded to a ratio of original to conditional effect size of 1.6 for the ancestry-specific conditional analysis and a ratio of 1.7 for the multi-ancestry conditional analysis. The effect sizes from the meta-analysis with METAL were used for comparison with the multi-ancestry conditional analysis results. Variants flagged as non-independent were excluded from the summary results in the manuscript and are flagged as non-independent in Supplementary Tables 3 and 5.
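The independence filter can be sketched as a percentile cut on the ratio |original effect / conditional effect|. Python's `statistics.quantiles` with the inclusive method is used for the 95th percentile; the paper does not state its percentile interpolation rule, so this choice is an assumption, as is the requirement that no conditional estimate is exactly zero. The function name is hypothetical.

```python
from statistics import quantiles


def flag_non_independent(original_betas, conditional_betas):
    """Flag index variants whose |original / conditional| effect-size ratio exceeds the
    95th percentile of all ratios. Assumes every conditional estimate is non-zero."""
    ratios = [abs(b0 / b1) for b0, b1 in zip(original_betas, conditional_betas)]
    cutoff = quantiles(ratios, n=20, method="inclusive")[-1]  # 95th percentile
    return cutoff, [r > cutoff for r in ratios]
```

A ratio well above 1 means the effect shrank substantially once a nearby, stronger signal was conditioned on, which is the "shadow" pattern the filter targets.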
Genetic correlation
Popcorn31 was used to assess the degree of correlation in effect sizes between ancestry groups for each of the lipid traits with 1KGP3 as the reference LD panel. Only variants with MAF > 0.01 in each ancestry individually were included in the comparison. Both the genetic effect and the genetic impact models were tested. Bivariate GREML from GCTA was used to calculate the genetic correlation between unrelated admixed African and a subset of white British individuals in the UK Biobank following the method of Guo et al.30,32. HapMap3 variants with MAF > 0.01 in each ancestry were used to construct the genetic relationship matrix with the allele frequencies standardized in each population. Individuals with genetic relatedness of >0.05 were removed. A total of up to 5,575 admixed African or African and 38,668 white British individuals from UK Biobank were included in the analysis of each trait after removal of related individuals. The measured lipid traits were corrected for medication use and were inverse-normalized after correction for age, sex and batch. PCs 1–20 constructed from the genetic relationship matrix were included as covariates in the calculation of genetic correlation. Analysis within the MVP included 24,502 European ancestry and 21,950 unrelated African American individuals. Maximum measured values were used for LDL-C, TC and TGs, and minimum values were used for HDL-C. Lipid traits were inverse-normalized after correction for age and sex with PCs 1–20 included as covariates in the calculation of genetic correlation.
Credible sets
Credible sets of potentially causal variants were generated for each of the loci identified in the multi-ancestry meta-analysis. We determined 99% credible sets of variants that encompassed the causal variant with 99% posterior probability. Regions for construction of the credible sets were defined as the ±500 kb region around each index variant. Bayes factors56,57 (BFs) for each variant in the ancestry-specific meta-analysis were approximated as follows:
where β and s.e. are the effect size and standard error of the effect size estimate from the Raremetal meta-analysis, and NAS is the ancestry-specific sample size. A full derivation is included in the Supplementary Methods. To account for the difference in sample sizes between ancestry groups, we also approximated the BFs after adjustment for the total multi-ancestry sample size for each trait (NTE) relative to the ancestry-specific sample size for that trait using the following equation:
Credible sets for the multi-ancestry meta-analysis were generated using the BFs as output by MR-MEGA. The credible sets within each region were generated by ranking all variants by BF and calculating the number of variants required to reach a cumulative probability of 99%. In addition, we calculated credible sets in the same manner using the European ancestry and multi-ancestry meta-analysis results, but including only the set of variants present in the admixed African or African meta-analysis. To summarize the size of the credible sets across the five lipid traits examined, we identified the set of independent index variants from the multi-ancestry meta-analysis after grouping variants based on LD. For each ±500 kb region centred around the most significantly associated index variant for any trait, we determined the pairwise LD between all index variants in this region using LDpair58 with all reference populations (1000 Genomes African, admixed American, East Asian, European and South Asian) included. We considered variants to be independent if they were outside this region, had LD R2 < 0.7 or were not available in the LDpair reference populations. Variants within the credible sets were annotated with SnpEff52 using WGSA51 and with VEP59. The number of variants in LD with an index variant was determined using LDproxy58 (Supplementary Table 20). Protein numbering was taken from dbSNP60. Expression quantitative trait loci colocalization was performed using coloc61 (v.3.2.1) with R (v.3.4.3) using the default parameters. Results from GTEx V8 (ref. 62) were compared with the GWAS signals in the region defined by the larger of ±0.25 cM or ±500 kb surrounding each index variant.
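The credible-set construction described above can be sketched as follows, assuming natural-log Bayes factors for every variant in one ±500 kb region, a single causal variant and equal prior probabilities, so that posterior probabilities are the BFs normalized to sum to 1. The function name and input format are hypothetical; BFs reported on another log scale would need converting first.

```python
import math


def credible_set(log_bfs, coverage=0.99):
    """Credible set from per-variant natural-log Bayes factors in one region, assuming a
    single causal variant and equal priors. Returns (set members, posterior probabilities)."""
    max_log = max(log_bfs.values())                      # subtract max to avoid overflow
    weights = {v: math.exp(lbf - max_log) for v, lbf in log_bfs.items()}
    total = sum(weights.values())
    posteriors = {v: w / total for v, w in weights.items()}
    chosen, cumulative = [], 0.0
    # Rank by posterior (equivalently by BF) and add variants until coverage is reached.
    for v, pp in sorted(posteriors.items(), key=lambda item: item[1], reverse=True):
        chosen.append(v)
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen, posteriors


members, pp = credible_set({"rsA": 10.0, "rsB": 5.0, "rsC": 0.0})
print(members)  # ['rsA']
```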
The expression quantitative trait loci and GWAS signals (based on P values from MR-MEGA) were considered to be colocalized if PP3 + PP4 ≥ 0.8 and if PP4/(PP3 + PP4) > 0.9, where PP3 is the probability of two independent causal variants while PP4 is the probability of a single, shared causal variant.
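The credible-set and colocalization rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the consortium's pipeline: it assumes the per-variant Bayes factors for one region are supplied on the natural-log scale, assumes a flat prior so that each variant's posterior probability is its BF divided by the regional sum, and takes the coloc posterior probabilities PP3 and PP4 as already computed. The function names `credible_set` and `is_colocalized` are hypothetical.

```python
import numpy as np


def credible_set(variant_ids, log_bayes_factors, coverage=0.99):
    """Return the top-ranked variants whose cumulative posterior probability
    first reaches `coverage` (flat prior assumed: PP_i = BF_i / sum(BF))."""
    log_bf = np.asarray(log_bayes_factors, dtype=float)
    # Normalize on the log scale to avoid overflow for very strong signals.
    weights = np.exp(log_bf - log_bf.max())
    posterior = weights / weights.sum()
    order = np.argsort(posterior)[::-1]  # rank variants by BF, highest first
    cumulative = np.cumsum(posterior[order])
    # First position at which the cumulative probability reaches the target.
    n_needed = min(int(np.searchsorted(cumulative, coverage)) + 1, len(order))
    return [variant_ids[i] for i in order[:n_needed]]


def is_colocalized(pp3, pp4, min_total=0.8, min_ratio=0.9):
    """Colocalization rule: PP3 + PP4 >= 0.8 and PP4 / (PP3 + PP4) > 0.9."""
    total = pp3 + pp4
    # The first condition short-circuits, so the division never sees a zero total.
    return bool(total >= min_total and pp4 / total > min_ratio)


if __name__ == "__main__":
    ids = ["rs1", "rs2", "rs3"]
    print(credible_set(ids, np.log([95.0, 4.5, 0.5])))  # ['rs1', 'rs2']
    print(is_colocalized(pp3=0.05, pp4=0.90))  # True
    print(is_colocalized(pp3=0.30, pp4=0.60))  # False
```

Working on the log scale matters in practice because Bayes factors for strongly associated lipid loci can exceed the range of double-precision floats.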
LDL-C PRS
Weights for the LDL-C PRS were derived from β estimates generated from each of the ancestry-specific meta-analyses and from the multi-ancestry results using METAL. For comparison, additional meta-analyses were carried out combining the 2010 Global Lipids Genetics Consortium LDL-C meta-analysis results4 with either (1) the admixed African or (2) the admixed African, East Asian, Hispanic and South Asian ancestry results from the current meta-analysis. Furthermore, to understand the role of increasing the European ancestry sample size and the influence of the imputation panel, we performed meta-analyses of European ancestry cohorts randomly selected to reach total sample sizes of approximately 100,000, 200,000 or 400,000. In addition, we tested possible methods for improving the performance of European-ancestry-derived scores in African ancestry individuals by separately fitting the European ancestry PRS in the UK Biobank admixed African ancestry subset to determine the best set of risk score parameters (various pruning and thresholding parameters or PRS-CS; Supplementary Note).
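METAL can combine per-study estimates by inverse-variance weighting (its standard-error-based scheme). Assuming that scheme was used here, which the text does not state, the fixed-effects estimate for a single variant can be sketched as follows. `ivw_meta` is a hypothetical helper, not part of METAL.

```python
import math

import numpy as np


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of one variant.

    Returns the pooled effect, its standard error and a two-sided P value.
    """
    b = np.asarray(betas, dtype=float)
    s = np.asarray(ses, dtype=float)
    w = 1.0 / s**2  # each study is weighted by the inverse of its sampling variance
    beta = float(np.sum(w * b) / np.sum(w))
    se = float(math.sqrt(1.0 / np.sum(w)))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, p


# Example: two ancestry-specific estimates for the same variant.
beta, se, p = ivw_meta([0.30, 0.00], [0.10, 0.20])
print(f"beta={beta:.3f}, se={se:.4f}, P={p:.2e}")
```

The more precise estimate (smaller standard error) dominates the pooled β, which is why the large European ancestry sample tends to drive multi-ancestry weights unless other ancestries contribute substantial sample sizes.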
We generated PRS weights using both significant variants only (at a variety of P value thresholds) and genome-wide methods. Meta-analysis results were first filtered to variants present in UK Biobank, the MGI and the MVP with an imputation info score > 0.3. Pruning and thresholding were performed in PLINK63 using ancestry-matched subsets of UK Biobank individuals (admixed African N = 7,324; European N = 40,000; South Asian N = 7,193; multi-ancestry N = 10,000 (80% European, 15% admixed African, 5% South Asian)) or 1KGP3 (Hispanic N = 347; East Asian N = 504) as the LD reference. We also tested 1KGP3 with all populations included as the LD reference panel for the multi-ancestry score (results not shown); this gave results similar to those obtained with the UK Biobank multi-ancestry reference set, which was originally selected for its larger sample size. P value thresholds (after GC correction) of 5 × 10−10, 5 × 10−9, 5 × 10−8, 5 × 10−7, 5 × 10−6, 5 × 10−5, 5 × 10−4, 5 × 10−3 and 5 × 10−2 were tested with distance thresholds of 250 and 500 kb and LD R2 thresholds of 0.1 and 0.2. PRS weights were also generated using PRS-CS33 with the LD reference panels for African, East Asian and European ancestry populations from 1000 Genomes provided by the developers. PRS-CS LD reference panels for the other ancestries were generated from 1000 Genomes following the protocol provided by the PRS-CS authors33: variants with MAF ≤ 0.01 and strand-ambiguous (A/T or G/C) variants were removed, and the remaining variants were restricted to those included in HapMap3. Pairwise LD matrices within predefined LD blocks64 (using European LDetect blocks for the Hispanic and multi-ancestry LD calculations and Asian blocks for South Asian) were then calculated using PLINK and converted to HDF5 format.
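The pruning-and-thresholding grid and the PRS-CS reference filters described above can be sketched as below. This is a simplified, in-memory stand-in for PLINK's greedy clumping, not PLINK itself: it assumes a precomputed pairwise LD matrix (r2) for variants on one chromosome with base-pair positions, and the helper names are hypothetical.

```python
import itertools

import numpy as np

# Parameter grid from the text: 9 P value thresholds x 2 windows x 2 LD thresholds.
P_THRESHOLDS = [5 * 10.0**-k for k in range(10, 1, -1)]  # 5e-10 ... 5e-2
WINDOWS_KB = [250, 500]
R2_THRESHOLDS = [0.1, 0.2]
GRID = list(itertools.product(P_THRESHOLDS, WINDOWS_KB, R2_THRESHOLDS))


def clump(pos_bp, pvals, r2, p_threshold, window_kb, r2_threshold):
    """Greedy clumping: keep the most significant remaining variant as an index
    variant and drop nearby variants in LD with it. Returns index positions."""
    pos = np.asarray(pos_bp)
    p = np.asarray(pvals, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    removed = np.zeros(len(p), dtype=bool)
    kept = []
    for i in np.argsort(p):  # most significant first
        if p[i] > p_threshold:
            break  # all remaining variants are less significant
        if removed[i]:
            continue
        kept.append(int(i))
        in_window = np.abs(pos - pos[i]) <= window_kb * 1000
        linked = in_window & (r2[i] > r2_threshold)
        linked[i] = False
        removed |= linked
    return sorted(kept)


def keep_for_ld_reference(a1, a2, maf, in_hapmap3):
    """PRS-CS-style reference filter: drop variants with MAF <= 0.01,
    strand-ambiguous (A/T or G/C) variants and variants outside HapMap3."""
    ambiguous = {a1, a2} in ({"A", "T"}, {"C", "G"})
    return maf > 0.01 and not ambiguous and in_hapmap3
```

Each of the 36 grid settings yields one candidate score; the best-performing setting is then chosen in an ancestry-matched testing set rather than in the data used to estimate the weights.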
For each individual in the testing cohorts, the PRS was calculated as the sum, across variants, of the dosage multiplied by the corresponding weight. UK Biobank individuals not included in the datasets used to generate the summary statistics (admixed African, white British, both admixed African and white British, East Asian, South Asian, or all individuals excluding South Asian) were used to select the best-performing admixed African, European, admixed African+European, East Asian, South Asian and multi-ancestry PRS, respectively. UK Biobank South Asian ancestry individuals were included in the multi-ancestry risk score weights but excluded from the UK Biobank multi-ancestry testing set owing to an initial focus on comparing predictions among European and African ancestry individuals. The UK Biobank ancestry groups used to test PRS performance comprised: admixed African N = 6,863; East Asian N = 1,441; European N = 389,158; South Asian N = 6,814; all ancestries N = 461,918. The best-performing Hispanic ancestry PRS weights were selected based on their performance in Hispanic ancestry individuals in the MGI dataset. Model fit was assessed using the adjusted R2 of a linear model for LDL-C at initial assessment (values for participants taking cholesterol-lowering medication were divided by 0.7 to estimate pre-medication levels), with sex, batch, age at initial assessment and PCs 1–4 as covariates (Supplementary Tables 21–23). Python and R were used for the analysis of PRS models.
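The scoring and model-fit steps can be sketched with NumPy. The sketch assumes a dense dosage matrix (individuals by variants, effect-allele dosages from 0 to 2), a boolean medication indicator and ordinary least squares for the linear model; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """PRS for each individual: sum over variants of dosage x weight."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def pre_medication_ldl(ldl, on_medication, factor=0.7):
    """Divide LDL-C by 0.7 for medicated participants to approximate untreated levels."""
    ldl = np.asarray(ldl, dtype=float)
    return np.where(np.asarray(on_medication, dtype=bool), ldl / factor, ldl)


def adjusted_r2(y, predictors):
    """Adjusted R2 of an OLS fit of y on the predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    n, k = X.shape  # k counts the intercept, so n - k = n - p - 1
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)
```

In use, one would pass `np.column_stack([prs, covariates])` as the predictors; fitting the covariates alone as well gives a baseline against which the contribution of the score can be judged.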
The best-performing PRS in each ancestry group was then tested in the following validation cohorts: the MGI (European N = 17,190; African American N = 1,341); East London Genes and Health65 (ELGH; South Asian N = 15,242); Tohoku Medical Megabank Community Cohort Study (ToMMo; East Asian N = 28,217); Korean Genome and Epidemiology Study66 (KoGES; East Asian N = 118,260); Penn Medicine BioBank (PMBB; African American N = 2,138); Africa America Diabetes Mellitus (AADM; West African N = 3,566; East African N = 707); Africa Wits-INDEPTH partnership for Genomic Studies (AWI-Gen; East African N = 1,744; South African N = 4,972; West African N = 3,744); and MVP participants not included in the discovery meta-analysis (European N = 68,381; African American N = 18,251; East Asian/South Asian N = 4,155; Hispanic N = 7,669). Adjusted R2 values were reported for each cohort and ancestry group, with 95% confidence intervals calculated using bootstrapping. Within each cohort, the following covariates were used: MGI: sex, batch, PCs 1–4 and birth year; PMBB: birth year, sex and PCs 1–4; ELGH: age, sex and PCs 1–10; MVP: sex, PCs 1–4, birth year and mean age; ToMMo: sex, age, recruitment method and PCs 1–20 (only participants from Miyagi Prefecture were included); KoGES: age, sex and recruitment area; AADM: age, sex and PCs 1–3; AWI-Gen: age, sex and PCs 1–6 for East African and South African participants, and age, sex and PCs 1–4 for West African participants. The type of LDL-C value used in the model depended on the measurements selected by each cohort: mean LDL-C values were used for MGI, MVP and PMBB, maximum LDL-C values for ELGH, and baseline measurements for AADM, AWI-Gen, ToMMo and KoGES. A descriptive summary of each validation cohort is included in Supplementary Table 16. African admixture for MGI was calculated with ADMIXTURE (v.1.3)67 using all African ancestry individuals in 1000 Genomes. African admixture for MVP was calculated using the Yoruba in Ibadan, Nigeria (YRI) and Luhya in Webuye, Kenya (LWK) African ancestry individuals in 1KGP3.
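The text states only that the 95% confidence intervals were obtained by bootstrapping. A percentile bootstrap that resamples individuals with replacement is one common choice and is assumed in the sketch below, as are the number of replicates and the fixed seed; `adjusted_r2` and `bootstrap_adjusted_r2` are hypothetical helpers.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R2 of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    n, k = design.shape  # k counts the intercept
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def bootstrap_adjusted_r2(y, X, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the adjusted R2."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single predictor as one column
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        stats[b] = adjusted_r2(y[idx], X[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return adjusted_r2(y, X), (float(lower), float(upper))


if __name__ == "__main__":
    # Simulated data only: a score explaining part of the LDL-C variance.
    sim = np.random.default_rng(1)
    prs = sim.normal(size=500)
    ldl = 0.5 * prs + sim.normal(size=500)
    estimate, (lo, hi) = bootstrap_adjusted_r2(ldl, prs, n_boot=200)
    print(f"adjusted R2 = {estimate:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Resampling whole individuals keeps each person's outcome, score and covariates together, so the interval reflects sampling variation in the cohort rather than in any single column.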
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Data availability
The GWAS meta-analysis results (including both ancestry-specific and multi-ancestry analyses) and risk score weights are available at http://csg.sph.umich.edu/willer/public/glgc-lipids2021. The optimized multi-ancestry and single-ancestry PRS weights are deposited in the PGS Catalogue (https://www.pgscatalog.org/) under accession numbers PGS000886–PGS000897 (inclusive).
Code availability
EasyQC is available at www.genepi-regensburg.de/easyqc, and Raremetal is available at https://github.com/SailajaVeda/raremetal.
Change history
26 May 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41586-023-06194-2
References
Taddei, C. et al. Repositioning of the global epicentre of non-optimal cholesterol. Nature 582, 73–77 (2020).
Ference, B. A. et al. Low-density lipoproteins cause atherosclerotic cardiovascular disease. 1. Evidence from genetic, epidemiologic, and clinical studies. A consensus statement from the European Atherosclerosis Society Consensus Panel. Eur. Heart J. 38, 2459–2472 (2017).
Roth, G. A. et al. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 392, 1736–1788 (2018).
Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
Liu, D. J. et al. Exome-wide association study of plasma lipids in >300,000 individuals. Nat. Genet. 49, 1758–1766 (2017).
Lu, X. et al. Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants that contribute to lipid levels and coronary artery disease. Nat. Genet. 49, 1722–1730 (2017).
Kathiresan, S. et al. A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study. BMC Med. Genet. 8, S17 (2007).
Kathiresan, S. et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N. Engl. J. Med. 358, 1240–1249 (2008).
Peloso, G. M. et al. Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. Am. J. Hum. Genet. 94, 223–232 (2014).
Hoffmann, T. J. et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat. Genet. 50, 401–413 (2018).
Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).
Klarin, D. et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat. Genet. 50, 1514–1523 (2018).
Holmen, O. L. et al. Systematic evaluation of coding variation identifies a candidate causal variant in TM6SF2 influencing total cholesterol and myocardial infarction risk. Nat. Genet. 46, 345–351 (2014).
Asselbergs, F. W. et al. Large-scale gene-centric meta-analysis across 32 studies identifies multiple lipid loci. Am. J. Hum. Genet. 91, 823–838 (2012).
Albrechtsen, A. et al. Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes. Diabetologia 56, 298–310 (2013).
Saxena, R. et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316, 1331–1336 (2007).
Iotchkova, V. et al. Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps. Nat. Genet. 48, 1303–1312 (2016).
Tachmazidou, I. et al. A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates. Nat. Commun. 4, 2872 (2013).
Tang, C. S. et al. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese. Nat. Commun. 6, 10206 (2015).
van Leeuwen, E. M. et al. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels. Nat. Commun. 6, 6065 (2015).
Spracklen, C. N. et al. Association analyses of East Asian individuals and trans-ancestry analyses with European individuals reveal new loci associated with cholesterol and triglyceride levels. Hum. Mol. Genet. 26, 1770–1784 (2017).
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Buniello, A. et al. The NHGRI–EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Tishkoff, S. A. et al. The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009).
Mägi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).
Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012).
Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Guo, J. et al. Quantifying genetic heterogeneity between continental populations for human height and body mass index. Sci. Rep. 11, 5240 (2021).
Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Majara, L. et al. Low generalizability of polygenic scores in African populations due to genetic and environmental diversity. Preprint at bioRxiv https://doi.org/10.1101/2021.01.12.426453 (2021).
Lehmann, B. C. L., Mackintosh, M., McVean, G. & Holmes, C. C. High trait variability in optimal polygenic prediction strategy within multiple-ancestry cohorts. Preprint at bioRxiv https://doi.org/10.1101/2021.01.15.426781 (2021).
Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1098 (2021).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Cavazos, T. B. & Witte, J. S. Inclusion of variants discovered from diverse populations improves polygenic risk score transferability. HGG Adv. 2, 100017 (2021).
Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
Bentley, A. R. et al. Multi-ancestry genome-wide gene–smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids. Nat. Genet. 51, 636–648 (2019).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Kowalski, M. H. et al. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet. 15, e1008500 (2019).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Baigent, C. et al. Efficacy and safety of cholesterol-lowering treatment: prospective meta-analysis of data from 90 056 participants in 14 randomised trials of statins. Lancet 366, 1267–1278 (2005).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Zhou, W. et al. Efficiently controlling for case–control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014).
Feng, S., Liu, D., Zhan, X., Wing, M. K. & Abecasis, G. R. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 30, 2828–2829 (2014).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Loh, P.-R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).
Liu, X. et al. WGSA: an annotation pipeline for human genome sequencing studies. J. Med. Genet. 53, 111–112 (2016).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Liu, D. J. et al. Meta-analysis of gene-level tests for rare variant association. Nat. Genet. 46, 200–204 (2014).
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
Finer, S. et al. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. Int. J. Epidemiol. 49, 20–21i (2019).
Moon, S. et al. The Korea Biobank Array: design and identification of coding variants associated with blood biochemical traits. Sci. Rep. 9, 1382 (2019).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Acknowledgements
Funding for the Global Lipids Genetics Consortium was provided by the NIH (R01-HL127564). This research was conducted using the UK Biobank Resource under application number 24460. Computing support and file management for central meta-analysis by S. Caron is acknowledged. This research is based on data from the MVP, Office of Research and Development, Veterans Health Administration, and was supported by awards 2I01BX003362-03A1 and 1I01BX004821-01A1. This publication does not represent the views of the Department of Veterans Affairs or the United States Government. Study-specific acknowledgements are provided in the Supplementary Information.
Author information
Contributions
S.L.C., K.-H.H.W., S. Kanoni, G.J.M.Z. and S. Ramdas contributed equally to this work as co-second authors. All authors reviewed the manuscript. Consortium management: G.M.P., P.N., T.L.A., M.B., S. Kathiresan and C.J.W. Study design, interpretation of results and drafting of the manuscript: S.E.G., S.L.C., K.-H.H.W., S. Kanoni, G.J.M.Z., S. Ramdas, I.S., I.N., E.M., K.L.M., T.M.F., J.N.H., S. Kathiresan, M. Boehnke, P.N., G.M.P., C.D.B., A.P.M., Y.V.S., P.D., T.L.A. and C.J.W. Primary meta-analysis and QC: S.E.G., S. Vedantam, T.W.W. and A.E.L. PRS analysis and development: S.E.G., S.L.C., K.-H.H.W., S. Kanoni, M.Y.H., S.H., A.N., A. Choudhury, A.R.B., K.E., A.V., B.T., H.C.M., K.A.H., C.N.R., S.H., M.R., R.C.T., D.A.v.H., G.T., M.Y. and B.-J.K. Individual study genetic analysis: S.E.G., S. Kanoni, S. Vedantam, A.E.L., K.L.M., G.M.P., P.D., C.J.W., Q.H., D.K., X. Zhu, G.T., A. Helgadottir, D.F.G., H. Holm, I.O., M. Akiyama, S.S., C. Terao, M. Kanai, W. Zhou, B.M.B., H.R., S.E.R., A.S.H., Y.V., Q.F., E.A.R., T. Lingren, J.A.P., S.A.P., J. Haessler, F.G., Y.B., J.E.M., A. Campbell, K. Lin, I.Y.M., G. Hindy, A.R., J.D.F., W. Zhao, D.R.W., C. Turman, H. Huang, M. Graff, A. Mahajan, M.R.B., W. Zhang, K. Yu, E.M.S., A. Pandit, S.G., X.Y., J. Luan, J.-H. Zhao, F.M., H.-M.J., K. Yoon, C.M.-G., A. Pitsillides, J.J.H., G.W., A.R. Wood, Y.J., Z.G., S. Haworth, R.E.M., J.F.C., M. Aadahl, J. Yao, A. Manichaikul, H.R.W., J.R., J.B.-J., L.L.K., A.G., M.S.-L., R.N., C. Sidore, E.F., A.F.M., P.M.-V., M. Wielscher, S.T., N.S., L.T.M., B.H.T., M. Munz, L.Z., J. Huang, B.Y., A. Poveda, A.K., C. Lamina, L.F., M. Scholz, T.E.G., J.P.B., E.W.D., J.M.Z., J.S.M., C.F., H. Christensen, J.A.B., M.F.F., M.K.W., M. Preuss, M. Mangino, P.C., N.V., J.W. Benjamins, J. Engmann, R.L.K., R.C.S., K.S.L., N.R.Z., P.L., M.E.K., G.E.D., S. Huo, D.D.I., H.I., J. Yang, Jun Liu, H.L.L., J.M., B.S., M. Arendt, L.J.S., M.C.-G., C.W., M. Nakatochi, A.W., N.H.-K., X. Sim, R.X., A.H.-C., J.C.F.-L., V.L., M. Ahmed, A.U.J., N.A.Y., M.R.I., C. Oldmeadow, H.-N.K., S. Ryu, P.R.H.J.T., L.A., R.D., L.A.L., X.C., G. Prasad, L.L.-M., M. Pauper, J. Long, X. Li, E. Theusch, F.T., C.N.S., A. Loukola, S. Bollepalli, S.C.W., Y.X.W., W.B.W., T. Nutile, D. Ruggiero, Y.J.S., Y.-J. Hung, S.C., F. Liu, Jingyun Yang, K.A.K., M. Gorski, M. Brumat, K.M., L.F.B., J.A.S., P.H., A.-E.F., E.H., M. Lin, C.X., J. Zhang, M.P.C., S. Vaccargiu, P.J.v.d.M., N. Pitkänen, B.E.C., J. Lee, S.W.v.d.L., K.N.C., S.W., M.E.Z., J.Y.L., H.S.C., M. Nethander, S.F.-W., L.S., N.W.R., C.A.W., S.-Y.L., J.-S.W., C. Couture, L.-P.L., K.N., G.C.-P., H. Vestergaard, B.H., O.G., Q.C., M.O.O., J.v.S., Xiaoyin Li, K. Schwander, N.T., J.H.S., R.D.J., A.P.R., L.W.M., Z.C., L. Li, H.M.H., K.L.Y., T. Kawaguchi, J. Thiery, J.C.B., G.N.N., L.J.L., H. Li, M.A.N., O.T.R., S.I., S.H.W., C.P.N., H. Campbell, S.J., T. Nabika, F.A.-M., H.N., P.S.B., I.K., P. Kovacs, T.G., T. Katsuya, K.F.B., D.d.K., G.J.d.B., E.K.K., H.H.H.A., M.A.I., Xiaofeng Zhu, F.W.A., A.O.K., J.W.J.B., X.-O.S., L.S.R., O. Pedersen, T.H., P. Mitchell, A.W.H., M. Kähönen, L.P., C. Bouchard, A.T., Y.-D.I.C., C.E.P., T.A.M., W.L., A. Franke, C. Ohlsson, D.M., Y.S.C., H. Lee, J.-M.Y., W.-P.K., S.Y.R., J.-T.W., I.M.H., K.J.S., H. Völzke, G. Homuth, M.K.E., A.B.Z., O. Polasek, G. Pasterkamp, I.E.H., S. Redline, K.P., A.J.O., H. Snieder, G.B., R.S., H. Schmidt, Y.E.C., S. Bandinelli, G. Dedoussis, T.A.T., S.L.R.K., N.K., M.B.S., G.G., B.J., C.A.B., P.K.J., D.A.B., P.L.D.J., X. Lu, V.M., M. Brown, M.J.C., P.B.M., X.G., M. Ciullo, J.B.J., N.J.S., J. Kaprio, P.P., L.S.A., S.A.B., H.J.d.S., A.R.W., R.M.K., J.-Y.W., W. Zheng, A.I.d.H., D.B., A. Correa, J.G.W., L. Lind, C.-K.H., A.E.N., Y.M.G., J.F.W., B.P., H.-L.K., J.A., R.J.S., D.C.R., D.K.A., S.C.H., M. Walker, H.A.K., G.R.C., C.S.Y., J.M.M., T.T.-L., C.A.-S., C.G.V., L.O., M.F., E.S.T., R.M.v.D., T. Lehtimäki, N.C., M.Y., Jianjun Liu, D.F.R., A.J.M., F. Kee, K.-H.J., M.I.M., C.N.A.P., V.V., C. Hayward, E.S., C.M.v.D., F. Lu, J.Q., H. Hishigaki, X. Lin, W.M., E.J.P., M. Cruz, V.G., J.-C.T., G.L., L.M.t.H., P.J.M.E., S.M.D., M. Kumari, M. Kivimaki, P.v.d.H., T.D.S., R.J.F.L., M.A.P., B.M.P., I.B., P.P.P., K. Christensen, S. Ripatti, E.W., H. Hakonarson, S.F.A.G., L.A.L.M.K., J.d.G., M. Loeffler, F. Kronenberg, D.G., J. Erdmann, H. Schunkert, P.W.F., A. Linneberg, J.W.J., A.V.K., M. Männikkö, M.-R.J., Z.K., F.C., D.O.M.-K., K.W.v.D., H.W., D.P.S., N.G., P.S., N. Poulter, J.I.R., T.M.D., F. Karpe, M.J.N., N.J.T., C.-Y.C., T.-Y.W., C.C.K., C. Sabanayagam, A. Peters, C.G., A.T.H., N.L.P., P.K.E.M., D.I.B., E.J.C.d.G., L.A.C., J.B.J.v.M., M. Ghanbari, P.G.-L., W.H., Y.J.K., Y.T., N.J.W., C. Langenberg, E.Z., J. Kuusisto, M. Laakso, E.I., G.A., J.C.C., J.S.K., P.S.d.V., A.C.M., K.E.N., M.D., P. Kraft, N.G.M., J.B.W., S.A., D.S., R.G.W., M.V.H., C. Black, B.H.S., A.E.J., A.B., J.E.B., P.M.R., D.I.C., C. Kooperberg, W.-Q.W., G.P.J., B.N., M.G.H., M.D.R., P.J., V.S., K.H., B.O.A., M. Kubo, Y. Kamatani, Y.O., Y.M., U.T., K. Stefansson, Y.-L.H., J.A.L., D. Rader, P.S.T., K.-M.C., K. Cho, C.J.O., J.M.G. and P.W.
Ethics declarations
Competing interests
G.J.M.Z. is an employee of Incyte Corporation. G.C.-P. is currently an employee of 23andMe. M.J.C. is the Chief Scientist for Genomics England, a UK Government company. B.M.P. serves on the steering committee of the Yale Open Data Access Project funded by Johnson & Johnson. G.T., A. Helgadottir, D.F.G., H. Holm, U.T. and K. Stefansson are employees of deCODE/Amgen. V.S. has received honoraria for consultations from Novo Nordisk and Sanofi and has an ongoing research collaboration with Bayer. M. McCarthy has served on advisory panels for Pfizer, Novo Nordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier and Takeda. M. McCarthy and A. Mahajan are employees of Genentech and are holders of Roche stock. M. Scholz receives funding from Pfizer for a project unrelated to this work. M.E.K. is employed by Synlab. W.M. has received grants from Siemens Healthineers, grants and personal fees from Aegerion Pharmaceuticals, grants and personal fees from Amgen, grants from AstraZeneca, grants and personal fees from Sanofi, grants and personal fees from Alexion Pharmaceuticals, grants and personal fees from BASF, grants and personal fees from Abbott Diagnostics, grants and personal fees from Numares, grants and personal fees from Berlin-Chemie, grants and personal fees from Akcea Therapeutics, grants from Bayer Vital, grants from bestbion dx, grants from Boehringer Ingelheim, grants from Immundiagnostik, grants from Merck Chemicals, grants from MSD Sharp and Dohme, grants from Novartis Pharma, grants from Olink Proteomics, and is employed by Synlab Holding Deutschland, all outside the submitted work. A.V.K. has served as a consultant to Sanofi, Medicines Company, Maze Pharmaceuticals, Navitor Pharmaceuticals, Verve Therapeutics, Amgen and Color Genomics; received speaking fees from Illumina and the Novartis Institute for Biomedical Research; received sponsored research agreements from the Novartis Institute for Biomedical Research and IBM Research; and reports a patent related to a genetic risk predictor (20190017119). S.K. is an employee of Verve Therapeutics and holds equity in Verve Therapeutics, Maze Therapeutics, Catabasis and San Therapeutics. He is a member of the scientific advisory boards for Regeneron Genetics Center and Corvidia Therapeutics; he has served as a consultant for Acceleron, Eli Lilly, Novartis, Merck, Novo Nordisk, Novo Ventures, Ionis, Alnylam, Aegerion, Haug Partners, Noble Insights, Leerink Partners, Bayer Healthcare, Illumina, Color Genomics, MedGenome, Quest and Medscape; and reports patents related to a method of identifying and treating a person having a predisposition to or afflicted with cardiometabolic disease (20180010185) and a genetics risk predictor (20190017119). D.K. accepts consulting fees from Regeneron Pharmaceuticals. D.O.M.-K. is a part-time clinical research consultant for Metabolon. D.S. has received support from the British Heart Foundation, Pfizer, Regeneron, Genentech and Eli Lilly pharmaceuticals. The spouse of C.J.W. is employed by Regeneron.
Additional information
Peer review information Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Effect sizes of identified index variants from multi-ancestry meta-analysis.
Index variants associated with a) HDL cholesterol, b) LDL cholesterol, c) triglycerides, d) nonHDL cholesterol and e) total cholesterol include both common variants of small to moderate effect and low frequency variants of moderate to large effect.
Extended Data Fig. 2 Comparison of the number of index variants by sample size.
a) Comparison of the number of index variants reaching genome-wide significance (P < 5 × 10−8) from meta-analysis of LDL-C in each ancestry group. A meta-analysis of five random subsets of European cohorts selected to reach sample sizes of approximately 100,000, 200,000, 400,000, 600,000 or 800,000 individuals is also shown. b) Comparison of chi-squared values from meta-analysis of LDL-C for each possible combination of ancestry groups (without genomic-control correction) for variants with minor allele frequency (MAF) ≥ 5%. The colored lines indicate a linear regression model of all meta-analyses for a specific ancestry (for example, all analyses including European individuals). c) Comparison of chi-squared values from meta-analysis of LDL-C for variants with MAF ≤ 5%. d) Comparison of chi-squared values for variants with MAF ≥ 5% for LDL-C without genomic-control correction in a meta-analysis of all European cohorts as well as five subsets selected to reach sample sizes of approximately 100,000, 200,000, 400,000, 600,000 or 800,000 individuals.
Extended Data Fig. 3 Effect sizes by ancestry for unique index variants from ancestry-specific meta-analysis.
Comparison of effect sizes and standard errors for the 389 unique variants reaching genome-wide significance (P < 5 × 10−8, as given by RAREMETAL) in two ancestry groups. Variants with discordant directions of effect between ancestries are labeled by chromosome and position (build 37). Association results for all index variants are given in Supplementary Table 3. The red line depicts equal European ancestry and non-European ancestry effect sizes, while the black line depicts a linear regression model (R2 = 0.93).
Extended Data Fig. 4 Comparison of credible set size.
The number of variants in the 99% credible sets for each association signal is compared between a) admixed African ancestry and multi-ancestry analysis and b) European ancestry and multi-ancestry analysis.
Extended Data Fig. 5 Overview of LDL-C polygenic score generation and validation.
Polygenic scores were calculated separately in each ancestry group (or in all ancestries) using either pruning and thresholding or PRS-CS. The polygenic scores were then taken forward for testing in ancestry-matched participants followed by validation in independent data sets.
Extended Data Fig. 6 Optimal polygenic score threshold by ancestry group for either PRS-CS or pruning and thresholding based LDL-C polygenic scores.
Adjusted R2 estimated upon testing in UK Biobank ancestry-matched participants (who were not included in GWAS summary statistics). a) Admixed African, East Asian and South Asian ancestry polygenic scores. b) European and multi-ancestry polygenic scores. c) European ancestry (GLGC 2010) and multi-ancestry polygenic scores. d) All polygenic scores across all thresholds used for score construction. e) Comparison of adjusted R2 across ancestry groups relative to the maximum for covariates alone, polygenic scores from PRS-CS or polygenic scores from pruning and thresholding.
Extended Data Fig. 7 Improvement in PRS performance in African Americans when starting with ancestry-mismatched European ancestry scores by updating weights, updating variant lists, or updating both variants and weights to be ancestry-matched.
Compared with the gold-standard performance of the multi-ancestry-derived PRS in African Americans (adjusted R2 = 0.12), a European-ancestry-derived score captures only 47% of the variance explained by the multi-ancestry PRS. When LD and association information from the target population is used to optimize the list of variants included in the PRS, but with ancestry-mismatched weights from European ancestry GWAS, the variance explained reaches 71% of the gold standard. If the PRS variant list selected in European ancestry individuals is genotyped in the target population and the PRS weights are updated using a GWAS from the target population, the variance explained reaches 87% of the gold standard. Finally, deriving both the marker list and weights from the target population (single-ancestry GWAS of admixed African individuals) explains 94% of the variance relative to the gold-standard multi-ancestry PRS.
Extended Data Fig. 8 Comparison of PRS performance by admixture quartile.
We divided the testing cohorts into quartiles by proportion of African ancestry and estimated the performance of the PRS separately within each quartile in a) the Michigan Genomics Initiative (N = 1,341), and b) the Million Veteran Program (N = 18,251). Error bars represent 95% confidence intervals.
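The legend does not specify the performance metric within quartiles or how the 95% confidence intervals were obtained. As a hedged sketch only, the code below splits individuals into quartiles of African ancestry proportion, uses the squared Pearson correlation between trait and PRS as a simple stand-in for performance, and uses a percentile bootstrap for the intervals; both choices are assumptions, and the function name `r2_by_ancestry_quartile` is hypothetical.

```python
import numpy as np

def r2_by_ancestry_quartile(y, prs, african_fraction, n_boot=1000, seed=0):
    """Per-quartile squared correlation of trait and PRS, with percentile-bootstrap
    95% intervals (the metric and the CI method are assumptions, not the paper's)."""
    rng = np.random.default_rng(seed)
    edges = np.quantile(african_fraction, [0.25, 0.5, 0.75])
    quartile = np.digitize(african_fraction, edges)   # labels 0..3, lowest to highest
    results = []
    for q in range(4):
        idx = np.flatnonzero(quartile == q)
        yq, sq = y[idx], prs[idx]
        r2 = np.corrcoef(yq, sq)[0, 1] ** 2
        boot = np.empty(n_boot)
        for b in range(n_boot):
            s = rng.integers(0, len(idx), len(idx))   # resample individuals with replacement
            boot[b] = np.corrcoef(yq[s], sq[s])[0, 1] ** 2
        lo, hi = np.percentile(boot, [2.5, 97.5])
        results.append((q + 1, len(idx), r2, lo, hi))
    return results

# Synthetic example in which PRS performance does not depend on ancestry proportion
rng = np.random.default_rng(1)
n = 2000
frac = rng.uniform(size=n)
prs = rng.normal(size=n)
y = prs + rng.normal(size=n)
res = r2_by_ancestry_quartile(y, prs, frac, n_boot=200)
```

With real data, a systematic decline in performance across quartiles would indicate that score portability depends on the proportion of African ancestry.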
Supplementary information
Supplementary Information
This file contains acknowledgements for each cohort, VA Million Veteran Program and Global Lipids Genetics Consortium authors, Supplementary Tables 2, 4, 8, 13 and 21–23, Supplementary Figs. 1–10, the Supplementary Notes and Supplementary Methods.
Supplementary Tables
This file contains Supplementary Tables 1, 3, 5–7, 9–12 and 14–20.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Graham, S.E., Clarke, S.L., Wu, KH.H. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021). https://doi.org/10.1038/s41586-021-04064-3
DOI: https://doi.org/10.1038/s41586-021-04064-3
© Springer Nature Limited