Introduction

Type 2 diabetes (T2D) is a common, progressive metabolic disorder characterized by an inability to regulate blood glucose. Individuals with T2D primarily exhibit inefficient use of insulin to regulate glucose in peripheral tissues such as muscle, liver, and fat, which places additional stress on the pancreas to produce more insulin. East Asian ancestry individuals develop T2D at a lower body mass index (BMI) than European ancestry individuals, likely attributable to higher fat deposition in the abdominal area [1, 2]. It is unclear whether the higher proportion and location of body fat storage in East Asian individuals, or effect modification by environmental and/or genetic factors, account for these differences [3•].

Large-scale genome-wide association studies (GWAS) have identified over 500 genomic regions associated with T2D, explaining up to 20% of the disease heritability [4,5,6,7]. Historically, GWAS and meta-analyses of T2D, like those of other complex traits, were predominantly undertaken in European ancestry populations, as a substantial number of samples were required to discover and replicate findings reliably. As more genomic data have become available in non-European ancestry populations, particularly in populations of East Asian ancestry [6••, 8•, 9], the emerging pattern of shared variant-T2D associations across multiple ancestries supports a model in which the underlying causal genetic variants are shared and arose before the out-of-Africa migration. However, additional loci have also been identified in non-European ancestry populations, primarily as a result of increased analytic power from higher allele frequencies, but also from heterogeneous effect sizes across populations and possible gene-by-environment and gene-by-gene interactions. Additionally, differences in linkage disequilibrium (LD) across populations can be leveraged at loci shared across populations to narrow association signals and identify candidate causal variants underlying the associations. Full characterization of the genetics of T2D in different populations can identify the allelic spectrum of genetic variation in all ancestries and elevate established T2D-associated loci toward effective therapeutic targets.

In this review, we summarize the progress in identifying T2D genetic associations in populations of East Asian ancestry and how these studies have advanced the current repertoire of T2D genetics beyond what larger studies in European ancestry populations alone could provide. In addition, we discuss the need for additional biologic and bioinformatic resources in non-European ancestry populations to enhance our functional understanding and improve our interpretation of the results.

Discovery of Genetic Loci in East Asian Ancestry Individuals from 2008 to 2020

T2D risk is strongly influenced by multiple genetic loci. Most of the T2D-associated loci were initially discovered in populations of European ancestry [10] as early as 2007 with three population-based T2D GWAS published back-to-back in Science [11,12,13]. Subsequent efforts to discover additional T2D loci in European ancestry populations involved large-scale meta-analyses of individual studies [14,15,16], the most recent of which was published in 2018 and identified 403 distinct signals at 243 loci in nearly 900,000 individuals [5]. As T2D genetic association data emerged in non-European ancestry populations, trans-ancestry meta-analyses also revealed more shared susceptibility between global populations [4, 7, 17, 18].

The earliest T2D GWAS performed in individuals of East Asian ancestry were two parallel papers led by independent Japanese groups published simultaneously in Nature Genetics in 2008 [19, 20]. Using data from genotype arrays, both studies independently identified novel T2D associations at/near KCNQ1, a potassium voltage-gated channel subfamily Q member gene on chromosome 11, that had not yet been described in any of the European ancestry analyses. Yasuda et al. [19] reported the common intronic variant rs2237892 to be associated with T2D. Unoki et al. [20] simultaneously described three additional common variants at the same locus (rs2283228, rs2237897, and rs2237895) in moderate to high LD with rs2237892 (r² in 1000G Phase 3 East Asians: 0.23 to 0.84). All four variants, which were identified in analyses of <10,000 T2D cases, occur at a much lower frequency in European ancestry populations than in East Asian ancestry populations (e.g., rs2237892: EAS minor allele frequency [MAF]=35.5%, EUR MAF=6.3%; Table 1). Associations near KCNQ1 were not reported in European ancestry analyses until 2010, when the number of T2D cases included in meta-analyses increased to more than 42,000 [14]. The European lead variant, rs231362, maps 150 kb away from the lead variants in East Asian ancestry individuals [14] (LD between rs2237892 and rs231362: EAS r² = 0.0031; EUR r² = 0.0001). KCNQ1 remains the strongest T2D-associated locus in East Asian ancestry populations to date, with more than 17 distinct signals identified in the latest East Asian T2D meta-analysis [6] (rs2237897: risk allele frequency [RAF] = 37%; OR = 1.31, 95% CI 1.29–1.33; P = 2.0×10⁻²⁴⁶). Following the two initial Japanese publications in 2008, three separate East Asian GWAS published in 2010 described five additional novel T2D-associated loci: SRR [21], PTPRD [21], UBE2E2 [22], C2CD4A/B [22], and SPRY2 [23]. Associations at/near UBE2E2, C2CD4A/B, and SPRY2 were subsequently replicated in European ancestry analyses [5].
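To make the allele-frequency argument concrete, the sketch below compares the relative sample size needed to detect the same per-allele effect at rs2237892 in the two ancestries. It assumes, as a simplification that is not taken from the cited studies, an additive model in which statistical information per sample is proportional to 2p(1 − p) and an identical effect size in both populations; the function name is illustrative.

```python
def relative_information(maf: float) -> float:
    """Per-sample information for an additive effect, proportional to 2p(1-p)."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must be strictly between 0 and 1")
    return 2.0 * maf * (1.0 - maf)


# Minor allele frequencies for rs2237892 as quoted in the text.
EAS_MAF = 0.355
EUR_MAF = 0.063

# Assumption: equal per-allele effect in both ancestries, so the sample size
# required for equal power is inversely proportional to 2p(1-p).
sample_size_ratio = relative_information(EAS_MAF) / relative_information(EUR_MAF)
print(f"EUR/EAS sample size ratio for equal power: {sample_size_ratio:.2f}")
```

Under these assumptions, a European ancestry study would need roughly four times as many samples as an East Asian ancestry study to detect this variant. The comparison is only illustrative, because the European lead variant at KCNQ1 is a distinct signal.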

Table 1 Summary of T2D genetic association studies in East Asian individuals

The first large-scale T2D genome-wide meta-analysis in East Asian ancestry populations assembled 6952 cases and 11,865 controls in the discovery phase from China, Japan, Korea, the Philippines, Singapore, and Taiwan, as part of the Asian Genetic Epidemiology Network (AGEN) [8]. Utilizing the HapMap2 CHB+JPT [24] imputation reference panel, the final meta-analysis after in silico and de novo replications included 25,079 T2D cases and 29,611 controls and reported a total of eight novel loci at/near MAEA, GLIS3, FITM2/R3HDML/HNF4A, GCC1/PAX4, PSMD6, ZFAND3, PEPD, and KCNK16. The strongest novel association was a common variant at the MAEA locus with a risk allele frequency of 58% in East Asian ancestry populations but only 3% in European ancestry populations, further demonstrating the ease of discovery and improved power due to differences in risk allele frequencies. Of the eight novel loci, three were previously implicated in type 1 diabetes (T1D; GLIS3) [25] or maturity onset diabetes of the young (MODY; HNF4A [26] and PAX4 [27, 28]). GLIS3 plays an important role in human beta cell development and function and is associated with both T1D [25] and T2D [29] with consistent directions of effect. At the GCC1/PAX4 locus on chromosome 7, the association signal was near a cluster of genes, including GCC1, which contains a GRIP domain, and PAX4 [27, 28], a transcription factor known to play a role in pancreatic islet development. Conditional and sensitivity analyses indicate that these association signals are independent of previously known MODY variants and not likely to be attributable to misclassification of T2D cases. The lead variant at the KCNK16 locus was a common intronic variant near GLP1R, which encodes the receptor for glucagon-like peptide 1 (GLP1); this pancreatic beta-cell receptor is involved in blood sugar homeostasis and insulin secretion. GLP1R is also the target of GLP-1 receptor agonists, a class of medications used for the treatment of T2D. Less is known about the function and role of MAEA with regard to T2D pathogenesis, but PSMD6, ZFAND3, and PEPD are all related to pancreatic beta-cell function and/or insulin secretion [30,31,32]. As with GWAS of other common diseases, most of the T2D-associated lead variants identified in either European or East Asian ancestry analyses are located in non-coding regions of the genome, which makes it more difficult to identify candidate causal variants, effector genes, and the underlying disease biology.

Advances in DNA sequencing technologies and the availability of denser reference panels for imputation are two important developments in human genetics that, along with increasing sample sizes, have allowed for refinement of existing association signals and discovery of new association signals. With the first release of the 1000 Genomes Project Phase 1 reference panel and increase in sample sizes [33], at least 10 additional T2D-associated loci were reported in Japanese populations [34, 35]. In 2013, Hara et al. [34] reported three novel loci at MIR219-LEP (within 1 Mb of PAX4), GPSM1, and SLC16A13 in a discovery cohort of 5976 Japanese T2D cases and 20,829 controls. LEP encodes leptin, a hormone that regulates appetite [36], and increased leptin levels are associated with obesity and T2D [37]. At the SLC16A13 locus, the strongest association was at a common intronic variant, rs312457. Just 5 kb away, a meta-analysis of Mexican and other Latin American ancestry populations reported a T2D association at a common synonymous variant, rs13342232, at SLC16A11 [38]. High LD between the two lead variants (1000G EAS r²=0.93 and AMR r²=0.80) suggests the two ancestries share the same association signal. A risk haplotype at SLC16A11 accounting for ~20% of the increased prevalence of T2D in Mexican American individuals [38] and further functional experiments [39] suggest SLC16A11 as the more likely causal candidate gene at the locus. In a separate analysis, the discovery samples from Hara et al. [34] were subsequently combined with an additional 9817 Japanese T2D cases and 6763 controls, bringing the discovery meta-analysis to >15,000 T2D cases and >26,000 controls [35]. An additional seven new loci were reported, including FAM60A (now known as SINHCAF), a master transcriptional repressor that regulates a response to hypoxia and influences an in vitro model of angiogenesis [40], and INS/IGF2/TH, which is independent of KCNQ1 (INS/IGF2 located ~620 kb upstream).
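Because LD r² values are used throughout this review to judge whether two lead variants represent the same association signal, a minimal sketch of the standard calculation from haplotype frequencies may be helpful. The function name and the example frequencies are hypothetical and are not taken from any of the cited datasets.

```python
def ld_r2(p_a: float, p_b: float, p_ab: float) -> float:
    """Squared correlation (r^2) between two biallelic variants.

    p_a, p_b: frequencies of the alleles of interest at each variant.
    p_ab: frequency of the haplotype carrying both alleles.
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both variants must be polymorphic")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / denom


# Hypothetical frequencies, chosen only to illustrate the two extremes.
perfect_ld = ld_r2(0.3, 0.3, 0.3)          # alleles always travel together
no_ld = ld_r2(0.3, 0.3, 0.3 * 0.3)         # haplotype at its expected frequency
print(f"perfect LD r2 = {perfect_ld:.2f}, no LD r2 = {no_ld:.2f}")
```

A variant that is monomorphic in a population has no defined r² there, which is why the sketch raises an error rather than returning zero in that case.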

The T2D-GENES consortium (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) focused on the protein-coding sequence in five major ancestry groups (African American, East Asian, European, Hispanic, and South Asian) [41•, 42]. Association analyses were performed for each ancestry, and ancestry-specific association results were combined into a trans-ancestry meta-analysis. Across all single-ancestry analyses, only one missense variant reached genome-wide significance: Arg192His within PAX4, identified in 2184 East Asian ancestry individuals. The lead variant, rs2233580, is common in individuals of East Asian ancestry (MAF ~10%) but almost absent in other ancestries [43]. The rs2233580 variant was not well imputed and is not in LD with the lead T2D variant at the previously reported GCC1/PAX4 locus from the Cho et al. [8] AGEN meta-analysis (1000G EAS r²=0.028). The association between rs2233580 and T2D was subsequently validated in other East Asian studies using either the exome array [44] or exome sequencing [45]. To date, this missense PAX4 variant remains associated with T2D only in East Asian ancestry populations, as it is monomorphic in other ancestry populations. Though PAX4 has been implicated in MODY, the consistent and replicable associations in multiple East Asian studies and the lack of association with age of diagnosis suggest that multiple variants at/near PAX4 are likely playing a role in the pathogenesis of T2D among East Asian ancestry individuals [41•, 45].

As large biobanks mature in East Asia, available sample sizes for genome-wide meta-analyses are greatly surpassing previous efforts. In 2019, a Japanese T2D meta-analysis combined 36,614 T2D cases and 155,150 controls from Biobank Japan and identified 88 T2D-associated loci, of which 28 were novel [9]. Lead variants at 68 loci were common in both the Japanese population and European ancestry samples included in the 1000G Phase 3 reference panel. The meta-analysis also identified 28 missense variants within 21 genes that were in high LD (EAS r²≥0.80) with the lead T2D variants. Among the 28 missense variants, at least 8 were less frequent (0.01<MAF<0.05) in Japanese populations and rare (MAF<0.01) or monomorphic in European ancestry individuals. For instance, at one of the novel loci, SCTR, rs3731600 (p.Ala122Pro; MAF=0.05) is polymorphic only in individuals of East Asian ancestry. In a second example, a missense variant, rs3765467 (p.Arg131Gln), at the previously reported fasting glucose-associated locus GLP1R [46, 47], is common in East Asian ancestry individuals (MAF=0.23) but is extremely rare in European ancestry populations (MAF=0.001) [48]. The GLP1R association signal is independent of associated variants at the neighboring KCNK16 locus [8], and the protective allele is associated with increased GLP1-induced insulin secretion [49]. Trans-ancestry comparisons of molecular pathway analyses highlight shared (e.g., MODY and beta cell) and heterogeneous (e.g., NOTCH signaling and insulin secretion) molecular pathways in Japanese and European ancestry populations.

The advent of biobanks in China (China Kadoorie Biobank, CKB) [50], Japan (BioBank Japan, BBJ) [9], and South Korea (Korea Biobank Array, KBA) [51], along with continually denser reference panels (e.g., 1000G Phase 3), allowed for continued improvement in revealing the genetic architecture of T2D in East Asian ancestry populations. Most recently, the AGEN consortium performed a T2D genome-wide meta-analysis with data imputed to the denser and larger 1000 Genomes Phase 3 reference panel [48] and included up to 433,540 individuals (77,418 T2D cases and 356,122 controls) from 23 studies across six East Asian countries [6]. Association models were fitted with and without adjustment for BMI to account for obesity, one of the biggest risk factors for T2D. A total of 301 T2D association signals at 183 loci were identified, of which 61 were novel for any ancestry after accounting for previously published loci [5, 9]. Association effect sizes were also highly correlated between East Asian and European ancestry individuals [5] at T2D-associated loci, supporting evidence of substantial shared T2D susceptibility between the two ancestries. Lead variants at loci with large differences in effect sizes between the two ancestries are generally rare (MAF ≤0.1%) in European ancestry populations but common (e.g., PAX4, RANBP3L) or less frequent (e.g., ZNF257, DGKD) in East Asian ancestry populations (Table 1). In T2D association analyses adjusted for BMI, we identified four additional novel loci, including NID2, which exhibits other associations consistent with a lipodystrophy phenotype of increased T2D risk but lower BMI and higher triglycerides in East Asian ancestry individuals [52, 53]. The substantial increase in sample size also allowed for sex-stratified analyses, revealing a male-specific T2D association at the ALDH2 locus with the strongest evidence of heterogeneity between sexes. Furthermore, conditional analyses established independence of multiple signals at many of the T2D-associated loci, for example, seven independent signals at the PAX4 locus, including the adjacent LEP signal first reported in Hara et al. [34], and two independent signals mapping to GLP1R and KCNK16 on chromosome 6.

Biologic and Mechanistic Insights into T2D Pathophysiology Identified from East Asian Analyses

GWAS and meta-analyses have been an efficient tool for identifying loci associated with T2D in all populations, elucidating novel mechanisms for disease development and effective therapeutic targets. Even in the early days of GWAS, analyses in East Asian ancestry populations identified loci not observed in European ancestry populations that yielded new biological insights into T2D risk. We highlight five examples below.

Locus with Multiple Association Signals with Effects Mediated Through Multiple Tissues (Shared Susceptibility Across Ancestries)

A variant within the ANK1 locus, rs515071, was initially identified as T2D-associated in a combined meta-analysis of East Asian and European ancestry individuals in 2012 [54]. Associated variants in and near ANK1 and nearby NKX6-3 were subsequently replicated in numerous meta-analyses of single- and trans-ancestry populations [5, 9, 15, 17, 22, 35], including multiple distinct association signals [5, 9]. Skeletal muscle cis-eQTL analyses initially identified ANK1 as the effector transcript at this locus (EUR lead GWAS variant: rs508419) [55]; this result was also observed in subcutaneous adipose tissue [56]. Conversely, Mahajan et al. [5] used credible set and pancreatic islet cis-eQTL results to suggest NKX6-3 as the effector transcript at this locus (EUR lead GWAS variant: rs13262861). However, in the most recent meta-analysis of East Asian ancestry individuals, two of three distinct association signals (also identified in European ancestry individuals [5]) were found to co-localize with cis-eQTLs for NKX6-3 in pancreatic islets [57] (EAS lead GWAS variant: rs33981001) and ANK1 in both subcutaneous adipose tissue [56] and skeletal muscle [55] (EAS lead GWAS variant: rs62508166) (Table 2), providing evidence that variants at this locus are acting on different genes in different tissues [6].

Table 2 Colocalization between expression quantitative trait loci and loci with evidence of association with T2D in East Asian individuals

Loci with Variants Present at Higher Frequencies in East Asian Individuals Compared to Non-East Asian Ancestry Populations

At the ZNF257 locus, rs142395395 was recently reported to be associated with T2D in East Asian ancestry individuals (OR 1.24, 95% CI 1.19–1.29) [6, 9]. With a MAF around 4%, this variant tags a previously described inversion of 415 kb that disrupts the coding sequence and expression of ZNF257, as well as lymphoblastoid expression of 81 downstream genes and transcripts [58]. However, rs142395395 has only been reported twice in 29,828 non-East Asian ancestry individuals from the gnomAD database [43]. While the exact biologic mechanism and/or effector gene underlying the locus is still to be determined, disrupting the expression of a transcription factor with a substantial number of downstream targets could impact T2D risk in multiple ways.

Variants at/near the PAX4 locus have consistently demonstrated association with T2D in East Asian ancestry populations [6, 8, 9, 34, 35]. PAX4 was originally identified as a T2D susceptibility gene in a candidate gene study of Japanese individuals [59]. Recent GWAS meta-analyses in individuals of East Asian ancestry expanded the locus to include multiple signals both upstream, near GCC1 [6, 8, 9], and downstream, near LEP [6, 9, 34, 35], of PAX4. In the largest East Asian ancestry T2D GWAS meta-analyses, two missense variants in codon 192 of PAX4, rs2233580 (Arg192His; RAF=8.6%; OR 1.31, 95% CI 1.28–1.34; P=3.4×10⁻⁹³) and rs3824004 (Arg192Ser; RAF=3.4%; OR 1.24, 95% CI 1.19–1.28; P=1.1×10⁻³⁰), are the strongest association signals in the region [6, 9, 45]. Both variants are extremely rare in non-East Asian ancestry populations [60]. PAX4 is a pancreatic islet transcription factor that is known to play a critical role in differentiating embryonic pancreatic progenitor cells into beta-cells [61]. Additionally, targeted disruption of Pax4 in mice results in an absence of mature beta-cells, leading to severe insulin-deficient diabetes and death within the first few days of life [62]. Downstream of PAX4, LEP encodes the leptin protein, a hormone secreted by white adipose tissue [63] that plays a key role in glucose metabolism [64]. Mice with leptin deficiencies are characterized by both insulin resistance and diabetes [65], and epidemiological studies have consistently demonstrated leptin to be associated with risk for T2D [66, 67]. It is currently unclear if the different association signals are acting independently or in tandem, but both PAX4 and LEP are strong candidates for effector genes at this East Asian–specific locus.

The associations between variants near SIX3/SIX2 and T2D and related glycemic traits were first reported in East Asian ancestry populations. The variant rs895636 (risk allele frequency [RAF]=42.8%) was reported to be associated with fasting plasma glucose, a T2D-related trait, in a GWAS of >17,000 Korean and Japanese individuals [68] and subsequently replicated in other East Asian ancestry GWAS and meta-analyses [69, 70]. Additionally, a variant in strong LD, rs12712928 (r²=0.91; RAF=40.2%), was significantly associated with HbA1c, another T2D-related trait, in a recent glycemic trait meta-analysis of East Asian ancestry individuals [69]. Furthermore, rs12712928 was recently identified in the Spracklen et al. [6] AGEN T2D meta-analysis. However, despite the common allele frequency of rs12712928 in other ancestry populations (European RAF=13.3%, African American and Hispanic RAF=24.2%, South Asian RAF=26.7%), no associations have yet been observed between variants near SIX3/SIX2 and T2D in non-East Asian ancestry populations [4, 5, 7, 69]. SIX3 and SIX2 encode pancreatic islet transcription factors that are known to play roles in islet beta-cell functions [71] by coordinately regulating the expression of target genes that govern mature beta-cell function (e.g., insulin production) and maintain beta-cell fate [72].

Locus with Environmental Interactions

The ALDH2 locus spans almost 2 Mb, reflecting a recent selective sweep in individuals of East Asian ancestry, in whom carriers of the alternate allele experience discomfort such as flushing, nausea, and headache following alcohol consumption [73]. The causal variant in ALDH2 that alters alcohol metabolism, rs671 (Glu504Lys), is polymorphic only in East Asian ancestry individuals and has also been associated with cardiometabolic traits including blood pressure, BMI, HDL-cholesterol, LDL-cholesterol, and cardiovascular disease risk [74,75,76]. Furthermore, a Mendelian randomization study in Chinese individuals suggests that the association between moderate alcohol drinking and the risk of myocardial infarction is not causal [77]. The association with T2D at this locus is heterogeneous between sexes: it is highly significant in males (P=5.8×10⁻²⁷) but not in females (P=0.19) [6]. The strong sex heterogeneity observed suggests possible gene-environment interactions with alcohol consumption patterns that differ between sexes, or alcohol effects on BMI and/or insulin sensitivity [78]. The highly pleiotropic nature of rs671 and the important detoxification role of aldehyde dehydrogenases suggest ALDH2 as a possible therapeutic target for T2D and other cardiometabolic diseases [79].
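As a rough illustration of how heterogeneity between sex-stratified effect estimates can be assessed, the sketch below implements a generic two-group z-test on the difference in log odds ratios, which for two groups is equivalent to Cochran's Q with one degree of freedom. This is not necessarily the method used in the cited meta-analysis, and the effect sizes and standard errors are hypothetical values chosen only to show the calculation.

```python
import math
from typing import Tuple


def heterogeneity_test(beta_a: float, se_a: float,
                       beta_b: float, se_b: float) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two independent estimates.

    Returns the z statistic and its two-sided P-value under a normal
    approximation. Inputs are log odds ratios and their standard errors.
    """
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical sex-stratified estimates (not values from the cited study).
z_diff, p_diff = heterogeneity_test(beta_a=0.10, se_a=0.01,   # males
                                    beta_b=0.01, se_b=0.01)   # females
z_same, p_same = heterogeneity_test(0.05, 0.02, 0.05, 0.02)   # identical effects
print(f"z = {z_diff:.2f}, P = {p_diff:.1e}")
```

The test assumes that the two strata are independent samples, which holds for sex-stratified analyses of unrelated individuals.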

Identifying Biologic Mechanisms at Loci Found in Non-European Ancestry Populations

While the process for identifying functional variants, effector transcripts, and directions of effect can be complex for loci identified within any ancestry, it can be especially difficult for non-European ancestry populations for several reasons. First, while bioinformatic resources exist for determining the variant effect on biological pathways (e.g., PASCAL, DEPICT) [80, 81], the base programs rely on LD calculations derived from a European ancestry reference population, and users are required to input custom reference genotype information for other ancestries. Second, target genes can be predicted by colocalizing GWAS variants with quantitative trait loci for gene expression (eQTL) in disease-relevant tissues, DNA methylation that regulates gene expression (mQTL), or protein abundance (pQTL). However, the majority of these omics data generated to date are from populations of European ancestry and can exhibit limited transferability to other ancestry populations when LD structure and/or allele frequencies are heterogeneous.

As an illustrative example, we interrogated gene expression datasets for hypothalamus, liver, pancreas, pancreatic islets, peripheral blood, skeletal muscle, subcutaneous and visceral adipose, and whole blood tissues for colocalization between all 301 AGEN T2D association signals and lead eQTL variants (Table 2) [56, 82,83,84,85,86]. Colocalization was defined as LD r² >0.80 between the lead GWAS variant and the lead eQTL variant in European ancestry populations, as the eQTL data were generated from European ancestry individuals. Overall, 60 (19.9%) of the T2D GWAS variants colocalized with at least one gene in at least one tissue. However, 43 (14.3%) of the variants were not tested for association with gene expression in any of the datasets used for colocalization, likely because these variants failed quality metrics or MAF thresholds in the European ancestry eQTL studies due to allele frequency differences between the two ancestral populations (more frequent in EAS but less frequent or rare in EUR). Similarly, with the Genotype-Tissue Expression Project (GTEx) [87], the most widely used eQTL dataset in the field, we were unable to test 95 (31.6%) of the variants identified in the AGEN T2D analysis. The colocalization analyses suggest that T2D in East Asian ancestry individuals appears to be more related to insulin resistance (e.g., adipose tissue, skeletal muscle) mechanisms than to insulin secretion (pancreas). To enable translation of more of the genetic loci identified in East Asian ancestry populations into disease insights and therapeutic targets, additional non-European ancestry bioinformatic and genomic resources need to be generated and made available for use.
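The LD-based colocalization rule described above can be sketched in a few lines. The r² threshold of 0.80 and the counts (60, 43, and 95 of 301 signals) come from the text; the function names, the signal labels, and the toy r² values are hypothetical, and a missing r² is used here as an assumed stand-in for a variant that was absent from the eQTL dataset.

```python
from collections import Counter
from typing import Optional

R2_THRESHOLD = 0.80   # colocalization cutoff stated in the text
N_SIGNALS = 301       # AGEN T2D association signals


def classify_signal(r2_with_lead_eqtl: Optional[float],
                    threshold: float = R2_THRESHOLD) -> str:
    """Classify one GWAS signal against the lead eQTL variant for a gene.

    None means the GWAS variant was not tested in the eQTL dataset
    (for example, because it is rare or monomorphic in that population).
    """
    if r2_with_lead_eqtl is None:
        return "not_tested"
    return "colocalized" if r2_with_lead_eqtl > threshold else "not_colocalized"


def percent_of_signals(count: int, total: int = N_SIGNALS) -> float:
    """Share of all signals, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Hypothetical r2 values for three made-up signals.
toy_r2 = {"signal_A": 0.95, "signal_B": 0.40, "signal_C": None}
summary = Counter(classify_signal(r2) for r2 in toy_r2.values())

# Percentages for the counts reported in the text.
reported = {n: percent_of_signals(n) for n in (60, 43, 95)}
print(summary, reported)
```

Keeping "not tested" as a separate category matters for interpretation: a signal that could not be evaluated in a European ancestry eQTL dataset is not evidence against colocalization.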

Conclusion

In summary, GWAS across multiple population groups have identified more than 500 T2D-associated loci over the past 15 years, providing mechanistic insights into T2D pathophysiology. While analyses in European ancestry populations now include >1 million individuals [7], analyses in non-European ancestry individuals, including East Asian populations, continue to identify unique genetic signals due to allele frequency differences. Unfortunately, the differences in allele frequencies across ancestry groups also indicate that European ancestry–based resources for identifying associated variants (e.g., reference panels), effector transcripts (e.g., eQTLs, mQTLs, pQTLs), and biological pathways are insufficient for non-European ancestry populations. There is a clear need for denser reference panels composed of deeply sequenced East Asian ancestry individuals to refine association signals and identify rare variant associations. Additionally, other omics data and bioinformatics resources (e.g., transcriptomics, epigenomics, proteomics, metabolomics) generated in East Asian ancestry individuals are necessary to further translate genetic findings into effective therapeutic targets.