Introduction

Type 2 diabetes (T2DM) affects over 400 million people worldwide and is one of the major challenges in modern health [1]. Genetic association studies have identified over 100 loci that influence T2DM risk, and recent studies have argued that many additional risk loci remain to be discovered [2••]. These loci predominantly have small to moderate effects on individual predisposition to T2DM, which complicates the direct clinical translation of genetic information. Each locus, however, acts through molecular and cellular mechanisms by which risk variants functionally contribute to disease. Deciphering these mechanisms is paramount to a greater understanding of T2DM pathogenesis and can potentially inform novel avenues for disease prevention, therapies and treatments.

Evaluation of T2DM risk locus mechanisms involves a variety of complementary approaches that include dense T2DM genetic data, physiological trait data, genome and epigenome annotation, quantitative trait locus (QTL) mapping and cellular and animal modelling. Advances in genetics, genomics and cellular modelling technology have enabled more precise identification of risk locus mechanisms using these approaches. Here, I discuss recent advances in applying these and other approaches to identify mechanisms of T2DM risk as well as considerations for future studies.

Identification of Causal Variants at T2DM Loci

Risk loci identified in genetic studies of T2DM often contain many associated variants due to linkage disequilibrium between variants in the human genome [2••, 3, 4]. Genetic studies traditionally report the most significantly associated variant at a given locus, or ‘index’ variant [3, 5], although this variant is not necessarily causal for T2DM. The majority of associated variants at a given locus are not causal and only correlate with the true causal variant(s) because of linkage disequilibrium. Resolving which specific variants are causal at a risk locus is critical to subsequently understanding molecular functions of these loci.

Genetic fine mapping is a common strategy for resolving causal variants, whereby a comprehensive set of genetic variants across a locus is genotyped or imputed and evaluated in large sample sets [6]. A study of genotyping data from the Metabochip microarray fine-mapped 39 known T2DM loci using Bayesian methods [7], and at the MTNR1B locus resolved a single variant, rs10830963, with >99% probability of being causal for T2DM risk [7]. Whole-genome sequencing of T2DM samples combined with imputation into GWAS data from 55 k samples fine-mapped 81 known T2DM loci and at several loci identified candidate variants with high causal probabilities that were not assayed in previous studies [2••]. One useful strategy to further improve resolution of causal variants through fine mapping is using genetic data from different ethnicities, assuming a causal variant is shared across ethnic backgrounds [8]. A study of T2DM using data of samples from European, Hispanic, East Asian and South Asian ancestry fine-mapped 10 known T2DM signals and reduced the set of variants likely causal for these signals [8].
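As an illustration of how Bayesian fine mapping assigns causal probabilities, the following is a minimal sketch, assuming a single causal variant per signal, equal prior probability for every variant and a Wakefield-style approximate Bayes factor computed from per-variant effect sizes and standard errors. The function names, the prior standard deviation of 0.2 and the variant labels are illustrative assumptions and are not taken from the cited studies.

```python
import math

def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor in favour of association.

    beta and se are the estimated log odds ratio and its standard error;
    prior_sd is an assumed prior standard deviation of the true effect.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    shrink = w / (v + w)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink

def posterior_probabilities(stats, prior_sd=0.2):
    """Map each variant to its posterior probability of being causal.

    stats maps a variant label to (beta, se). Assumes exactly one causal
    variant at the signal and a flat prior across variants.
    """
    logs = {v: log_abf(b, s, prior_sd) for v, (b, s) in stats.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {v: math.exp(x - top) for v, x in logs.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def credible_set(posteriors, coverage=0.99):
    """Smallest set of top-ranked variants whose probabilities reach coverage."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical summary statistics for three variants in high LD.
example = {
    "variant_a": (0.10, 0.01),
    "variant_b": (0.05, 0.01),
    "variant_c": (0.01, 0.01),
}
post = posterior_probabilities(example)
print(credible_set(post))
```

A credible set containing a single variant corresponds to the situation described for MTNR1B, where one variant carries nearly all of the posterior probability. Signals with several variants in strong linkage disequilibrium typically yield much larger sets.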

Dense genetic data also facilitates the identification of additional risk variants at loci that have distinct effects on T2DM risk [2••, 7]. For example, in a T2DM fine-mapping study of 39 loci, conditional analyses identified 49 total distinct common variant risk signals including 5 signals at the KCNQ1 locus, 3 signals at the HNF1A locus and 2 signals at several additional loci [7]. Common risk loci also harbour lower frequency risk variants, such as at the CCND2, PPARG and IRS1 loci identified in genome sequencing of T2DM samples [2••, 9]. Moving forward, in addition to genome sequencing, studies imputing genotype data from T2DM cohorts and Biobanks into large reference panels will facilitate continued fine-mapping of causal variants [10, 11].

Physiological Association of T2DM Variants

Genetic association data of quantitative measures relevant to T2DM pathogenesis can provide key insight into the mechanisms of a diabetes risk locus.

The canonical example of physiology at T2DM signals is the FTO locus, which has a primary effect on body mass index (BMI) and obesity [12]. Studies have more broadly determined the effect of T2DM risk variants on measures of fasting glycemia such as glucose and insulin levels [13, 14]. Homeostasis model assessment (HOMA) derived from these measures provides likely effects on pancreatic beta cell function and insulin secretion (HOMA-B) and insulin resistance (HOMA-IR) [5]. Cataloguing T2DM loci based on these measures revealed a clear distinction between loci that influence insulin secretion and insulin sensitivity [5]. A larger percentage of loci from these analyses influenced insulin secretion, suggesting that this is the predominant mechanism of currently known T2DM loci. Additional glycemic measures such as HbA1C, proinsulin level, 2 h glucose response and glucose-stimulated insulin secretion (GSIS) [15,16,17,18,19], as well as phenotypes such as lipid levels [20] and anthropometric traits [21,22,23], are also relevant to T2DM pathophysiology.
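For reference, the HOMA indices mentioned above can be computed from fasting glucose and insulin using the original closed-form approximations (HOMA1). The sketch below assumes glucose in mmol/L and insulin in µU/mL; the cited studies may instead have used the updated computer-based HOMA2 model, so this is only an illustration of how the two indices relate to the underlying measures.

```python
def homa_ir(glucose_mmol_l, insulin_uu_ml):
    """HOMA1 insulin resistance index from fasting glucose and insulin."""
    return glucose_mmol_l * insulin_uu_ml / 22.5

def homa_b(glucose_mmol_l, insulin_uu_ml):
    """HOMA1 beta cell function (percent) from fasting glucose and insulin."""
    if glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-B is undefined for fasting glucose <= 3.5 mmol/L")
    return 20.0 * insulin_uu_ml / (glucose_mmol_l - 3.5)

# Hypothetical fasting measurements for one individual.
print(homa_ir(5.0, 9.0), homa_b(5.0, 9.0))
```

Because HOMA-IR rises with both glucose and insulin whereas HOMA-B falls as glucose rises for a given insulin level, a risk allele associated with higher glucose and lower HOMA-B points towards an insulin secretion defect rather than insulin resistance.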

Patterns of quantitative trait associations can also help identify groups of T2DM loci with shared physiology and, by extension, potential shared mechanisms. For example, many variants that influence BMI appear to affect neuronal functions and variants that influence glucose levels, HOMA-B and related phenotypes appear to affect pancreatic islet functions [22, 24]. A study used several glycemic measures to define distinct clusters of loci influencing insulin sensitivity, insulin secretion and insulin processing [25]. The majority of loci, however, could not be assigned to a cluster, suggesting that association data from larger studies of these physiological measures are needed.

Not all risk loci will necessarily affect normal physiology; some may instead, for example, influence progression or exacerbation of T2DM in pre-diabetic individuals. T2DM loci also influence other complex diseases such as T1D [26], other autoimmune disease [27], cancer [28] and neurodegenerative disease [29]. Finally, the increasing availability of richly phenotyped cohorts such as those from biobanks further enables ‘phenome-wide’ studies of hundreds of human phenotypes and diseases [10, 30, 31]. Moving forward, these associations can be exploited to gain additional clues into the mechanisms of T2DM loci.

Consequences of T2DM Variants on Genome Function

Determining the genomic consequences of variants causal for T2DM risk is closely linked to functional annotation of the genome.

Protein-coding variants are causal candidates for risk signals at several loci including SLC30A8, GCKR, PPARG, KCNJ11, ABCC8, HNF1A and HNF4A [3, 7]. Relatively few T2DM loci, however, are likely explained by a protein-coding variant. The majority of loci instead map to non-coding sequence, implying that they likely affect gene regulation [7]. Regulatory processes are often cell type specific and require data generated in each cell type [32, 33]. High-throughput techniques provide genome-wide maps of cell type epigenomes, including accessible chromatin (ATAC-seq, DHS-seq, FAIRE-seq), histone tail modifications and transcription factor binding (ChIP-seq) and DNA methylation (WGBS) [34,35,36]. Consortia such as ENCODE and the NIH Epigenome Roadmap have used these assays to map the epigenome of hundreds of human cell lines and primary tissues [32, 37]. Additional studies have focused on profiling the epigenome of specific diabetes-relevant tissues such as islets, skeletal muscle and adipose [24, 38,39,40].

Variants known to affect regulatory activity are listed in Table 1. A common strategy is to identify candidate variants that overlap epigenome annotations from T2DM-relevant tissues. A study of islet FAIRE-seq revealed that rs7903146 at TCF7L2 lies in islet accessible chromatin [36], which genetic studies support as the most likely causal variant for this locus [7]. Candidate T2DM variants in islet regulatory sites defined using ChIP-seq, FAIRE-seq and DHS-seq have also been reported at MTNR1B [7], JAZF1 [41], CDC123/CAMK1D [42], ZFAND3 [24], WFS1 [43], KCNK16/17 [44], KCNQ1 [45] and CENTD2 [46•, 47]. T2DM variants also overlap regulatory sites active in other tissues such as skeletal muscle at ANK1 [40], adipose at PPARG [48] and pre-adipose and brain at FTO [49••, 50, 51]. Experiments have then confirmed the allelic effects of these candidate variants on cell-type regulatory activity, for example through gene reporter assays or correlating variant genotypes to molecular outputs using allelic imbalance or QTL mapping. For example, at TCF7L2, the risk allele of rs7903146 correlated with increased accessible chromatin in islet samples and had increased reporter activity in islet cell lines [36]. At some loci, such as PPARG and FTO, multiple functional variants may exist on the same risk haplotype. These studies demonstrate that variants at specific T2DM loci affect regulatory activity in islets, as well as other disease-relevant tissues such as adipose, muscle and brain.

Table 1 Regulatory variants at T2DM signals

Characterizing patterns of genomic annotations at variants across sets of T2DM loci can reveal broad genomic consequences of T2DM risk variants, as well as help prioritize between multiple annotated variants at a locus. Several studies have identified enrichment of variants across T2DM loci in regulatory sites active in pancreatic islets [2••, 7, 24, 39]. These enrichments are particularly pronounced in regions of islet ‘stretch’-enhancer activity or in regions of highly clustered islet active enhancers [24, 39]. Studies have also identified enrichment of variants in regulatory sites for other T2DM-relevant tissues, such as liver, adipocytes and pre-adipocytes and sites bound by specific regulatory proteins within these tissues, such as FOXA2 [2••, 7, 52]. Recent methods incorporate the effects of annotations directly as priors in fine mapping to improve causal variant resolution [53,54,55]. For example, incorporation of genome and 12 cell type epigenome annotations into fine mapping of 81 T2DM loci reduced the number of candidate causal variants by over 35% [2••]. At the CENTD2 locus, incorporating epigenome priors identified a candidate causal variant in islet accessible chromatin, rs140130268, which was not as highly prioritized in genetic data alone [46•]. The use of such methods will benefit future studies to prioritize causal variants at known signals and help identify additional risk variants.
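The idea of using annotations as priors in fine mapping can be sketched as follows. This is a simplified illustration, assuming one causal variant per signal, per-variant log Bayes factors that have already been computed, and fixed log enrichment weights for each annotation. Published methods generally estimate the enrichment weights jointly from the data, so the weights, annotation names and variant labels here are hypothetical.

```python
import math

def annotation_informed_posteriors(log_bf, annotations, log_enrichment):
    """Posterior causal probabilities with annotation-dependent priors.

    log_bf:         variant -> log Bayes factor from genetic data alone
    annotations:    variant -> set of annotation names overlapping it
    log_enrichment: annotation name -> assumed log enrichment weight
    The prior for each variant is proportional to the exponentiated sum
    of the weights of the annotations it overlaps.
    """
    scores = {}
    for variant, lbf in log_bf.items():
        prior_term = sum(log_enrichment.get(a, 0.0)
                         for a in annotations.get(variant, ()))
        scores[variant] = lbf + prior_term
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

# Two hypothetical variants that genetic data alone cannot separate.
log_bf = {"variant_a": 5.0, "variant_b": 5.0}
annotations = {"variant_a": {"islet_enhancer"}, "variant_b": set()}
log_enrichment = {"islet_enhancer": math.log(3.0)}  # assumed 3-fold enrichment
print(annotation_informed_posteriors(log_bf, annotations, log_enrichment))
```

In this toy case the two variants have identical genetic evidence, so the assumed 3-fold enrichment shifts the posterior to 0.75 for the annotated variant and 0.25 for the other. This is the sense in which epigenome priors can raise the rank of a variant that was not highly prioritized by genetic data alone.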

Many T2DM variants likely impact cell-type regulatory activity by altering transcription factor (TF) binding. Identification of disrupted TFs can also provide further clues towards T2DM-relevant regulatory pathways. Methods to identify specific TF binding sites within regulatory regions include in silico prediction, which can be combined with species conservation and DNA footprints, and experimental techniques such as electrophoretic mobility shift assays (EMSA) and allelic ChIP [34, 56]. For example, at the CAMK1D/CDC123 locus, variant rs11257655 disrupted islet FOXA1 and FOXA2 binding in both EMSA and ChIP assays [42]. At MTNR1B, variant rs10830963 disrupted NEUROD1 binding in islets [7]. In adipose cells, rs4684847 at PPARG affects PRRX1 binding and rs1421085 at FTO affects ARID5B binding within conserved regulatory modules [48, 49••]. In brain, rs8050136 at FTO affects CUX1 binding [51]. In skeletal muscle, rs508419 at ANK1 affects TR4 binding in an accessible chromatin footprint [40]. More broadly, several T2DM loci contain candidate variants that map in islet accessible chromatin footprints for RFX [44]. Finally, a study of DHS-seq data from ENCODE identified T2DM variants mapping in footprints for MODY transcription factors [57]. Together, these studies demonstrate that T2DM variants affect genomic binding of a variety of transcription factors acting within diabetes-relevant tissues. It is likely, however, that not all T2DM variants affect regulatory activity by disrupting transcription factor binding. Instead, some may act through changes in, for example, nucleosome positioning [58], DNA methylation [59], miRNA binding [60] or gene splicing [61].
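In silico prediction of allelic effects on TF binding is commonly based on scoring both alleles against a position weight matrix (PWM). The following is a minimal sketch, assuming a uniform nucleotide background, forward-strand scanning only and a toy three-position matrix that does not represent any real transcription factor. The sequences and function names are likewise illustrative.

```python
import math

def pwm_log_odds(seq, pwm, background=0.25):
    """Log2-odds score of a sequence window against a PWM.

    pwm is a list of dicts, one per motif position, mapping each base
    to its probability at that position.
    """
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))

def best_window_score(seq, var_index, pwm):
    """Best score over all motif-length windows that cover var_index."""
    k = len(pwm)
    first = max(0, var_index - k + 1)
    last = min(var_index, len(seq) - k)
    return max(pwm_log_odds(seq[s:s + k], pwm) for s in range(first, last + 1))

def allelic_delta(flank5, ref, alt, flank3, pwm):
    """Alt minus ref best-window score; negative values suggest disruption."""
    idx = len(flank5)
    ref_score = best_window_score(flank5 + ref + flank3, idx, pwm)
    alt_score = best_window_score(flank5 + alt + flank3, idx, pwm)
    return alt_score - ref_score

# Toy PWM favouring the sequence ACG (hypothetical, for illustration only).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
print(allelic_delta("TTA", "C", "T", "GTT", toy_pwm))
```

A negative difference indicates that the alternate allele weakens the best-matching site. Such predictions are a starting point only and would normally be weighed against conservation, footprint data and experimental assays such as EMSA or allelic ChIP.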

Generation of epigenome data across environmental conditions, disease states, development [62] and cellular populations [63] will continue to help identify mechanisms by which non-coding T2DM variants affect genome function. Epigenomic data from larger numbers of samples will also be useful for correlating genetic variants to chromatin, histone modifications and transcription factor binding via QTL mapping [64, 65]. Furthermore, high-throughput assays can determine allelic effects on enhancer activity or transcription factor binding across tens to hundreds of thousands of variants in a single experiment [66,67,68]. These studies have not yet been widely conducted in key diabetes-relevant tissue and will be invaluable in identifying the effects of T2DM variants on epigenome function.

Target Genes of Causal Regulatory Variants

Identifying the genes affected by T2DM variants is a critical challenge, particularly as most T2DM signals are non-coding and regulatory elements are often distal to their target genes. Genes implicated as targets of variants at T2DM signals are listed in Table 2.

Table 2 Genes implicated as targets of T2DM signals

The primary approach to determine gene targets is to correlate T2DM regulatory variants to gene expression level via QTL or allele-specific expression (ASE) mapping [33]. For example, a study identified correlation of risk alleles at TCF7L2 with higher TCF7L2 expression in human islets, albeit in few samples [69]. Studies of diabetes-relevant tissue have since mapped gene expression in larger sample sizes [70, 71]. In islets, QTL mapping of RNA-seq identified potential target genes including AP3S2, STARD10, CAMK1D, HMG20A, ADCY5, DGKB, MTNR1B, NKX6-3, ZMIZ1, GPSM1, UBE2E2 and KCNK17 [44, 60, 71]. Additional putative target genes include ANK1, JAZF1, GPSM1, ABCC8, PROX1-AS and ZFAND3 in skeletal muscle [40]; IRX3 in brain [50]; IRX3 and IRX5 in pre-adipocytes [49••]; and PPARG, KLF14 and CCND2 in adipocytes [9, 48, 72]. A critical aspect of mapping eQTLs at disease signals is determining whether both signals share the same causal variant(s), and thus conditional association or co-localization tests are needed to confirm sharing [73, 74]. An alternative to QTL analyses is to map ASE at heterozygous coding variants linked to a T2DM variant. In islets, this has implicated additional genes such as SLC30A8, ANPEP, KCNJ11, THADA and WFS1 [71, 75]. At most loci, however, no target gene has been defined through either approach. At other loci, multiple target genes have been implicated and may involve a combination of genes. Continued efforts to define target genes will be enhanced by larger expression datasets from key tissues across a variety of environmental conditions and within specific cellular populations, as well as the use of multi-variant expression models [76, 77].
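The two approaches described above can each be reduced to a simple statistical test. The sketch below shows an additive eQTL model as an ordinary least-squares regression of expression on risk-allele dosage, and an ASE test as an exact two-sided binomial test of read counts at a heterozygous site against a 50:50 expectation. It is deliberately minimal: published analyses additionally adjust for covariates, normalize expression, correct for reference mapping bias and overdispersion, and control for multiple testing. All names and values are hypothetical.

```python
import math

def eqtl_fit(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage.

    dosages are risk-allele counts (0, 1 or 2) per sample; expression is
    the matching list of normalized expression values for one gene.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    return slope, mean_e - slope * mean_g

def ase_binomial_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for imbalance against a 50:50 null."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads overlap the heterozygous site")
    k = min(ref_reads, alt_reads)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)  # the null distribution is symmetric

# Hypothetical data: six samples and one heterozygous coding site.
slope, intercept = eqtl_fit([0, 1, 2, 0, 1, 2], [1.0, 1.5, 2.0, 1.0, 1.5, 2.0])
print(slope, intercept, ase_binomial_p(9, 1))
```

A non-zero slope or a significant allelic imbalance only shows that expression is associated with genotype at the locus. As noted above, conditional or co-localization analyses are still needed to establish that the expression signal and the T2DM signal share a causal variant.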

Complementary techniques such as 3C, 4C, Capture-C and Hi-C capture the spatial configuration of chromatin to identify physical links between regulatory elements and target promoters [78, 79]. For example, at the CENTD2 locus 3C analyses in islets identified interactions of T2DM risk variants with the promoter region of STARD10 [46•]. At the FTO locus, 4C-seq analyses in mouse brain regions identified long-range interactions between T2DM regions and IRX3 [50]. Data from Hi-C assays can also identify broad chromatin domains called TADs (topologically associating domains) that can be used to restrict the genomic space over which regulatory elements act [80]. For example, at the FTO locus, TAD definitions were used to restrict genes likely to be regulated by T2DM risk variants including IRX3 and IRX5 [49••]. Chromosome conformation data can also be combined with RNA-seq and allelic imbalance mapping to boost the ability to detect QTLs for T2DM variants on gene expression [81], as well as a means to identify haplotype-specific regulatory effects [82].
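The use of TADs to restrict the search space for target genes amounts to an interval lookup. The following sketch assumes TAD boundaries and transcription start sites are given as coordinates on a single chromosome, with TADs as non-overlapping half-open intervals. The coordinates and gene labels are invented for illustration and do not correspond to any real locus.

```python
def genes_in_same_tad(variant_pos, tads, gene_tss):
    """Return genes whose TSS lies in the TAD that contains the variant.

    tads:     list of (start, end) half-open intervals on one chromosome
    gene_tss: gene label -> transcription start site position
    Returns an empty list if the variant falls outside every TAD.
    """
    for start, end in tads:
        if start <= variant_pos < end:
            return sorted(gene for gene, tss in gene_tss.items()
                          if start <= tss < end)
    return []

# Hypothetical coordinates for two adjacent TADs and three genes.
tads = [(0, 1000), (1000, 3000)]
gene_tss = {"GENE_A": 500, "GENE_B": 1200, "GENE_C": 2900}
print(genes_in_same_tad(1500, tads, gene_tss))
```

Note that the nearest gene to the variant is not necessarily retained over a more distant one; what matters under this assumption is only whether a gene shares the domain. The resulting candidates would then be tested by expression or chromatin interaction data.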

Rare and low frequency T2DM risk variants affecting protein function can be used to bolster support for the gene likely affected by a risk signal. For example, many T2DM loci map near genes that cause Mendelian T2D and obesity [2••] and it is likely that the majority of these genes are the targets of common variants. Rare and low frequency coding variants at T2DM loci identified through exome sequencing also influence polygenic T2D risk, such as at MTNR1B, PAX4, PPARG, PAM, RREB1 and SLC30A8 [2••, 83,84,85], supporting the likely causality of these genes. In addition, the availability of large Biobanks has enabled the recall of individuals with rare coding variants for phenotypic studies, such as for CDKN2A [86].

Finally, genome editing is an increasingly valuable tool for determining target genes. Editing the risk alleles of a T2DM variant, or deleting entire regulatory regions surrounding these variants, followed by RNA-seq or 4C/Hi-C/Capture-C can identify genes with differences in expression or promoter interactions caused by specific T2DM variants. For example, at the TCF7L2 locus, deletion of regulatory regions surrounding the T2DM risk variants revealed effects on expression and promoter interactions at ACSL5 in HCT116 cells [87]. At the FTO locus, editing the alleles of rs1421085 in pre-adipocytes confirmed effects of this variant on regulation of IRX3 and IRX5 [49••]. Future efforts to define target genes will thus benefit greatly from genome editing in cell lines and iPSC-derived models of diabetes-relevant tissue.

Cellular and Animal Models of T2DM Loci

Modelling risk variant and gene function within cellular and animal systems can provide further support for the role of a candidate target gene in T2DM pathogenesis and help determine the biological functions and pathways that target genes affect.

Cellular models of candidate genes at T2DM loci include studies of primary tissue, diabetes-relevant cell lines and cells derived from induced pluripotent stem cells (iPSCs). For coding variants, one approach is to assess the activity of tagged alleles. For example, at GCKR, transfecting fluorescently labelled alleles revealed differential effects on hepatic glucose uptake [88]. As most T2DM variants are regulatory, studies of gene expression changes are more appropriate, for example using siRNA or expression vectors. Silencing CDKN2A expression in the EndoC-βH1 human islet cell line increased insulin secretion and PKA signalling [86]. At TCF7L2, adenovirus-based overexpression in human islets reduced glucose-stimulated insulin secretion [69]. Given the large number of T2DM risk loci and candidate target genes, functional screens can help expedite these evaluations. For example, a siRNA-mediated screen of genes at T2DM loci in EndoC-βH1 cells identified genes affecting insulin secretion [89]. Genome editing in both cell line and iPSC-derived models further enables characterizing the effects of specific variant alleles on cellular function. At IRX3/IRX5, editing the alleles of rs1421085 resulted in altered thermogenesis in differentiating pre-adipocytes [49••].

Animal models can help further determine the effects of altered T2DM gene activity at an organismal level. Knockouts and transgenes in rodents both globally and in tissue-specific settings have been characterized for genes at several T2DM loci, such as TCF7L2 [90], SLC30A8 [91,92,93], MC4R [94], WFS1 [95], IRS1 [96], STARD10 [46•], IRX3 [50], FTO [97] and RPGRIP1L [98]. At some loci, rodent models provide additional support for likely mechanisms of human risk variants. For example, STARD10 beta cell-specific mouse knockouts had impaired proinsulin processing and insulin secretion, and overexpression in beta cells improved glucose homeostasis, consistent with human data [46•]. In other cases, data from rodent models have complicated interpretation of risk mechanisms such as at SLC30A8, where multiple knockout models had differential effects on glucose homeostasis [84, 99]. A further challenge with rodent models is evaluating the effects of human risk variants, as most are non-coding and not often highly conserved across species. One strategy is to introduce human sequence directly in rodents, such as in bacterial artificial chromosomes (BACs) [100]. The utility of rodent models will be enhanced as more target genes are defined and expedited by the use of genome editing techniques.

Moving forward, as T2DM risk is a consequence of many variants, models of multiple T2DM signals will be informative in addition to single locus models. Furthermore, given the potential for T2DM risk variants to interact with environmental cues, these models ideally need to be evaluated in a variety of environmental contexts. Finally, in addition to rodent and human cellular models, zebrafish and Drosophila are attractive models due to the relative ease of genetic manipulations, short generation time and the conservation of glucose homeostasis mechanisms across species [101, 102].

Molecular Pathways of Genes at T2DM Risk Loci

Genes affected by T2DM risk signals likely map within shared cellular pathways through which disease pathogenesis is mediated.

A study analysed genes mapping to T2DM-associated variants or involved in monogenic diabetes using protein-protein interaction (PPI) networks [3]. T2DM genes mapping within interaction networks were generally more inter-connected than expected. The most inter-connected node in the network was CREBBP, involved in chromatin remodelling of regulatory elements, suggesting the importance of this protein to T2DM-relevant pathways, even though it does not harbour known T2DM risk variants itself. Additional analyses identified an enrichment of T2DM-associated variants among genes involved in adipocytokine signalling.
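The statement that T2DM genes are more inter-connected than expected implies a comparison against a null distribution. One simple way to construct such a comparison is a permutation test: count the interactions among the candidate genes and compare that count with random gene sets of the same size. The sketch below is an assumption about how such a test could be set up and is not the procedure of the cited study. In particular, it samples genes uniformly, whereas a careful analysis would match random sets on properties such as node degree. The gene labels and edges are invented.

```python
import random

def edges_within(genes, edges):
    """Number of interaction edges with both endpoints in the gene set."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)

def connectivity_p_value(genes, all_genes, edges, n_perm=1000, seed=0):
    """Observed within-set edge count and a one-sided permutation p-value.

    all_genes must be a list of every gene in the network. Random sets of
    the same size are drawn uniformly without replacement.
    """
    rng = random.Random(seed)
    observed = edges_within(genes, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if edges_within(sample, edges) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical network of eight genes and five interactions.
all_genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G4", "G5"), ("G6", "G7")]
print(connectivity_p_value(["G1", "G2", "G3"], all_genes, edges))
```

The same counting approach can be extended to rank individual nodes by the number of candidate genes they interact with, which is how a highly connected protein can be highlighted even when it does not itself harbour risk variants.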

An alternative approach to understanding pathways of T2DM variants is to define the upstream factors broadly regulating target gene activity. For example, studies have found that T2DM-associated variants preferentially map within binding motifs for transcription factors involved in MODY [57], as well as RFX motifs [44]. A fine-mapping study of T2DM loci identified enrichment of likely causal T2DM variants in FOXA2-bound regulatory sites [7]. Candidate genes at FOXA2 loci were preferentially downregulated in knockout models of FOXA2 compared to other loci. Although binding of FOXA2 itself was not often disrupted by the variants directly, this suggests the potential importance of this protein in regulating T2DM-relevant sites.

Finally, mapping the trans effects of a T2DM variant can reveal disease-relevant gene networks. For example, variants at the KLF14 locus with maternally inherited cis effects on KLF14 expression in adipose tissue were also associated with adipose expression of an entire gene network in trans [72]. Genes regulated in trans were then further associated with insulin resistance related traits, broadly implicating the KLF14-regulated gene network in metabolic disease risk.

Thus far, however, efforts to comprehensively define pathways of T2DM genes have not been widely successful. This is due in part to the still relatively small number of T2DM loci at which the specific target gene(s) and upstream regulator(s) are known, in combination with the large number of pathways likely involved in T2DM pathophysiology. A further complication is the incomplete knowledge of biological pathways and interactions. Future studies incorporating increasing knowledge of target genes and regulators and their tissue activity will help uncover molecular pathways of T2DM risk.

Conclusions

Determining the molecular mechanisms of T2DM risk loci holds enormous promise as a means to understand the genes and pathways involved in diabetes pathophysiology. The mechanisms of the majority of T2DM risk loci, however, are currently unknown, and even for loci with proposed mechanisms the picture is often incomplete. Grotz et al. provide a complementary discussion of identifying causal mechanisms and genes at T2DM risk loci [103].

Evaluation of risk loci entails genetic fine-mapping, quantitative phenotype association, genomic and epigenomic annotation and cellular/animal models. Finding consistency across these data is critical to fully describe mechanisms as well as their potential for clinical translation. Physiological and genomic studies both suggest that many currently known T2DM loci affect pancreatic islet regulation. However, loci also affect protein function and adipose, liver, muscle and brain regulation. As the majority of T2DM risk remains to be described, continued discovery of risk loci will undoubtedly reveal a greater diversity of cellular mechanisms.

Future studies that leverage the substantial advances and opportunities in high-throughput epigenomics, functional screening, genome editing and statistical methods to integrate these data will further expedite efforts to describe mechanisms of T2DM loci.