Keywords

4.1 GWAS for Type 2 Diabetes

More than 400 million people are affected by diabetes mellitus worldwide, and the number of patents is estimated to rise to more than 600 million by 2040 [1]. Increasing prevalence of diabetes is a serious concern in many countries. Of the total global diabetes rate, 90% are living with type 2 diabetes (T2D), which is characterized by insulin resistance in peripheral tissues and impairments of insulin secretion from pancreatic β-cells. Although the current rise in T2D prevalence is explained mainly by changes in life-style, complex genetic determinants are widely considered to contribute to an inherent susceptibility to this disease [2,3,4,5]. A sibling relative risk of T2D was reported to be approximately 2 [4], and its heritability has been estimated at 30–70% [5]. Like other common diseases, the pathogenesis of T2D is considered polygenic, and the effects of individual genetic factors are modest by themselves [6]. Development of high-throughput genotyping technologies and statistical and computational software has allowed remarkable progress over the past decades in the research fields for genome-wide search to discover novel genetic loci for T2D susceptibility [6]. In 2007, five GWAS for T2D performed by European and American Groups identified robust susceptibility loci to European T2D, and until 2016, more than 90 T2D susceptibility loci have been identified through GWAS in different ethnic groups. The empirical threshold for statistical significance used here is p < 5 × 10−8 unless a different study-wise threshold has been applied and noted. It is also important to remember that loci are labeled by the gene(s) nearest to or functionally plausible for the association signal and that they do not necessarily explain the true functional gene responsible for the signal.

4.1.1 Genetics of T2D: Before the GWAS Era

Prior to the GWAS era, the importance of genetic factors in the etiology of T2D had been well established through family and twin studies [2,3,4,5], and the linkage analysis and candidate-gene association studies were applied as the primary approaches to identify susceptibility loci for diseases or phenotypic traits [7, 8]. Reynisdottir et al. identified segments in chromosomes 5 and 10 with suggestive linkage to T2D [8], and showed that the chromosome 10 region harbored the TCF7L2 [9]. Subsequently, the association of TCF7L2 with T2D was replicated not only in populations of European descent but also in other ethnic groups [10,11,12,13,14,15,16], including the Japanese [17, 18]. Candidate-gene association studies showed that PPARG [19] and KCNJ11 [20] were susceptibility genes for T2D. Both genes encode targets of anti-diabetes medications (thiazolidinediones and sulfonylureas, respectively) and harbor missense variants associated with T2D: P12A in PPARG and E23K in KCNJ11 [19, 20]. The successful identification of these genes encouraged the genetic study of T2D; however, these classical approaches were not recognized as suitable to identify variants that have a smaller effect on disease susceptibility. Therefore, the discovery of novel T2D susceptible loci had been challenging, and a more powerful strategy was needed to overcome this difficulty.

4.1.2 The Initial Phase of GWAS Era of T2D Genetics (2007–2008)

In 2007, GWAS for T2D was conducted in a French population composed of 661 cases and 614 controls, covering 392,935 SNP (single nucleotide polymorphism) loci. This study identified novel association signals at SLC30A8, HHEX, LOC387761, and EXT2 and validated the association at TCF7L2 previously identified through linkage analysis [21]. Shortly after the French GWAS, the Icelandic study group confirmed the association of SLC30A8, HHEX, and the newly identified CDKAL1 with T2D [22]. At the same time, three collaborating groups, the Wellcome Trust Case Control Consortium/United Kingdom Type 2 Diabetes Genetics consortium (WTCCC/UKT2D), the Finland-United States Investigation of NIDDM (FUSION), and the Diabetes Genetics Initiative (DGI), reported the consistent associations of SCL30A8, HHEX, CDKAL1, IGF2BP2, and CDKN2A/B with T2D in European populations [23,24,25]. These novel loci and two previously known variants (PPARG P12A and KCNJ11 E23K) were confirmed by multiple replication studies composed of European and non-European populations with the exception of LOC387761 and EXT2. Thus, the first round of European GWAS confirmed eight T2D susceptibility loci across multiple ethnic groups: TCF7L2, SLC30A8, HHEX, CDKAL1, IGF2BP2, CDKN2A/B, PPARG, and KCNJ11 [21,22,23,24,25]. In addition to these eight loci, the WTCCC/UKT2D study identified a strong association between FTO variants and T2D, although the effect of FTO variants on conferring susceptibility to T2D was mostly mediated through increase in body weight [26].

After the first round of European GWAS, an effort was made to increase sample size so that common variants with smaller effect sizes would be detectable. WTCCC/UKT2D, FUSION, and DGI combined their data to form the Diabetes Genetics Replication and Meta-analysis (DIAGRAM) consortium. Six additional novel loci, JAZF1, CDC123/CAMK1D, TSPAN/LGR5, THADA, ADAMTS9, and NOTCH2, were identified in a genome-wide scan comprising a substantial sample size (4549 cases and 5579 controls) and more than 2.2 million SNPs (either directory genotyped or imputed), followed by replication testing [27].

4.1.3 GWAS in Groups of East Asian Descent (2008–2011)

Over the past decades, many Asian countries have experienced a dramatic increase in the prevalence of T2D. Cumulative evidence suggests that Asians may be more susceptible than populations of European ancestry to insulin resistance and diabetes, which was thought to be due to differences in interethnic genetic inheritance [28]. Many of the association of the T2D loci identified by European GWAS, especially the first round of GWAS, have been confirmed in Japanese populations [6, 29, 30]. However, there are significant interethnic differences in the risk allele frequency or in effect sizes at several loci, which may affect the power to detect associations in these populations. For example, risk allele frequencies of TCF7L2 variants showing the strongest effect on T2D in European populations are very few in the Japanese (~5%) compared to populations of European descent (~40%) [17, 18]. Consequently, the association of TCF7L2 variants and T2D appears statistically less significant in the Japanese [17, 18]. In addition, the effects of some loci identified through European T2D GWAS were not consistent in Japanese populations [6, 29, 30]. Therefore, it is necessary to identify ethnic group-specific T2D susceptibility loci, those have not been captured by the European studies, to explain T2D heritability in populations of Asian descent.

In 2008, two independent Japanese GWAS, conducted by Millennium genome project (MHLW) and BioBankJapan (BBJ), simultaneously identified the KCNQ1 locus as a strong T2D susceptibility locus in the Japanese [31, 32]; this was the first established T2D susceptibility locus through non-European GWAS. Subsequent replication studies performed in different ethnic groups revealed that single nucleotide variants located at intron 15 of KCNQ1 had the strongest effects on conferring susceptibility to T2D in several East Asian populations [33,34,35,36]. The association of the KCNQ1 locus with T2D was replicated in European populations, but the minor allele frequencies in Europeans were considerably lower than those in East Asian populations (~7% in Europeans versus ~40% in East Asians). Thus, in contrast to TCF7L2, the attributable fraction of KCNQ1 on T2D susceptibility was relatively small in European populations. Since the KCNQ1 locus was not captured in the European studies, this finding emphasizes the importance of examining susceptibility loci in different ethnic groups. Although the two Japanese GWAS successfully identified the KCNQ1 locus, these studies had limited sample sizes at the initial stage of the genome-wide scan: MHLW, 187 T2D cases vs. 752 controls [32]; BBJ, 194 T2D cases vs. 1558 controls [31].

A Japanese GWAS of a larger sample size (4470 T2D vs. 3071 controls) discovered additional two T2D susceptibility loci, UBE2E2 and C2CD4A-C2CD4B in 2010 [37]. Associations between these loci and T2D were confirmed in East Asian replication study [37] and large-scale European GWAS afterward [38], suggesting GWAS for T2D using non-European as well as European populations is useful to facilitate identification of both ethnicity-specific and common-susceptibility loci among different ethnic groups.

An effort was made to increase sample size in East Asian population as well as in European combined their data to form Asian Genetic Epidemiology Network (AGEN) consortium [39]. Eight additional novel loci, GLIS3, PEPD, FITM-R3HDML-HNF4A, KCNK16, MAEA, GCC1-PAX4, PSMD6, and ZFAND3, were identified in a genome-wide scan comprising a substantial sample size (6952 cases and 11,865 controls) followed by replication testing (Stage 2 in silico replication analysis 5843 cases and 4574 controls de novo replication analysis 12,284 cases and 13,172 controls) [39].

4.1.4 GWAS with Imputation and Large-Scale Meta-Analyses (2012–)

In order to identify common variants of weaker effects, efforts have been made to increase sample size by combining association data from multiple cohorts by meta-analyses. DIAGRAM consortium has constantly developed the scale of collaboration, incremental meta-analyses (DIAGRAM+ and DIAGRAM v3) [38, 40] adding GWA data from further studies from European descent to DIAGRAM v1 data (DIAGRAM+; total of 8130 cases and 38,097 controls [40], DIAGRAM v3; total of 12,171 cases and 56,862 controls [38]) together with extensive replication have identified additional loci (12 and 8 loci, respectively).

In the meantime, four additional loci (ANK1, MIR129-LEP, GPSM1, and SLC16A11-SLC16A13) have been identified by Japanese GWAS, with increment of the sample size [41] and number of variants examined by the imputation of genotypes [29, 41]. The latest Japanese GWAS meta-analysis has identified seven additional T2D susceptibility loci (CCDC85A, FAM60A, DMRTA1, ASB3, ATP8B2, MIR4686, and INAFM2) in a genome-wide scan comprising the largest sample size in the East Asian population (15,463 cases and 26,183 controls) followed by replication testing (7936 cases and 5539 controls) [30].

Thus, larger GWAS meta-analyses combined multiple cohorts with homogeneous populations have continued to expand the number of T2D loci. In 2014, motivated by a consistency of common variant associations observed across different populations [42, 43], a trans-ethnic GWAS meta-analysis of more than 110,000 individuals, which combined GWAS data in multiple ethnic groups including European, East Asian, South Asian, and Mexican/Mexican American, has been performed [44]. Seven additional new loci for T2D susceptibility were successfully identified by combining GWAS from multiple ancestry groups, which highlighted the benefits of trans-ethnic GWAS [44].

Established susceptibility loci for T2D identified by 2016 are shown in Fig. 4.1.

Fig. 4.1
figure 1

Established T2D susceptible loci. The x-axis shows the year of publication. Background color indicates ethnic composition of the samples in the discovery GWAS: European (blue), Japanese (red), Chinese (purple), African American (gray), East Asian (orange), South Asian (green), trans-ethnic (yellow), and Inuit (pink)

4.1.5 What Have T2D GWAS Brought About So Far?

4.1.5.1 Identified Loci for T2D Linked More Frequently to β-Cell Function than to Insulin Sensitivity

The etiology of T2D is a combination of β-cell dysfunction and insulin resistance, which is promoted by either genetic or environmental factors (e.g., obesity, Westernized diet, and lifestyle). Interestingly, majority of the known T2D susceptible variants appear to influence insulin secretion rather than insulin resistance. For example, large meta-analysis from DIAGRAM+ demonstrated that of 31 confirmed T2D susceptibility loci, 10 (MTNR1B, SLC30A8, THADA, TCF7L2, KCNQ1, CAMK1D, CDKAL1, IGF2BP2, HNF1B, and CENTD2) were nominally associated with reduced homeostasis model assessment of β-cell function (HOMA-β) which estimates steady-state β-cell function and only 3 (PPARG, FTO, and KLF14) were associated with HOMA of insulin resistance (IR)(HOMA-IR), an indicator for insulin resistance [40]. Consistent result was observed in larger study afterward [45]. Moreover, the loci identified in the early phase of Japanese GWAS, namely, KCNQ1, UBE2E2, C2CD4A-C2CD4B, and ANK1 were shown to be associated with decreased β-cell function in nondiabetic control groups [29, 32, 34, 37]. Prior to the accumulation of GWAS data, a genetic predisposition to insulin resistance had been considered to play dominant roles in development of T2D, especially in populations of European origin [40]. The results obtained from GWAS, however, emphasize the crucial role of the pancreatic β-cell in the onset of T2D, and a genetic predisposition to reduced β-cell function may contribute more to the susceptibility to T2D.

4.1.5.2 Missing Heritability

GWAS have successfully identified novel T2D susceptibility loci that had not been captured by classical approaches. However, it has been estimated that only ~10% of the known T2D heritability could be explained by those T2D susceptibility loci [38, 46]. Because polygenic analyses in the European ancestry GWAS have suggested many more common variant loci not yet reaching genome-wide significance could contribute to the heritability of T2D susceptibility [38, 46], residual genetic variance explained by a long tail of common variant signals of lesser effect could be captured in larger-scale analyses of various individual ethnic populations or trans-ethnic meta-analysis. The rationale of GWAS is based on the “common disease-common variant” hypothesis, and studies have focused on finding common variants associated with the disease; therefore, susceptibility variants having a minor allele frequency (MAF) of less than 1% are frequently missed, with limited exceptions [47, 48]. It has been a matter of considerable debate whether low-frequency risk variants, which could be evaluated by next generation sequencing and may have relatively large effects, could explain the missing heritability. To test this hypothesis, the GoT2D and T2D-GENES consortia performed whole-genome sequencing (WGS, n = 2657) and whole-exome sequencing (WES, n = 12,940) with 26.7 million variants, including 4.16 million low frequency (0.5 <MAF <5%) or 6.26 million rare (MAF <0.5%) variants. The results indicated variants associated with T2D after sequencing were overwhelmingly common (MAF >5%); therefore they concluded that this sequencing analysis did not support the idea that lower-frequency variants have a major role in predisposition to T2D [49], although sample sizes for initial WGS/WES were considered too small for rare variants analyses.

4.1.6 Translation of T2D Genetics into Clinical Practice

4.1.6.1 The Possibility of Disease Prediction and Prevention

One of the most anticipated clinical uses of genetic information is to predict an individual’s risk of developing T2D. Indeed, genetic investigations suggested lifestyle intervention arm of the Diabetes Prevention Program (DPP) attenuated genetic risk defined by carrying TCF7L2 risk allele [10] or GRS constructed with 34 confirmed loci attenuated risk of developing diabetes [50]; these are good examples of the clinical usefulness of genetic testing to allow detection of high-risk individuals with whom physicians should aggressively intervene. Since the discovery of multiple T2D risk genetic variants, genetic risk score (GRS) calculated based on the number of risk alleles in subjects who developed disease has become a common approach to indicate individual’s genetic risk. Our study group examined the utility of GRS based on 49 established T2D loci (GRS-49) in the Japanese (Fig. 4.2) [51]. GRS-49 was significantly associated with T2D risk in a Japanese population, and those with a GRS ≥60 (5.7% of the population examined) were 9.81 times as likely to have type 2 diabetes compared with those with a GRS <46 (4.2% of the population examined) (Fig. 4.2b) [51]. The result suggested even though the impact of each T2D susceptibility locus was very small, accumulation of genetic information was useful to detect a high-risk group for the disease in a population. However, the area under the receiver operating characteristic (ROC) curves for GRS was 0.624, and the effect of adding GRS into clinical factor (age, sex, and BMI) was as small as 0.03 even though the incremental effect was statistically significant (Fig. 4.2c) [51]. The performance of genetic prediction models using GRS has been evaluated in over 30 studies including Asian and European with case-control study sets or prospective cohorts [52]. The results were consistent among these studies including ours [51]: AUCs of genetic information alone for T2D were 0.579–0.641 and incremental predictive performance of T2D using established marker is statistically significant but limited [52]. Insufficient information is available to construct a genetic risk score for T2D because of so-called missing heritability, and it is far from translating into clinical practice at present. Identification of causal variants, epigenetic modifications, gene-gene interactions, and gene-environment interactions as well as uncovering residual T2D susceptible genetic variants may improve the clinical utility of genetic information for T2D prediction [52].

Fig. 4.2
figure 2

Evaluation of a genetic risk score (GRS) constructed by summing up the number of risk alleles for GWAS-derived 49 single nucleotide variants (SNVs) in 4399 Japanese participants [51]. (a) Distribution of the number of risk alleles in patients with T2D (black bars, n = 2613) and controls (white bars, n = 1786). (b) Odds rates (ORs) for individual groups with different number of T2D risk alleles relative to the reference group having 37–45.5 risk alleles. The vertical bars represent 95% confidence intervals. (c) Receiver Operating Characteristic plot for model 1 containing GRS (black line, area under the curve (AUC) = 0.624); model 2 containing sex, age, and body mass index (BMI) (broken line, AUC = 0.743); and model 3 containing GRS, age, sex, and BMI (gray line; AUC = 0.773)

4.1.6.2 The Possibility of Identifying Novel Biological Mechanisms and Therapeutic Targets

Because GWAS is a biology-agnostic method to detect genetic variations that predispose to a disease, the results may contribute to identify novel biological mechanisms, which may lead to discover novel therapeutic targets for T2D. However, uncovering underlying molecular mechanisms by which the loci contribute to susceptibility to type 2 diabetes has been behind, compared with GWAS discovery. A major obstacle is that the causal variants and molecular mechanisms for diabetes risk are unknown in the most of the identified T2D susceptibility loci. Furthermore, most genetic risk variants are found in the intronic or noncoding regions of genes and are more likely to affect regulation of transcription rather than gene function. Thus, it has been challenging to elicit novel biological insight, which may uncover the disease pathogenesis and guide drug discovery from GWAS derived genetic information.

To identify biological candidate for causal genes at established T2D risk loci systematically, our study group utilized an in silico pipeline, originally developed by Okada et al. [53], using various publicly available bioinformatics methods based on (i) functional annotation, (ii) cis-acting expression quantitative trait loci, (iii) pathway analyses, (iv) genetic overlap with monogenic diabetes, (v) knockout mouse phenotypes, and (vi) PubMed text mining [30]. Seven genes (PPARG, KCNJ11, ABCC8, GCK, KIF11, GSK3B, and JUN) were identified as potential drug targets for T2D treatment by integrating disease-associated variants with diverse genomic and biological datasets and subsequent drug target search (Fig. 4.3) [30]. Of these, PPARG, KCNJ11, and ABCC8 have been well known as targets for the already approved T2D treatment options, and a GCK activator is currently undergoing clinical trials for the T2D treatment. Thus, this in silico pipeline was capable to detect drug target of established T2D treatment suggesting the capability for developing novel T2D treatment. Inhibitors for KIF11, GSK3B, and JUN were under clinical trial for the treatments of cancers (KIF11, GSK3B) or rheumatoid arthritis (JUN); these compounds could also be potential treatments for T2D [30]. Thus, systematic approaches for integrating the findings of genetic, biological, and pharmacological studies could be a useful strategy for developing new T2D treatments.

Fig. 4.3
figure 3

Discovery of potential drug targets for the treatment of T2D [30]. (a) Strategy for drug targets search based on the genetic information derived from GWAS. Biological T2D risk genes were selected from among the T2D potential risk genes located in any of the established T2D risk loci, using a scoring system by summing up the number of prioritization criteria satisfied. We selected novel T2D therapeutic targets from the overlapping genes between the drug target genes and the biological T2D risk genes or genes those products are in direct PPI with the biological T2D risk gene products. (b) Representative connections between T2D risk SNVs (blue), T2D biological genes (green), drug target genes (purple), and targeted drugs. ∗ Approved compounds for T2D treatment ∗∗Compounds for T2D treatment under clinical trial ∗∗∗ Compounds under clinical trial for treatment against diseases other than T2D

4.2 GWAS of Metabolic Traits

The etiology of T2D is characterized by reduced insulin secretion due to impaired beta-cell dysfunction and the presence of insulin resistance. The heritability of insulin secretion, peripheral insulin action, and nonoxidative glucose metabolism has been investigated in young and old Danish twins and was estimated that 75–84%, 53–55%, and 48–50% were attributed to genetic factor, respectively, showing that there is a strong genetic component in the etiology of these traits [54]. As a result, we would expect to find genetic loci associated with these traits through non-hypothesis-driven GWAS, and see new loci, which we would not have known to be implemented in these traits. As GWAS for type 2 diabetes have been successful in identifying many susceptibility loci (please see the section described above), so has been the case for insulin secretion and action. There are many kinds of metabolic traits such as lipid, adiponectin, and leptin levels that play an important role in the metabolism of type 2 diabetes. Here we will focus on genetic loci reported for fasting glycemic traits, including fasting glucose and insulin, proinsulin, and hemoglobin A1c (HbA1c).

4.2.1 GWAS of Common Variants for Glycemic Traits

4.2.1.1 European Studies

Before the advent of the GWAS era, a few loci were demonstrated to be influencing fasting glucose level in healthy individuals. Using candidate gene approach, association studies identified variants in three genes, GCK, G6PC2, and GCKR [55,56,57,58], unequivocally implemented in fasting glucose level. The first GWAS to report genetic loci for diabetes-related quantitative traits was conducted on HbA1c level. Pare et al. evaluated 337,343 SNPs in 14,618 nondiabetic women of Caucasian ancestry in the Women’s Genome Health Study [59]. In addition to confirming the HbA1c association at GCK and G6PC2, they identified a novel locus at HK1. Another locus, SLC30A8, which was known for its association with T2D reached border-line genome-wide significance (p = 9.8 × 10−8).

The Meta-Analyses of Glucose and Insulin traits Consortium (MAGIC) investigators undertook a series of GWAS on fasting glycemic traits in nondiabetic individuals and succeeded in identifying several genetic loci (Fig. 4.4). By 2011, their efforts led to the discovery of 16 loci for fasting glucose level (known G6PC2, MTNR1B, GCK, GCKR, SLC30A8, TCF7L2; recently reported DGKB-TMEM195; novel ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1, and C2CD4B), 2 for fasting insulin level/HOAM-IR (known GCKR and novel IGF1), 5 for postoral glucose tolerance test (OGTT) (GIPR, GCKR, ADCY5, TCF7L2, C2CD4A/B), 10 for proinsulin level (MADD, SLC30A8, TCF7L2, C2CD4A/B, PCSK1, ARAP1, LARP6, SGSM2; body mass index (BMI) adjusted locus SNX7, women-specific locus DDX31), and 10 for HbA1c (known HK1, MTNR1B, GCK, G6PC2; novel SPTA1, FNK3, HFE, TMPRSS6, ANK1, APT11A/TUBGCP3) [60,61,62,63]. This brought the number of loci associated with one or more glycemic traits to 31. These studies highlighted several important biological pathways involved in glucose and insulin metabolism, such as signal transduction, cell proliferation, development, glucose-sense, and circadian regulation. It also demonstrated that on one hand, studying genetics of glycemic trait can help identify T2D risk loci but, on the other hand, that not all loci associated with glycemic traits in healthy population (with glucose level in the “physiological” range) affect the risk of T2D (with glucose level in the “pathological” range), showing that there are un-overlapping mechanisms between fasting glucose elevation and development of T2D.

Fig. 4.4
figure 4

Schematic view of the >80 established loci for fasting glycemic traits, including fasting glucose, insulin, proinsulin, and HbA1c

MAGIC investigators extended their effort by increasing the sample size for discovery GWAS from 46,186 to 133,010 nondiabetic participants and incorporating Illumina CardioMetabochip, a custom iSELECT array of ~200 k SNPs that covers putative association signals for a wide range of cardiometabolic traits and fine-maps established loci [64]. This approach identified 41 novel loci associated with glycemic traits, raising the number of loci associated with fasting glucose level to 36, fasting insulin to 19 and 2 h postprandial glucose (2hGlu) to 9 (Fig. 4.4). The large increase in the number of insulin-associated loci (from 2 to 19) was partly owing to the incorporation of analyses with and without adjustment for BMI [64]. The authors speculated that because BMI explained more of the variance in fasting insulin level than in fasting glucose (R2 32.6% vs. 8.6%), BMI adjustment provided more opportunity to detect true genetic associations for fasting insulin level by removing the variance in insulin level influenced by BMI. These loci affecting fasting insulin concentration showed association with lipid levels and fat distribution, suggesting impact on insulin resistance. Of the total 53 glycemic loci, 33 were also associated with increased risk of T2D (q < 0.05). Although the overlapping loci between glycemic traits and T2D were increased, the overlap was incomplete and many glycemic loci had no discernible effect on T2D (Fig. 4.5) [64].

Fig. 4.5
figure 5

Per-allele β coefficients for glucose and insulin concentrations versus ORs for T2D (reproduced from Scott et al. [64]). (a) Fasting glucose concentration versus type 2 diabetes (T2D). (b) Fasting insulin (FI) concentration versus T2D. (c) Fasting insulin concentration adjusted for body mass index (BMI) versus T2D. (d) 2-hour glucose versus T2D. Each locus is color-coded according to the strength of association with T2D as indicated in (a)

From a similar point of view with the BMI adjusted analysis undertaken by MAGIC investigators, Manning et al. implemented a joint meta-analysis approach to test associations with fasting insulin and glucose concentration accounting for variant by BMI interaction on a genome-wide scale [65]. Six previously unknown loci associated with fasting insulin at genome-wide significance were identified (COBLL1-GRB14, IRS1, PPP1R3B, PDGFC, LYPLAL1, and UHRF1BP1).

To characterize the known 37 T2D loci and examine the relationship with indices of proinsulin processing, insulin secretion, and insulin sensitivity, MAGIC investigators combined data on both basal and dynamic measures to perform cluster analysis [45]. This analysis highlighted clusters characterized by (i) primary effects on insulin sensitivity (PPARG, KLF14, IRS1, GCKR), (ii) reduced insulin secretion and fasting hyperglycemia (MTNR1B, GCK), (iii) defects in insulin processing (ARAP1), (iv) influence on insulin processing and secretion without a detectable change in fasting glucose level (TCF7L2, SLC30A8, HHEX/IDE, CDKAL1, CDKN2A/B), and (v) unclassified (20 loci).

4.2.1.2 Studies Conducted in Non-European Population

GWAS on glycemic traits in non-European population was conducted around the world. In 2011, a large-scale GWAS meta-analysis on metabolic traits was conducted in East Asian population, identifying one novel locus for fasting glucose at SIX2-SIX3 [66]. GWAS in African Americans identified novel loci for insulin and insulin resistance assessed by Homeostasis Model Assessment for Insulin Resistance (HOMA-IR) at SC4NOL and TCERG1L [67]. More recent GWAS in East Asians detected several novel loci for glycemic traits: C12orf51, PDK1-RAPGEF4, KANK1, IRS1 for fasting glucose; MYL2, C12orf51, OAS1 for 1-2hGlu; TMEM79, HBS1L/MYB, MYO98, CYBA for HbA1c [68,69,70]. Among these novel loci, IRS1 and C12orf51 were associated with T2D [38, 71]. GWAS in an isolated Inuit population in Greenland has been successful in identifying a common variant in TBC1D4 associated with higher 2hGlu, 2 h-insulin, 2 h-C-peptide, and reduced insulin sensitivity index [72]. This variant was common (minor allele frequency (MAF) 17%) in Greenlandic population, but almost absent in any other population. Homozygous carriers of this TBC1D4 variant had unprecedentedly high risk of T2D (OR = 10.3).

4.2.1.3 Exome-Wide Association Analyses for Glycemic Traits

GWAS for fasting glucose and fasting insulin have identified several common variant loci associated with the traits. However, lead SNPs at GWAS loci have relatively modest effect and explain only a small portion of the variance (4.8% and 1.2%, respectively) [73]. The Illumina HumanExome Beadchip array, a custom array, was designed to facilitate large-scale genotyping of ~250 k mostly rare (MAF <0.5%) and low-frequency (MAF 0.5–5%) protein altering variants selected from sequenced exomes and genomes of ~12,000 individuals. Analyses using this Exomechip have enabled not only to identify novel loci for glycemic traits, but also to clarify the effector transcripts through which the association signals are exerting their effect.

The first report of exome-wide analysis revealed novel loci for low-frequency variants associated with proinsulin level. Low-frequency missense variants in KANK1 (Arg667His, MAF 2.9%) and TBC1D30 (Arg279Cys, MAF 2.0%) were associated with proinsulin level. Missense variants in PAM (Asp563Gly, MAF 5.3%) and the neighboring PPIP5K2 (Ser1228Gly, MAF 5.3%) were associated with insulinogenic index [74]. These two missense variants are significantly associated with T2D and are indistinguishable [75, 76]. Exome array analysis identified two low-frequency missense variants in known GWAS signal for fasting proinsulin concentration, which were independent of the known GWAS SNPs. One was Arg766X (MAF 3.7%) in MADD and the other was Val996Ile (MAF 1.4%) in SGSM2, demonstrating that these two genes were the likely effector transcripts at these loci [74]. Nominal p < 4.46 × 10−8 was used as statistical significance in this analysis, correcting for the number of tests (number of phenotypes multiplied by number of variants tested) conducted [74].

MADD locus was initially identified through GWAS for proinsulin and resides in a region of long-range linkage disequilibrium (LD) that extends >1 Mb in Europeans. Cornes et al. performed targeting deep sequencing of this 11p11.2 locus, encompassing MADD, ACP2, NR1H3, MYBPC3, and SPI1, and conducted association analysis for fasting glucose and insulin concentration using gene-based test (sequence kernel association test (SKAT)) [77]. SKAT is a useful approach to aggregate low-frequency exonic variants to test against phenotype of interest. Gene-based test at 11p11.2 locus demonstrated that 53 rare variants at NRH13 was jointly associated with fasting insulin, suggesting the existence of >2 independent signals at this locus.

Two other exome-array based analyses for fasting glucose and insulin concentration were reported at the same time. Both studies identified a low-frequency missense variant Ala316Thr (MAF 1.5%) at a novel locus GLP1R associated with fasting glucose [73, 78]. The glucose-raising allele of Ala316Thr was associated with lower early insulin secretion, higher 2hGlu concentration and risk of T2D [73]. Multiple low-frequency missense variants at G6PC2/ABCB11, a locus known for its strong association with fasting glucose, were reported in both studies. His 177Tyr, Tyr207Ser, Val219Leu (MAF 0.3%, 0.6%, 45.3%, respectively) in G6PC2 had influence on fasting glucose independently of each other as well as of the known noncoding GWAS common signal [73]. In vitro experiments showed that these missense variants were responsible for the loss of G6PC2 function through proteosomal degradation, and leads to a reduction in fasting glucose level in human [73]. Gene-based SKAT test demonstrated significant association between G6PC2 and fasting glucose level [78]. The two studies used study-wide significance based on the number of variants, genes, and tests performed. For example, one of the studies used p < 3 × 10−7 as significance threshold for single variant analysis and p < 1.6 × 10−6 for gene-based analysis.

Custom Exomechip array contains a certain proportion of noncoding common variants, including known GWAS lead SNPs, in order to facilitate conditional analyses to test evidence for multiple distinct signals at a locus. As a consequence, Exomechip analysis has led to the discovery of several novel loci for glycemic traits with common variants. Exomechip analysis identified additional loci at GPSM1 and HNF1A for Insulinogenic index [74], ABO for insulin action (disposition index) and fasting glucose [73, 74], and URB2 for fasting insulin level [73].

Currently, we are in an exciting time for the discovery of many genetic loci associated with T2D-related quantitative phenotypes. We have summarized >80 loci that have influence on fasting glycemic traits, including fasting glucose, insulin, proinsulin, and HbA1c level (Fig. 4.4). Concurrent approaches using GWAS and Exome array-based analyses have compensatory features to detect these loci. GWAS is widely performed and enables to combine a large number of samples in the meta-analysis. To date, GWAS meta-analysis for fasting glycemic traits are reported on data imputed up to the HapMap reference panel, but ongoing effort to use the latest reference panel for imputation provided by the 1000 Genomes Project will give a better coverage across the low-frequency allele spectrum and is expected to yield many more novel loci for fasting glycemic traits. For exome array-based approach, though it may have limited ability to investigate very rare variants compared to exome sequencing, it is still a cost-effective way and can be more easily performed. Importantly, we have seen proof of principle that exome array genotyping is a powerful way to detect low-frequency variant associations and to enable fine-mapping of the association loci to identify functional variants and effector transcripts through which the association is mediated. The use of these two wheels of analyses is expected to help deciphering the complex picture of the genetics of fasting glycemic traits and its relation with T2D.

4.3 GWAS for Diabetic Nephropathy or Diabetic Kidney Diseases

Diabetic nephropathy is a leading cause of end-stage renal disease (ESRD) in Western countries and Japan. The rising incidence of diabetic nephropathy, especially among patients with type 2 diabetes, is a serious worldwide concern in terms of both poor prognosis and medical costs. Up to now, strict glycemic and/or blood pressure control, protein restriction, or combination of these treatment have been shown to be effective for the prevention of the progression of diabetic nephropathy as well as for reducing cardiovascular events in patients with diabetic nephropathy [79,80,81,82]. Furthermore, remission and/or regression of microalbuminuria have also been reported [83,84,85], and thus the prognosis of subjects with diabetic nephropathy has been significantly improved during the last decade. However, still considerable numbers of subjects were suffered with diabetic nephropathy.

The pathogenesis of diabetic nephropathy appears to be multifactorial, and several environmental and/or genetic factors might be responsible for the development and progression of the disease [86], but precise mechanisms have not been elucidated yet. It has been reported that the cumulative incidence of diabetic retinopathy increased linearly according to the duration of diabetes, whereas the occurrence of nephropathy was almost none after 20–25 years of diabetes duration, and only modest number of individuals with diabetes (~30%) developed diabetic nephropathy [87]. Familial clustering of diabetic nephropathy was also reported both in type 1 [88] and type 2 diabetes [89], From these cumulative evidences, it is suggested that genetic susceptibility plays an important role in the pathogenesis of diabetic nephropathy. Worldwide efforts have been conducted to identify genes conferring susceptibility to diabetic nephropathy, but the efforts by classical approaches, i.e., candidate gene approaches or linkage analyses, have not been successful so far.

GWAS for diabetic nephropathy or diabetic kidney diseases have been performed in European, African American, and Japanese populations. However, the results were not consistent each other, and only a few loci satisfied genome-wide significant level.

4.3.1 GWAS for Diabetic Nephropathy (Diabetic Kidney Disease) in Populations of European Descent

In patients with type 1 diabetes, GWAS for diabetic nephropathy was first conducted by Genetics of Kidneys in Diabetes (GoKinD) study group using 820 cases (284 with proteinuria and 536 with end-stage renal disease) and 885 controls for ~360,000 SNPs, followed by a validation analysis using 1304 participants of the Diabetes Control and Complications Trial (DCCT)/Epidemiology of Diabetes Interventions and Complications (EDIC) study, a long-term, prospective investigation of the development of diabetes- associated complications [90]. Four SNP loci were reported to show suggestive associations through the GWAS, rs10868025 near FRMD3 (9q21.32), rs39059 within CHN2 (7p14.3), rs451041 within CARS (11p15.4), and rs1411766/rs1742858 near MYO16/IRS2 (13q33.3). Among the four loci, association of two loci, FRMD3 and CARS, were validated in the DCCT/EDIC study, although the association did not attain genome-wide significant level. The association of the four loci were further evaluated in 66 extended families of European ancestry, the Joslin Study of Genetics of Nephropathy in Type 2 Diabetes Family Collection, the results indicated that FRMD3 locus showed evidence of association with diabetic nephropathy (advanced diabetic nephropathy or advanced diabetic nephropathy plus high microalbuminuria) or with albuminuria (log transformed albumin to creatinine ratio) [91]. The association of FRMD3 locus with diabetic end-stage renal disease was observed in African American patients with type 2 diabetes lacking two MYH9 E1 risk haplotypes, which was well-known strong risk for nondiabetic kidney diseases in African Americans [92]. In Japanese patients with type 2 diabetes, rs1411766 at ch. 13q33.3 was associated with diabetic nephropathy, and the association attained a genome-wide significant level after integration of two data, Japanese type 2 diabetes and Caucasian type 1 diabetes, by a meta-analysis [93].

In 2012, a meta-analysis of diabetic nephropathy for patients with type 1 diabetes in populations of European origin was performed by the Genetics of Nephropathy-an International Effort (GENIE) consortium [94]. The analysis using advanced diabetic nephropathy (4409 overt proteinuria or end-stage renal disease) and 6691 controls identified that rs7588550 within ERBB4 showed suggestive evidence of associated with diabetic nephropathy. In a subsequent sub-analysis for end-stage renal disease, 1786 cases and 8718 controls including patients with overt proteinuria, two loci, rs7583877 in the AFF3 (2q11.2) and rs12437854 in RGMA/MCTP2 locus (15q26.1), were associated with ESRD with a genome-wide significant level. However, these associations were not validated in independent case-control studies [95]. Genotype imputation using directly genotyped data and linkage disequilibrium data in 1000 genomes database for patients with type 1 diabetes was performed in the Finnish Diabetic Nephropathy (FinnDiane) study. The analysis for 11,133,962 tested SNPs and subsequent first and second stage analyses, comprising of 2142 cases and 2494 controls, identified rs1326934 within the SORBS1 as top signal for susceptibility to diabetic nephropathy, but the association did not reach a genome-wide significant level [96]. Sex stratified GWAS for diabetic nephropathy in European patients with type 1 diabetes identified rs4972593 on chromosome 2q31.1 as susceptibility to ESRD only in women, but not in men, and the results were replicated in independent replication studies [97].

In a GWAS meta-analysis for quantitative traits analysis regarding kidney functions in 54,450 individuals, variants within CUBN showed genome-wide significant association with urinary albumin-to-creatinine ratio (UACR), and it was also shown that an effect size on logarithmic UACR values was fourfold larger among 5825 individuals with diabetes (0.19 log[mg/g], p = 2.0 × 10−5) compared with 46,061 individuals without diabetes (0.045 log[mg/g], p = 8.7 × 10−6; p = 8.2 × 10−4 for difference) [98]. In this analysis, rs649529 at RAB38/CTSC locus on chromosome 11q14 and rs13427836 in HS6ST1 on chromosome 2q21 were associated with UACR only in patients with diabetes.

4.3.2 GWAS for Diabetic Nephropathy in the Japanese Population

In order to identify genes conferring susceptibility to diabetic nephropathy, we have performed a GWAS for diabetic nephropathy using 188 Japanese patients with type 2 diabetes [99, 100]. We commenced an association study using SNPs from a Japanese SNP database [101, 102] established prior to the HapMap database. We screened approximately 100,000 gene-based SNP loci, and the genotype and allele frequencies of 94 nephropathy cases, defined as patients with overt proteinuria or ESRD were compared with those of 94 controls defined as patients with normoalbuminuria and diabetic retinopathy. Approximately 80,000 SNP loci were successfully genotyped, and 1615 SNP loci with p < 0.01 were selected, and forwarded to the validation study. These 1615 SNP loci were analyzed further in a greater number of subjects to clarify their statistical significance. As a result, several SNP loci, including the SLC12A3 locus [103], ELMO1 locus [104], and NCALD locus [105] were found to be associated with diabetic nephropathy.

4.3.2.1 Solute Carrier Family 12, Member 3 (SLC12A3)

The SLC12A3, at chromosome 16q13, encodes a thiazide-sensitive Na + -Cl- cotransporter that mediates reabsorption of Na+ and Cl at the renal distal convoluted tubule; this molecule is the target of thiazide diuretics. Mutations in SLC12A3 are responsible for Gitelman syndrome [106], which is inherited as an autosomal recessive trait characterized by hypokalemia, metabolic alkalosis, hypomagnesemia, hypocalciuria, and volume depletion. A coding SNP in exon 23 of the SLC12A3 (rs11643718, +78 G to A: Arg913Gln) was shown to be associated with diabetic nephropathy (p = 0.00002, odds ratio 2.53 [95% CI 1.64–3.90]). The results implicated that substitution of Arg913 to Gln in the SLC12A3 might reduce the risk to develop diabetic nephropathy. The association of rs11643718 with diabetic nephropathy was replicated in independent case-control studies, including Japanese [107], South Asian [108], and Malaysian subjects [109] with type 2 diabetes. Rs11643718 was associated with end-stage renal disease in Korean patients with type 2 diabetes, but direction of effect was opposite to that in the original report [110]. Rs11643718 did not show significant effect in Caucasian patients with type 2 diabetes (Table 4.1) [111].

Table 4.1 Effect of non-synonymous SNP (rs11643718, Arg913Gln) within the SLC12A3 with diabetic nephropathy

4.3.2.2 Engulfment and Cell Motility 1 (ELMO1)

We identified that the ELMO1 was a likely candidate for conferring susceptibility to diabetic nephropathy (rs741301, intron 18 + 9170, GG vs. GA+AA, χ2 = 19.9, p = 0.000008, odds ratio: 2.67, 95%CI 1.71–4.16) [104]. The association of ELMO1 locus with diabetic nephropathy was observed also in African American patients with type 2 diabetes [112], Caucasian patients with type 1 diabetes [113], South Indian patients with type 2 diabetes [108], Chinese patients with type 2 diabetes [114], and American Indian patients with type 2 diabetes [115], although associated SNPs or direction of the effects varied among the individual studies. The ELMO1 gene, on chromosome 7p14, is a known mammalian homologue of the C. elegans gene, ced-12, which is required for engulfment of dying cells and for cell migration [116]. ELMO1 has also been reported to cooperate with CrkII and Dock180, which are homologues of C. elegans ced-2, ced-5, respectively, to promote phagocytosis and cell shape changes [116, 117]. However, until then no evidence had been reported to suggest a role for this gene in the pathogenesis of diabetic nephropathy. By in situ hybridization using the kidney of normal and diabetic mice, we found that ELMO1 expression was weakly detectable mainly in tubular and glomerular epithelial cells in normal mouse kidney, and was clearly elevated in the kidney of diabetic mice. Subsequent in vitro analysis revealed that ELMO1 expression was elevated in cells cultured under high glucose conditions (25 mM) compared to cells cultured under normal glucose conditions (5.5 mM). Furthermore, we identified that the expression of extracellular matrix protein genes, such as Type 1 collagen and fibronectin, were increased in cells that over-expressing ELMO1, whereas the expression of MMPs (matrix metalloproteinase) was decreased [104, 118]. Therefore, it is suggested that persistent excess of ELMO1 in subjects with disease susceptibility allele leads to the overaccumulation of extracellular matrix proteins and to the development and progression of diabetic glomerulosclerosis. It has been also reported that excess of Elmo1 accelerated the progression of renal injury in mouse model of diabetes, whereas Elmo1 depletion protected the renal injury in these mice [119]. In contrast, experiments using zebrafish suggested that elmo1 had a protective role in the progression of renal injury under diabetic conditions [120].

The association of NCALD locus with diabetic nephropathy was not replicated in an independent population, and the association of above mentioned loci identified in Japanese GWAS for diabetic nephropathy did not attain a genome-wide significant level.

4.3.2.3 Acetyl-Coenzyme a Carboxylase Beta Gene (ACACB)

We extended the previous GWAS for diabetic nephropathy to the SNPs with p values between 0.01 and 0.05 and provide evidence that a SNP, rs2268388, within the acetyl-coenzyme A carboxylase beta gene (ACACB; MIM: 601557) contributes to an increased prevalence of proteinuria in patients with type 2 diabetes across different ethnic populations [121].

The frequency of the T allele of rs2268388 was consistently higher among patients with type 2 diabetes with proteinuria (combined meta-analysis gave a p value of 5.35 × 10−8 in the Japanese, 2.3 × 10−9 for all populations). The association of rs2268388 was replicated in patients with type 2 diabetes in different ethnic groups, including Han Chinese [122] and Indians [123].

Expression of ACACB was observed in adipose tissue, heart, and skeletal muscle, and, to a lesser extent, in the kidney. The results of in situ hybridization with normal mouse kidney revealed that Acacb was localized to glomerular epithelial cells and tubular epithelial cells. We also observed the expression of ACACB in cultured human renal proximal tubular epithelial cells (hRPTECs). In cultured hRPTECs, a 29-bp DNA fragments containing the SNP region had significant enhancer activity, and fragments corresponding to the disease susceptibility allele had stronger enhancer activity than those for the major allele [121].

The quantitative real-time PCR (polymerase chain reaction) using glomeruli isolated from these mice revealed that the expression of Acacb was increased in the glomeruli of diabetic db/db mice compared to those of control mice [124]. Furthermore, overexpression of ACACB in hRPTECs resulted in remarkable increase of the expressions of genes encoding pro-inflammatory cytokines, including IL-6, CXCL1, CXCL2, CXCL5, and CXCL6.

Combining these results with the finding in the genetic study, it is suggested that ACACB contributes to conferring susceptibility to diabetic nephropathy at least in part, via the effects of the pro-inflammatory cytokines, and the ACACB-IL-6 or ACACB-CXCLs systems may be considered as new pathways for the development and progression of diabetic nephropathy.

4.3.3 GWAS for Diabetic Nephropathy in Other Ethnic Groups

An African American GWAS for diabetic nephropathy evaluated 965 ESRD patients with type 2 diabetes and control individuals without type 2 diabetes or kidney disease for 832,357 SNP loci, and in addition to MYH9-APOL1 locus, which is already known susceptibility to nondiabetic kidney diseases, several loci, RPS12, LIMK2, SFI1, were associated with ESRD in patients with type 2 diabetes, although any association did not attain a genome-wide significant level [125].

Results of multiethnic GWAS meta-analysis, including African American, American Indian, European, and Mexican, identified significant association of rs955333 at 6q25.2 with diabetic nephropathy [126].

Susceptibility loci for diabetic nephropathy or diabetic kidney disease with genome-wide significant association are listed in Table 4.2.

Table 4.2 Genetic loci associated with diabetic nephropathy

4.4 Future Perspective

After the human genome (sequencing) project was completed [127, 128], a large body of information on the human genome has been accumulated [129]. Simultaneously, high-throughput genotyping technologies as well as statistical methods and/or tools for handling innumerable datasets have been developed. Then, genome-wide association studies for investigating genes associated with disease susceptibility across the entire human genome have been facilitated, and more than 2000 loci susceptible to various diseases or traits have been discovered [130].

Although this is excellent progress, it has also been recognized that the information obtained from GWAS is still insufficient for clinical application. The focus of ongoing research efforts includes detailed functional characterization of the identified T2D susceptibility variants and the search for missing heritability.

Certain modifications of the GWAS study design will be necessary to uncover the missing heritability. Much larger intra- or trans-ethnic sample sizes will be required to increase the power to detect true signals, which may be conducted in meta-analyses. Examining populations of non-European descent is likely to identify additional T2D loci, and this should be performed more vigorously. Association analyses of low frequency variants for T2D are an additional option. Additionally, it has been shown that the study using small and historically isolated populations may have advantages to identify novel susceptibility to the disease [72]. In this report, GWAS for glycemic traits using a relatively small number of Greenlandic inuits (~2500) identified the nonsense variants in the TBC1D4, which had a striking effect on susceptibility to T2D (OR = ~10). Since similar success was reported to identify novel missense variants within CREBRF associated with obesity in the Samoan population [131], unique variants with a large effect size are conserved in genetically homogeneous populations, and GWAS in these populations, even if its sample size is not so large, are useful to identify novel susceptibility to T2D.

Characterizing disease biology is another relevant goal of genetic studies for T2D, which has been behind compared with GWAS discovery. Recent biological and clinical studies have suggested possible means to increase the translational use of genetic findings through convergence on common resources and workflows, regarding comprehensive gene expression data, epigenomics, PPI networks, and information of cellular and animal models [30, 53, 132]. In order to exploit these trends to advance biological understanding of T2D, it is urgent needs of establishment and effective utilization of publicly available databases including genetic data with large-scale sample size with rich phenotype information, epigenomic and transcriptomic data for diverse tissue types, and comprehensive biological data resource from cellular and animal models.