Introduction

Diabetes is a significant public health problem, placing increasing human and financial pressures on already overburdened healthcare systems. Six percent of the UK population have been diagnosed with diabetes [1, 2]. Diabetes now accounts for 10 % of the UK National Health Service budget, with 80 % of those costs spent managing secondary complications such as blindness, amputation, heart disease, stroke and kidney disease. Many of these complications may be preventable, or their onset delayed, with earlier management of their risk factors [3, 4]. In the USA, 11.8 % of adults were reported to be living with diabetes in 2012, with estimated costs of $245 billion from a combination of lost productivity and direct healthcare expenditure; these costs are 41 % higher than those estimated in 2007 [5, 6]. In the UK, in 2013, diabetic nephropathy accounted for over 25 % of incident patients with end-stage renal disease (ESRD) [7], whereas in the USA, in 2012, over 40 % of incident patients needing dialysis had diabetic nephropathy [8]. In several countries, including Malaysia, South Korea and Mexico, diabetic nephropathy accounts for more than 50 % of incident ESRD [9]. The global population of individuals living with ESRD is increasing steadily; in the USA, one estimate indicates that the ESRD population may exceed two million by 2030 [10]. Among populations with chronic kidney disease, the risk of ESRD and premature mortality is higher for individuals diagnosed with diabetes [11, 12]. The increasing global prevalence of diabetes, combined with the fact that up to 40 % of affected individuals will develop kidney complications [13], is a major incentive to develop tools for earlier diagnosis of diabetic kidney disease, to improve prediction models and to identify novel therapeutic targets. Key priorities include strategies to reduce the incidence of diabetic nephropathy and the development of treatments to minimise progression to ESRD.

Clinical Epidemiology of Diabetic Nephropathy

Recent research has challenged our understanding of the natural history of ‘diabetic nephropathy’ [14, 15••]. The classical description of diabetic nephropathy was developed in the 1980s from clinical observations in longitudinal studies of individuals with type 1 diabetes. The earliest clinical indication of nephropathy was moderately increased albuminuria (usually referred to as ‘microalbuminuria’), followed by persistent, severely increased albuminuria (often described as ‘macroalbuminuria’), with peak incidence after approximately 15 years of diabetes, and then by a decline in glomerular filtration rate (GFR) and progression to ESRD [16]. Diabetic nephropathy was associated with a poor prognosis, and the majority of deaths were due to cardiovascular disease or ESRD [17–19].

Earlier identification and treatment of diabetic nephropathy became possible through repeated, sensitive measurement of the urinary albumin excretion rate [20], used as a marker of diabetic kidney injury [16, 21]. Although screening for microalbuminuria improved clinical practice by permitting earlier and more aggressive treatment of diabetic nephropathy, its use as a screening tool is confounded by the fact that microalbuminuria may regress spontaneously in many patients with diabetes [17, 18]. Microalbuminuria therefore does not always predict future risk of kidney failure, which limits its utility as a biomarker for diabetic nephropathy [22, 23]. Nonetheless, microalbuminuria remains a powerful predictor of future risk of cardiovascular complications [22].
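The albuminuria stages described above are now usually reported as urinary albumin:creatinine ratio (uACR) categories. The sketch below assumes the KDIGO 2012 thresholds in mg/mmol (A1 < 3, A2 3–30, A3 > 30; equivalent to < 30, 30–300 and > 300 mg/g). Because albuminuria can regress, it also applies a hypothetical persistence rule that labels a category as persistent only when at least two repeated samples reach it. The function names are illustrative and not taken from any clinical software.

```python
from typing import Sequence

# KDIGO 2012 albuminuria categories; thresholds in mg/mmol are assumed here.
CATEGORY_ORDER = ("A1", "A2", "A3")


def albuminuria_category(uacr_mg_per_mmol: float) -> str:
    """Classify a single urinary albumin:creatinine ratio (uACR) result."""
    if uacr_mg_per_mmol < 0:
        raise ValueError("uACR cannot be negative")
    if uacr_mg_per_mmol < 3:
        return "A1"  # normal to mildly increased
    if uacr_mg_per_mmol <= 30:
        return "A2"  # moderately increased ('microalbuminuria')
    return "A3"      # severely increased ('macroalbuminuria')


def persistent_category(samples: Sequence[float], required: int = 2) -> str:
    """Hypothetical rule: return the highest category reached by at least
    `required` samples, so a single raised result is treated as transient."""
    ranks = [CATEGORY_ORDER.index(albuminuria_category(s)) for s in samples]
    # Check the most severe category first, then step down.
    for rank in range(len(CATEGORY_ORDER) - 1, 0, -1):
        if sum(r >= rank for r in ranks) >= required:
            return CATEGORY_ORDER[rank]
    return "A1"


# Two of three samples moderately increased -> persistent A2.
print(persistent_category([2.1, 4.5, 5.0]))
# Only one raised sample -> not persistent, so A1.
print(persistent_category([2.0, 4.0, 1.5]))
```

Requiring agreement across repeated samples is one simple way to reduce misclassification from transient albuminuria; in practice the number of samples and time window would follow the relevant clinical guideline.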

Defining the clinical phenotype of diabetic nephropathy in type 1 diabetes has become more problematic. For instance, persistent and severely increased albuminuria may accompany rather than precede the fall in GFR whereas in other patients with type 1 diabetes, the associated proteinuria may actually regress despite progressive kidney failure [24]. Of interest, some patients with type 1 diabetes and a low GFR, but no proteinuria, may still have typical pathological features of diabetic nephropathy on renal biopsy [25].

Accurately defining clinical phenotypes remains a crucial starting point for studies of the genetics of diabetic nephropathy. Arguably, the natural history of diabetic nephropathy is easier to study in persons with type 1 diabetes than in those with type 2 diabetes, because the age at onset of diabetes is more accurately determined in type 1 diabetes. There is considerable debate as to whether the genetic or pathological mechanisms underlying diabetic nephropathy in type 1 and type 2 diabetes overlap or are distinct [14, 26, 27].

Studying the genetic susceptibility to diabetic nephropathy in persons with type 2 diabetes is particularly challenging because the clinical phenotype is more difficult to define. For individuals with type 2 diabetes, it cannot be assumed that proteinuria and a low GFR indicate the presence of diabetic nephropathy. Renal biopsy studies have highlighted the broad range of renal pathologies present in persons with type 2 diabetes [28]. The discordance between type 2 diabetes and diabetic nephropathy as a cause of ESRD was further emphasised by a recent national registry study from Scotland, which reported that only 58 % of individuals with ESRD and type 2 diabetes had a diagnosis of diabetic nephropathy [29]. By contrast, data from the same registry indicated that 91 % of individuals with type 1 diabetes and ESRD had diabetic nephropathy. Clinical phenotyping is further complicated by current analyses suggesting that different gene variants contribute to the risks of proteinuria and ESRD in both type 1 and type 2 diabetes [30, 31], with little overlap between the genes identified for individual measures of renal function such as urinary albumin:creatinine ratio (uACR), serum creatinine and serum cystatin C [32, 33, 34••].

Genetic Epidemiology

Several lines of evidence support an inherited genetic predisposition to diabetic nephropathy: only a subset of individuals with type 1 or type 2 diabetes will develop diabetic nephropathy [15••], diabetic nephropathy and ESRD both cluster in families [35–37], and the prevalence of diabetic nephropathy varies between ethnic groups [10, 38]. The risk of developing diabetic nephropathy can be reduced, but not eliminated, by improved control of known risk factors such as hypertension and hyperglycaemia [39–41]. The genetic component of diabetic nephropathy (heritability) has been estimated at between 0.2 and 0.46 [42–45], with one notable study in White individuals with type 2 diabetes reporting an estimated glomerular filtration rate (eGFR) heritability (h²) of 0.75 after adjusting for age, gender, mean arterial blood pressure, medication and HbA1c [44]. Modifiable, traditional risk factors for diabetic nephropathy include blood pressure, glycaemic control, lipid levels, chronic inflammation, smoking, weight and physical exercise, and many of these modifiable risk factors are themselves influenced by a person’s genetic profile [46]. Other risk factors, such as age, gender, age at onset and duration of diabetes, are not modifiable but may still influence the future risk of developing diabetic kidney disease, e.g. via gender-specific genetic mechanisms [34••] or longer-term epigenetic reprogramming of gene expression associated with age and duration of diabetes [47–49].
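As a reminder of what the heritability figures above measure, heritability is conventionally defined as the fraction of phenotypic variance attributable to genetic variance. The notation below is a standard textbook definition rather than one taken from the cited studies, and it assumes no gene-environment covariance or interaction. In the narrow sense, the genetic variance is replaced by the additive genetic variance.

```latex
h^2 = \frac{\sigma^2_G}{\sigma^2_P} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_E},
\qquad
h^2 = 0.75 \;\Rightarrow\; \frac{\sigma^2_G}{\sigma^2_E} = \frac{0.75}{1 - 0.75} = 3
```

An eGFR heritability of 0.75 therefore implies that genetic variance is about three times the residual environmental variance. Here the phenotypic variance is the variance remaining after adjustment for covariates such as age, gender, blood pressure, medication and HbA1c, so adjusted and unadjusted estimates are not directly comparable.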

Candidate Genes

In common with many multifactorial diseases, early genetic studies of diabetic nephropathy focused on candidate genes with biologically plausible roles in the pathogenesis of the disease. Many genes and single-nucleotide polymorphisms (SNPs) were reported to be significantly associated with diabetic nephropathy. However, despite ‘best practice’ experiments in the 1980s and 1990s, few of these genetic associations were supported by independent replication [50, 51]. Candidate gene studies remain important and are still being published, although typically with more comprehensive analysis of biological and/or positional candidate genes, including non-coding regions with putative regulatory functions. More recent efforts to identify genes that influence diabetic nephropathy have incorporated more stringent quality control, larger sample sizes, discovery with multiple replication cohorts, matched cases and controls, consideration of relevant covariates and, ideally, genome-wide significance thresholds. Meta-analyses may also help to confirm or refute genetic association findings, but they are often challenging to undertake. Difficulties include differing statistical tests across studies, multiethnic cohorts, genetic and phenotypic heterogeneity, inability to contact authors for primary data and insufficient information reported in publications. More than 200 meta-analyses have been published for diabetic nephropathy, but these often generate conflicting results and are predominantly composed of smaller studies, for which it is difficult to standardise quality control across all participating studies.
Sizeable meta-analyses published in the last 5 years have recently been reviewed in depth, revealing only one gene associated with diabetic nephropathy at P < 0.0001 in targeted studies [50]. The functional SNP rs1617640 in the promoter region of the erythropoietin (EPO) gene, located on chromosome 7q21-q22, was associated with both proliferative diabetic retinopathy and ESRD in multiple populations with diabetes [52–54].
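A common way of pooling genetic association results across studies is a fixed-effect, inverse-variance-weighted meta-analysis of log odds ratios, with Cochran's Q and I² summarising between-study heterogeneity. The sketch below is a generic illustration of that technique, not the method used by any study cited here. The per-study values in the example are hypothetical.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class MetaResult(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float
    p_value: float
    cochran_q: float
    i_squared: float


def fixed_effect_meta(studies: Sequence[Tuple[float, float]],
                      z_crit: float = 1.959964) -> MetaResult:
    """Pool per-study (odds ratio, SE of log odds ratio) pairs."""
    if not studies:
        raise ValueError("need at least one study")
    log_ors = [math.log(or_) for or_, _ in studies]
    weights = [1.0 / se ** 2 for _, se in studies]  # inverse-variance weights
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    # Cochran's Q and I^2 describe heterogeneity between studies.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return MetaResult(math.exp(pooled),
                      math.exp(pooled - z_crit * pooled_se),
                      math.exp(pooled + z_crit * pooled_se),
                      p, q, i2)


# Hypothetical per-study results for one SNP: (odds ratio, SE of log OR).
result = fixed_effect_meta([(1.30, 0.10), (1.10, 0.08), (1.25, 0.15)])
print(result)
```

Larger studies receive more weight because their log odds ratios are estimated more precisely. When I² is large, a random-effects model is usually preferred over this fixed-effect version.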

Linkage Studies

Taking a genome-wide approach, multiple linkage studies used multigenerational families or discordant sib pairs to try to localise genetic risk factors for diabetic nephropathy to specific chromosome regions. These microsatellite- and SNP-based linkage studies have been reviewed previously. Combined analysis reveals that every autosome (any chromosome that is not a sex or mitochondrial chromosome) has been linked with a kidney-related phenotype [51, 55, 56], although the evidence for linkage is typically weak. Commonly reported genetic regions include 3q13-26, 7p, 6q22-27, 10p11-15, 15q21, 16p11-13 (UMOD), 18q22 (CNDP1, CNDP2), 20q11 and 22q (MYH9) [50]. A PubMed search for [diabetic AND (nephropathy or kidney) AND linkage], conducted on 24 January 2015, returned two linkage studies published in the preceding 5 years. Both involved the multiethnic, multicentre American Family Investigation of Nephropathy and Diabetes (FIND) collection. In 2011, a genome-wide linkage scan for diabetic nephropathy and uACR was conducted using approximately 4400 autosomal SNPs from Illumina’s Linkage IVb panel in each of the African-American, American-Indian, European-American and Mexican-American groups [57]. Not unexpectedly, results were inconsistent across ethnic groups. However, evidence for linkage with diabetic nephropathy (logarithm of odds (LOD) > 2.5) was observed at chromosome 6p24.3 in European-Americans (LOD 2.84) and 7p21.3 in American-Indians (LOD 2.81) [57]. Evidence of linkage with uACR was observed at chromosome 7q21.2 in European-Americans (LOD 2.96) and 3p13 in African-Americans (LOD 2.76) [57]. A subsequent publication evaluated linkage with eGFR in this population using the same Linkage IVb panel, revealing linkage at chromosome 20q11 (LOD 3.34) in Mexican-Americans and 15q12 (LOD 2.84) in European-Americans [58].
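The LOD scores quoted above compare the likelihood of the data under linkage with the likelihood under no linkage, on a log10 scale. The FIND scans used multipoint methods, so the sketch below is only a simplified illustration. It computes a classical two-point LOD score from hypothetical counts of recombinant and non-recombinant meioses. It then applies the usual asymptotic conversion of a LOD score to a pointwise P value, treating 2 ln(10) × LOD as a 50:50 mixture of χ² distributions with 0 and 1 degrees of freedom.

```python
import math
from typing import Tuple


def two_point_lod(recombinants: int, non_recombinants: int, theta: float) -> float:
    """log10 likelihood ratio for recombination fraction theta vs. theta = 0.5."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # a recombinant is impossible under complete linkage
    log_l_linked = ((recombinants * math.log10(theta) if recombinants else 0.0)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = (recombinants + non_recombinants) * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, non_recombinants: int) -> Tuple[float, float]:
    """Maximum-likelihood theta (capped at 0.5) and the LOD score at that theta."""
    n = recombinants + non_recombinants
    if n == 0:
        raise ValueError("no informative meioses")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, two_point_lod(recombinants, non_recombinants, theta_hat)


def lod_to_pointwise_p(lod: float) -> float:
    """Asymptotic pointwise P value: 0.5 * P(chi-square_1 > 2 ln(10) * LOD)."""
    chi2 = 2.0 * math.log(10.0) * max(lod, 0.0)
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical example: 2 recombinants among 20 informative meioses.
theta_hat, lod = max_lod(2, 18)
print(theta_hat, round(lod, 2))
# Pointwise P values for two LOD scores reported in the text.
print(lod_to_pointwise_p(2.84), lod_to_pointwise_p(3.34))
```

A LOD of 3 corresponds to a pointwise P value of roughly 10⁻⁴. Pointwise values ignore the multiple testing inherent in a genome-wide scan, which is why suggestive thresholds such as LOD > 2.5 are treated cautiously.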

Genome-Wide Association Studies

Genome-wide association studies (GWAS) have had a pivotal role in identifying SNPs associated with common complex diseases such as diabetes [59, 60], cancer [61, 62] and Alzheimer’s disease [63]. They are relatively cost-effective, readily amenable to automation and technically easy to perform in a high-throughput manner. In addition, software has been developed to facilitate combining data sets genotyped on different platforms at multiple centres. The primary advantage of GWAS is their flexibility to screen common variants systematically across the genome without prior biological assumptions. Many GWAS arrays now also offer the option of adding selected SNPs and rare variants to the panel at little or no extra cost. A range of arrays is available for GWAS. Illumina’s most comprehensive array (as of January 2015), the HumanOmni5Exome, provides simultaneous analysis of up to five million SNPs with minor allele frequency >1 %, including exonic variants identified from >12,000 sequenced exomes. A more cost-effective option for large-scale population-based genotyping projects is one of the smaller arrays, such as Illumina’s customisable HumanCoreExome-24 BeadChip, which analyses more than half a million carefully selected SNPs, including 265,919 exome-focused markers. Affymetrix also offers competitively priced Axiom genotyping arrays [64]. Subsequent imputation exploits linkage disequilibrium between efficiently tagged SNPs and untyped variants. It provides information on many more markers and helps to compare genotype-phenotype data sets across different centres. Careful study design, reporting of genetic association results in line with the STREGA guidelines [65] and standardised GWAS quality control [66] help to improve transparency and the interpretation of results, and inform downstream studies (Fig. 1).
GWAS have proved successful in identifying SNPs associated with kidney phenotypes, including IgA nephropathy [67, 68], membranous nephropathy [69], focal segmental glomerulosclerosis (FSGS) [70, 71], chronic kidney disease (CKD) [32, 72, 73] and ESRD [74, 75••]. However, progress in identifying susceptibility genes for diabetic nephropathy from GWAS has been slow.
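Standardised GWAS quality control usually begins with per-SNP filters. The sketch below applies three common filters:

* genotype call rate
* minor allele frequency
* a Hardy–Weinberg equilibrium (HWE) test, typically applied in controls

The thresholds of 0.98, 0.01 and 10⁻⁶ are illustrative assumptions, since published protocols vary. The 1-df χ² HWE test is also a simplification of the exact test usually recommended for rarer variants. The `SnpCounts` input format and the example SNP names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP (hypothetical input format)."""
    snp_id: str
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples with no genotype call

    @property
    def n_called(self) -> int:
        return self.n_aa + self.n_ab + self.n_bb


def call_rate(s: SnpCounts) -> float:
    total = s.n_called + s.n_missing
    return s.n_called / total if total else 0.0


def minor_allele_frequency(s: SnpCounts) -> float:
    freq_b = (s.n_ab + 2 * s.n_bb) / (2 * s.n_called)
    return min(freq_b, 1.0 - freq_b)


def hwe_chisq_p(s: SnpCounts) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = s.n_called
    p = (2 * s.n_aa + s.n_ab) / (2 * n)
    q = 1.0 - p
    observed = (s.n_aa, s.n_ab, s.n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def passes_qc(s: SnpCounts, min_call_rate: float = 0.98,
              min_maf: float = 0.01, min_hwe_p: float = 1e-6) -> bool:
    if s.n_called == 0:
        return False
    return (call_rate(s) >= min_call_rate
            and minor_allele_frequency(s) >= min_maf
            and hwe_chisq_p(s) >= min_hwe_p)


snps = [
    SnpCounts("snp_good", 250, 500, 250, 5),
    SnpCounts("snp_hwe_fail", 500, 0, 500, 5),   # no heterozygotes
    SnpCounts("snp_rare", 995, 5, 0, 0),         # MAF 0.0025
    SnpCounts("snp_low_call", 250, 500, 250, 100),
]
print([s.snp_id for s in snps if passes_qc(s)])
```

Each filter targets a different problem. Low call rates flag poorly performing assays. Very rare variants are underpowered and error-prone on arrays. Gross HWE deviation in controls often indicates genotyping error rather than biology.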

Fig. 1

Genetic association study designs for investigating diabetic nephropathy, using either a targeted approach or genome-wide association studies. These designs are used to discover genetic risk factors for diabetic nephropathy, including discrete traits associated with kidney disease, e.g. proteinuria or rate of decline in eGFR. DN diabetic nephropathy, eGFR estimated glomerular filtration rate, eQTL [134] expression quantitative trait loci, ESRD end-stage renal disease, HaploReg [135] a tool for exploring annotations of the noncoding genome, QC quality control, SNP single-nucleotide polymorphism, SrCr serum creatinine, srCysC serum cystatin C, uACR urinary albumin:creatinine ratio

Multiple GWAS have explored risk factors for kidney disease in populations with type 2 diabetes. However, there has been less enthusiasm for such studies than in type 1 diabetes, largely because of the difficulty of identifying a ‘true’ diabetic nephropathy clinical phenotype (for cases) and appropriate controls, despite the much larger number of individuals diagnosed with type 2 than with type 1 diabetes (Table 1). Association studies have been combined with the linkage studies described above, but the responsible genes within the broad chromosome regions implicated by linkage have yet to be identified [58]. The first large-scale GWAS for diabetic nephropathy was conducted in Japanese individuals with type 2 diabetes in 2005. Evaluation of 80,000 gene-based SNPs identified the ELMO1 gene (P = 0.000008, odds ratio 2.67, 95 % CI 1.71–4.16) [76]. There is functional support for a role of ELMO1 in diabetic nephropathy, but subsequent studies have generated inconsistent results, and a comprehensive meta-analysis has not yet been published for this gene [54, 77–81]. Later studies have highlighted the PVT1, LIMK2, SFI1, WFS1, FTO, KCNJ11 and TCF7L2 genes, but none approached conventional genome-wide significance [82–84].
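Odds ratios and 95 % confidence intervals such as those reported for ELMO1 are often derived from allele counts in cases and controls. The sketch below shows Woolf's log-based (Wald) method for a 2 × 2 table of allele counts. The counts are hypothetical, and the published analyses may have used different tests or covariate-adjusted logistic regression.

```python
import math
from typing import NamedTuple


class AlleleAssociation(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float
    p_value: float


def allelic_odds_ratio(case_risk: int, case_other: int,
                       control_risk: int, control_other: int,
                       z_crit: float = 1.959964) -> AlleleAssociation:
    """Odds ratio for the risk allele with a Woolf (log-scale Wald) 95 % CI."""
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    or_ = (case_risk * control_other) / (case_other * control_risk)
    se = math.sqrt(sum(1.0 / c for c in counts))  # SE of log(OR)
    log_or = math.log(or_)
    z = log_or / se
    return AlleleAssociation(or_,
                             math.exp(log_or - z_crit * se),
                             math.exp(log_or + z_crit * se),
                             math.erfc(abs(z) / math.sqrt(2)))  # two-sided P


# Hypothetical allele counts: 60 risk / 40 other alleles in cases,
# 40 risk / 60 other alleles in controls.
res = allelic_odds_ratio(60, 40, 40, 60)
print(f"OR {res.odds_ratio:.2f} (95 % CI {res.ci_low:.2f}-{res.ci_high:.2f}), P = {res.p_value:.2g}")
```

The interval is symmetric on the log scale, not on the odds-ratio scale. This is why published intervals such as 1.71–4.16 around 2.67 look lopsided.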

Table 1 Key genes identified by GWAS and meta-analysis demonstrating association with diabetic nephropathy

The first GWAS in individuals with type 1 diabetes followed the GAMES approach [85], using microsatellites and multiple pooled DNA case-control samples to explore association with diabetic nephropathy [86]. Several genetic regions were highlighted in this low-resolution screen, but no marker provided strong evidence of association [86]. Two GWAS published in 2009 reported associations with the FRMD3, CARS, CHN2, CPVL, ZMIZ1 and MSC genes, although none reached genome-wide significance and replication has proved challenging [87, 88]. Suggestive trends towards association for FRMD3 have been supported by independent groups [54, 89]. An in silico functional mechanism has also been proposed whereby a FRMD3 promoter polymorphism influences transcriptional regulation of the bone morphogenetic protein (BMP) signalling pathway [90].

Using the same inclusion/exclusion criteria for the diabetic nephropathy phenotype, the GEnetics of Nephropathy—an International Effort (GENIE) consortium performed two new GWAS on independent collections from the UK and Ireland (UK-ROI) and Finland (FinnDiane) [75••]. These data were combined with GWAS data from the US-GoKinD collection, available through dbGaP [91]. All three GWAS underwent consistent quality control and imputation and were then meta-analysed, so that approximately 2.4 million SNPs were evaluated for diabetic nephropathy and ESRD in a total of 6691 individuals [75••]. Selected SNPs that showed preliminary evidence of association in the discovery phase were followed up in 5873 individuals with phenotypic characteristics similar to those of the discovery cohorts. SNP rs7588550 in ERBB4 showed the strongest evidence of association with the primary phenotype of diabetic nephropathy [75••]. This finding was supported by gene expression data and plausible biological relevance, since ERBB4 is co-expressed with collagen genes associated with renal fibrosis in the tubulointerstitial compartment of the kidney. However, the association has not yet been widely replicated [75••]. Two independent Japanese groups studying diabetic nephropathy in type 2 diabetes have reported nominal association with this ERBB4 SNP [92], although with the opposite direction of effect to that observed in the GENIE GWAS. For the more extreme phenotype of ESRD, rs12437854 was associated at genome-wide significance (P = 2 × 10⁻⁹) [75••]. This SNP lies between the RGMA and MCTP2 genes and its function is unknown, although the association has been supported by additional statistical approaches [93].
Of particular interest were several SNPs in the AFF3 gene that were associated with ESRD, the most significant being rs7583877 (P = 1.2 × 10⁻⁸) [75••]. This association was supported by functional data demonstrating increased AFF3 gene expression and protein levels in cell models of kidney fibrosis, and by the involvement of AFF3 in the transforming growth factor beta pathway [75••]. A key strength of the GENIE consortium was its use of harmonised clinical phenotypes. These facilitated pooling of resources to generate a larger discovery sample with relatively extensive replication, together with active engagement by all teams.

The most recently published GWAS for diabetic nephropathy in type 1 diabetes used a discovery collection of 683 cases and 779 controls, with first-stage replication in US-GoKinD followed by second-stage replication in the FinnDiane and UK-ROI collections [94]. The top-ranked SNPs following discovery and initial replication were in the SORBS1 gene. However, no association was observed in FinnDiane, and only a non-significant trend in the same direction as in the discovery cohort was observed in UK-ROI. The most significant SNP in the resulting meta-analysis was the rs1326934 C allele (P = 0.009, odds ratio 0.83, 95 % CI 0.72–0.96) [94].

Sample size is critically important to ensure that GWAS are adequately powered to identify risk alleles. Human height illustrates this point. An early GWAS in 4921 individuals identified common variants in the HMGA2 gene associated with height, and meta-analysis of GWAS in 2008 identified twelve loci that together explain approximately 2 % of phenotypic variation in height [95]. Just 2 years later, analysis of 183,727 individuals revealed 180 loci influencing height, which explain ∼10 % of phenotypic variation [96]. Most recently (November 2014), analysis of GWAS data from 253,288 individuals identified 697 variants, clustered in 423 loci, at genome-wide significance [97]. These variants explain about one-fifth of the heritability of height, whereas all common variants together capture an estimated 60 % [97]. To improve power to identify variants influencing diabetic nephropathy, a larger JDRF Diabetic Nephropathy Collaborative Research Initiative is currently under way. This initiative is genotyping ∼25,000 individuals with type 1 diabetes on Illumina’s HumanCoreExome array, with imputation to the 1000 Genomes reference panel and meta-analysis for association with diabetic nephropathy.
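The effect of sample size on power can be illustrated with a normal-approximation power calculation for an allelic case-control test at the conventional genome-wide significance threshold of 5 × 10⁻⁸. The sketch makes several assumptions:

* a multiplicative (per-allele) model
* Hardy–Weinberg equilibrium, so that each individual contributes two independent alleles
* equal numbers of cases and controls in the example

The allele frequency and odds ratios used are hypothetical and are not estimates for any diabetic nephropathy locus.

```python
import math
from statistics import NormalDist


def case_allele_frequency(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_power(n_cases: int, n_controls: int, p_control: float,
                  odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided test comparing allele frequencies."""
    p_case = case_allele_frequency(p_control, odds_ratio)
    # Each individual contributes two alleles.
    se = math.sqrt(p_case * (1 - p_case) / (2 * n_cases)
                   + p_control * (1 - p_control) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_control) / se
    # Ignores the negligible probability of rejecting in the wrong direction.
    return NormalDist().cdf(shift - z_crit)


# Hypothetical SNP: risk-allele frequency 0.3 in controls, allelic OR 1.2.
for n in (1000, 3000, 10000):
    print(n, round(allelic_power(n, n, p_control=0.3, odds_ratio=1.2), 3))
```

For a modest per-allele effect, power at the genome-wide threshold is close to zero with a thousand cases and rises steeply only once cases number in the thousands. This mirrors the progression seen in the height studies.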

Extending Typical GWAS

GWAS typically analyse autosomes, with the initial quality control steps excluding the sex chromosomes (X, Y) and the mitochondrial genome from analysis. Gender-specific differences are apparent for renal phenotypes, including an increased incidence and prevalence of diabetic nephropathy and ESRD in men [98]. This has led to gender-specific genetic association analyses performed separately in men and women. In a meta-analysis of GWAS data, a key SNP, rs4972593 on chromosome 2q31, was associated in women with an odds ratio of 1.81 (95 % confidence interval 1.47–2.24, P = 3.85 × 10⁻⁸). The same SNP showed no association in men (P = 0.77), despite 99 % power to detect this association in men [34••]. Chromosome Y has been associated with coronary artery disease in British men [99], primarily through effects on inflammatory and immunity genes, which suggests that the sex chromosomes are worth investigating for diabetic nephropathy.
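Reported odds ratios, confidence intervals and P values can be cross-checked against each other. Assuming the 95 % CI is a symmetric Wald interval on the log-odds scale, the standard error of log(OR) equals the CI width on that scale divided by 2 × 1.96. A z statistic and two-sided P value then follow. Applied to the rs4972593 result in women, this gives z ≈ 5.5 and P ≈ 3.4 × 10⁻⁸, in line with the reported 3.85 × 10⁻⁸ once rounding of the published interval is allowed for. The function name is illustrative.

```python
import math
from typing import Tuple


def stats_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                     z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Recover SE of log(OR), z and two-sided P from an OR and its 95 % CI."""
    if not 0 < ci_low < odds_ratio < ci_high:
        raise ValueError("expected 0 < ci_low < odds_ratio < ci_high")
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return se, z, p


# rs4972593 in women: OR 1.81 (95 % CI 1.47-2.24).
se, z, p = stats_from_or_ci(1.81, 1.47, 2.24)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```

Small discrepancies are expected because confidence limits are usually rounded to two decimal places before publication.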

Mitochondrial dysfunction has been implicated in diabetic nephropathy [100, 101], and multiple SNPs in genes related to mitochondrial function were recently found to be associated with diabetic nephropathy in 6819 individuals with type 1 diabetes [102]. Of particular interest is the COX6A1 gene on chromosome 12q24, which was among the top-ranked signals in separate genetic and methylation studies focused on mitochondrial-related genes [102, 103]. Next-generation sequencing of the mitochondrial genome has identified associated genetic variants in an ESRD population [104]. Applying this approach to the single, <17 kbp mitochondrial chromosome may identify further risk factors for diabetic nephropathy.

Omics

Genetic variation does not act in isolation, and researchers are increasingly combining multiomic data sets to identify and help explain risk factors for diabetic nephropathy [75••, 105, 106]. Gene expression studies are moving from targeted microarrays towards more detailed RNA-seq approaches [107], which provide very rich data sets that may be exploited in cell-based models, animal studies and human studies. Epigenetic features of diabetic nephropathy are increasingly being studied at a genome-wide level [50]. Associations have been reported for post-translational chromatin modifications [108–110], non-protein-coding RNAs [111–113] and DNA methylation features [47, 103, 114–116]. Metabolomic [117–119] and proteomic [120–122] profiles are revealing intriguing biomarker signatures. However, a lack of standardisation in subject recruitment, experimental platforms and analytical approaches makes multicentre studies of this kind challenging. A standardised approach to integrating diverse data sets has not yet been established. Nevertheless, recent large-scale studies suggest that Data-driven Expression-Prioritized Integration for Complex Traits (DEPICT) outperformed GRAIL and MAGENTA in prioritising associated SNPs and identifying the most likely causal gene(s) from integrated data [123–125]. These data included extensive expression, protein interactome, Reactome, gene ontology and pathway-based analyses.

Next-Generation Sequencing

A complementary approach to increasing sample size is more comprehensive exploration of the genome, primarily to identify less common variants that may have relatively large effects on diabetic nephropathy. Several years ago, the Wellcome Trust initiated a project to sequence 1000 genomes [126]. In 2014, a new UK project commenced that plans to sequence 100,000 genomes by 2017 and integrate these data with NHS medical records [127]. The primary focus of the 100,000 Genomes Project by Genomics England is sequencing the genomes of patients with rare diseases and cancer, to enhance research and help progress genomic medicine within the NHS. However, multiple groups are also taking advantage of low-cost whole-genome sequencing to have population-based cohorts sequenced. These resources offer promising opportunities for productive diabetic nephropathy research, although challenges remain in managing the sheer volume of data and in dealing with medically actionable findings from an individual’s genome. To date, few next-generation sequencing projects have been conducted for diabetic nephropathy. Initial microarrays focused on non-synonymous SNPs did not yield compelling results for diabetic nephropathy [128, 129]. Whole-exome sequencing, however, has enabled the identification of additional exonic SNPs and the development of high-density exome arrays. Using publicly available whole-exome sequencing data, 31 coding SNPs across selected genes with prior evidence of association with kidney disease were genotyped, revealing exonic SNPs associated with ESRD in individuals with type 2 diabetes (P < 0.05) [130].

Future Directions

Large-scale epidemiological studies have underscored the need for more extensive clinical characterisation of kidney disease in individuals with diabetes. Such characterisation would improve the precision of phenotyping for diabetic nephropathy and increase the power of genetic association studies. There is, however, a trade-off between the gains from precise phenotypes that minimise bioclinical complexity and the resulting reduction in the number of individuals who fulfil inclusion criteria. There are three primary approaches by which this may be achieved:

(1) recruitment of carefully phenotyped cohorts of individuals with diabetes, with long-term follow-up for diabetic nephropathy;

(2) use of other case-control or longitudinal cohorts that were not specifically designed to examine diabetic nephropathy but collected information on relevant phenotypes, including blood glucose levels, diabetes status and measures of renal function; and

(3) large-scale population-based resources such as the 100,000 Genomes Project [127], the UK Biobank [131], Generation Scotland [132], the Health and Retirement Study [133] and the GERA cohort (dbGaP Study Accession: phs000674.v1.p1).

Conclusions

Significant progress has been made in improving the clinical care of persons with diabetes to reduce an individual’s risk of developing diabetic nephropathy. Nevertheless, the rising global burden of diabetes will continue to drive an increased incidence of diabetic nephropathy. Ideally, a combination of clinical characteristics, renal function measurements and relevant biomarkers would permit more accurate prediction of the risk of developing nephropathy and of its rate of progression. A key benefit of clinically useful, predictive genetic biomarkers is their potential to identify those individuals at highest (or lowest) risk of diabetic nephropathy before it is clinically apparent, enabling a stratified medicine approach. Cost-effective, individualised clinical care could then be directed to those individuals at the highest lifetime risk of diabetic nephropathy. To realise this ambitious goal, there is an urgent need to improve our understanding of the genetic architecture underlying diabetic nephropathy. An unanswered question for researchers is why some individuals with diabetes develop nephropathy whereas others are protected from this complication. The answers are likely to lie in the complex and dynamic interactions between genomic risk factors, behavioural traits and environmental stressors.