
Ulcerative colitis (UC) was described for the first time in 1859 by British physician Sir Samuel Wilks [1]. Despite one hundred and fifty years of scientific studies and careful observation of its clinical manifestations, the pathogenesis of this debilitating inflammatory bowel disease (IBD) remains, for a large part, a mystery. Only in the last decade have tremendous improvements been made in the knowledge of the molecular pathways underlying disease onset and chronic inflammation. These are primarily attributable to large-scale genetic studies that aimed to identify genetic risk factors in UC and led to the discovery of specific genes and associated biological pathways contributing to disease susceptibility.

Early UC genetic studies focussed on the major histocompatibility complex (MHC) located on chromosome 6p; linkage and association to this locus represented the first discoveries of genetic susceptibility to UC [2]. Following this came a series of mostly unsuccessful targeted association studies, whose power to detect significant association was limited by the number of markers and the size of the cohorts tested. The technical innovations and discoveries that followed the publication of the complete sequence of the human genome set the stage for the advent of genome-wide association studies (GWAS) [3, 4]. Today, GWAS, meta-analyses of GWAS and replication studies testing very large patient cohorts obtained through extensive international collaborations have identified 133 genomic regions associated with UC susceptibility, which cumulatively explain 7.5 % of disease variance [5–7].

The MHC and UC Susceptibility

In 1967, the MHC region was first identified to be involved in immune-mediated disease when the expression of HLA-B antigens was found to be higher in patients with Hodgkin’s lymphoma [8]. Variations in the MHC locus have since then been associated to autoimmune, inflammatory and also infectious disease, confirming this locus to be of paramount importance in immunity. Traditionally, the MHC is split into class I, class II and class III regions, each class containing multiple genes, including human leukocyte antigen (HLA)-encoding genes as well as non-HLA-encoding genes. The sequencing of the 3.44 Mb MHC region identified 224 different genes [9]. Class I (HLA-A, HLA-B, HLA-C) and class II (HLA-DP, HLA-DQ, HLA-DR) genes are involved in antigen presentation and stimulation of T helper cells (TH cells). Class III region genes encode several proteins with immune functions, including components of the complement system, cytokines and heat shock proteins. The MHC region exhibits striking sequence variability; for example, the HLA-B gene is the most polymorphic gene in the human genome, with over 2,000 alleles already identified in different populations [10]. Another particular feature of the MHC region is the presence of haplotypes with very strong and extensive linkage disequilibrium (LD), which often limit the exact localization of causative association signals in classical genetic studies.
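The effect of strong LD on localization can be made concrete with a small sketch. It computes the standard pairwise LD measures D and r² from haplotype and allele frequencies; the function name and all frequencies are hypothetical illustrations, not MHC data. When r² approaches 1, two variants are carried by almost exactly the same chromosomes, so an association test cannot tell which of them drives the signal.

```python
def ld_r_squared(freq_ab, freq_a, freq_b):
    """Return r^2 between two biallelic variants.

    freq_ab: frequency of the haplotype carrying allele A and allele B
    freq_a, freq_b: frequencies of allele A and allele B
    """
    # D is the deviation of the observed haplotype frequency from
    # the frequency expected if the two alleles were independent.
    d = freq_ab - freq_a * freq_b
    return d * d / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))


if __name__ == "__main__":
    # Hypothetical frequencies: allele A and allele B always travel together.
    print("complete LD:", ld_r_squared(0.20, 0.20, 0.20))
    # Hypothetical frequencies: the two alleles are inherited independently.
    print("no LD:", ld_r_squared(0.04, 0.20, 0.20))
```

In the first call the two alleles are statistically indistinguishable, which is the situation described above for long MHC haplotypes; in the second call each variant can be tested on its own.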

Since the 1970s, several linkage and association studies have tried to verify whether specific HLA alleles are implicated in UC susceptibility. More often than not, these early studies produced conflicting results. This can mostly be explained by inadequate coverage of genetic variability in the region, by different typing methodologies and by low statistical power owing to limited cohort sizes. In 1999, Stokkers et al. published a meta-analysis of several previously reported association studies of HLA alleles, in order to estimate the contribution of the HLA class II alleles to IBD. Specifically, they analyzed 18 association studies of the MHC region in UC [2]. In the pooled results, HLA-DR2, HLA-DR9 and HLA-DRB1*0103 were associated with UC risk, while a protective effect of allele HLA-DR4 was detected. A similar effort published by Fernando et al. a decade later aimed to review 35 years of MHC genetic research and pool the results in an updated meta-analysis [11]. This particular study examined the association of the MHC alleles to six different diseases, and 37 independent studies were considered for UC [2]. This new meta-analysis confirmed the association of UC to HLA-DRB1*0103 and HLA-DRB1*1502 and detected a weak association to the MHC class I related gene A (MICA), which had been previously reported [12]. The study also identified novel associations to HLA-DR5, HLA-A19 and HLA-A24.

In 2009, the International MHC and Autoimmunity Genetics Network (IMAGEN) published an extensive association study of the MHC region [13]. A total of 1,472 single nucleotide polymorphisms (SNPs) covering common MHC variation were genotyped in over 10,000 DNA samples from cohorts of patients suffering from one of seven immune-related diseases. The cohort for UC consisted of 667 patients from Italy, and three different signals associated to increased UC risk were identified, namely, in the HLA-DRB1 gene (with HLA-DRB1*1101 possibly responsible for the association signal), the NOTCH4 gene region and a third locus containing the BAT8, C2, RDBP and SKIV2L genes.

As of today, the MHC is still one of the strongest and most consistently replicated association signals in UC. This remains the case even within the context of the more recent genome-wide SNP-based association studies. One of the first in UC was a genome-wide scan of non-synonymous SNPs from Fisher et al. [14]. This study identified a 400-kb haplotype block containing BTNL2 as well as loci HLA-DQA1, HLA-DRA, HLA-DRB5 and HLA-DRB1. The first true GWAS (>100,000 SNPs) of UC to find an association to the MHC was of IBD patients with early-onset disease [15]. Association to the MHC has since been detected in every subsequent GWAS and meta-analyses of GWAS. However, as for many other immune-mediated diseases with association in the MHC, the complexity of the region makes it difficult to pinpoint the actual causal gene(s) or variant(s) for UC.

Limited Success of Early Targeted Association Studies

Before the GWAS era, association studies to candidate genes outside the MHC were attempted, but they generally resulted in limited success and no locus was consistently replicated. These early association studies were generally performed in small cohorts from different populations and ethnic groups, and the coverage of genetic variability in the queried genes or genomic regions was often low. Two loci that stand out, however, are the ABCB1 locus and IL2-IL21 locus as multiple studies have shown consistent results.

The ATP-binding cassette subfamily B member 1 gene (ABCB1), also known as the multidrug resistance gene 1 (MDR1), encodes P-glycoprotein 170. This transmembrane protein is an efflux transporter pump that regulates the flow of substances in and out of the cell. It is highly expressed in epithelial cells of the intestinal tract and influences the response to many drugs, including glucocorticoids used in the treatment of IBD [16]. Abcb1 knockout mice develop severe spontaneous colitis that can be cured with antibiotics, which suggested that intestinal flora is necessary to maintain inflammation in mouse models of IBD [17]. Several candidate gene studies from the last decade have confirmed that genetic variation in the ABCB1 gene is associated with UC. While a meta-analysis of six targeted association studies published until 2006 concluded that the T allele of ABCB1 SNP rs1045642 is significantly associated with UC risk [18], a more recent meta-analysis of 16 association studies suggested the association with this allele is only marginal [19]. In addition, this locus was not among the 163 genetic risk factors identified in the most recent GWAS of IBD, suggesting that if truly associated with UC, its effect is likely to be quite modest [6].

Before being described in UC, the locus 4q27 had already been characterized as a well-established immune-related locus, because of its association with celiac disease, type 1 diabetes, Graves' disease, SLE, rheumatoid arthritis and psoriasis [20–24]. This locus contains the cytokine genes IL2 and IL21, which are both functionally interesting candidates. IL21 is involved in the differentiation of TH17 cells implicated in the pathogenesis of IBD [25], and its expression is increased in intestinal tissue from UC patients compared to controls [26], while Il2 knockout mice develop an IBD-like condition [27]. The first study to implicate the IL2-IL21 locus in IBD susceptibility was published by Festen et al. [28], who reported the genotyping of four SNPs in a three-stage association study on a total of 3,195 UC cases from North America, the Netherlands and Italy. Association signals at all four SNP sites reached genome-wide significance. The identified locus covers a 480 kb region of high linkage disequilibrium (LD) and contains IL2, IL21 and two other genes, namely, TENR and KIAA1109, which are less plausible biological candidates to play a role in UC. Soon after, the associations were replicated in German and Spanish studies [29, 30] and later replicated in a large meta-analysis of GWAS [5]. Additionally, the locus is also associated in Chinese populations where LD is less extensive than in Caucasian populations, allowing for the identification of two apparently independent association signals for the IL2 and IL21 genes [31].

The GWAS Revolution

The limited success of targeted gene-candidate association studies made it clear that a new strategy would be needed to allow for the identification of genes underlying this complex disease. Success in hypothesis-driven association studies, such as gene-candidate studies, is restricted by the limited knowledge of disease molecular pathogenesis. Early association studies were also plagued by the low statistical power of small cohorts to identify risk genes with modest effect size, which often resulted in false-positive findings. Knowledge of the complete sequence of the human genome, availability of markers covering most of the common variation in the human genome and microarray technology for mass parallel genotyping allowed the introduction of GWAS. Access to large cohorts and stringent statistical methods contributed to the success of GWAS in identifying complex disease genes by international teams of collaborators.
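The power problem can be illustrated with a rough calculation. The sketch below approximates the power of a two-sided allelic case-control test with a normal approximation. The risk-allele frequency (0.30), the odds ratio (1.15) and the function names are illustrative assumptions rather than values from any cited study; the significance level is the conventional genome-wide threshold of 5 × 10⁻⁸, and the two cohort sizes are figures quoted elsewhere in this chapter.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided test comparing allele frequencies.

    Each individual contributes two alleles, hence the factor 2.
    This is a normal approximation, not an exact calculation.
    """
    p1 = case_allele_freq(control_freq, odds_ratio)
    p0 = control_freq
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


if __name__ == "__main__":
    # Assumed modest-effect variant: frequency 0.30 in controls, odds ratio 1.15.
    for label, cases, controls in [
        ("single early screening cohort", 905, 1465),
        ("consortium-scale meta-analysis", 17865, 37747),
    ]:
        power = approx_power(cases, controls, 0.30, 1.15)
        print(f"{label}: power ~ {power:.4f}")
```

Under these assumptions, a cohort of about a thousand cases has almost no chance of detecting such a variant at genome-wide significance, whereas tens of thousands of samples make detection nearly certain.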

Since 2008, five groups have published independent GWAS on UC [14, 15, 32–38]. The findings from these GWAS and from their first comprehensive meta-analysis (Table 6.1) recently led to identification of 47 loci associated with UC susceptibility [5]. This number has recently been brought to 133 (23 UC-specific loci and 110 loci associated with both CD and UC), thanks to a major collaborative effort from the International IBD Genetics Consortium (IIBDGC), who studied genetic information coming from a total of 38,565 IBD patients (17,865 UC and 20,700 CD) and 37,747 controls from 17 participating countries around the world [6].

Table 6.1 Genome-wide association studies of ulcerative colitis

Historically, the first GWAS in IBD was performed in 2006 for Crohn’s disease, when CD-risk variants in the interleukin 23 receptor gene (IL23R) were identified [39]. Replication was performed in cohorts of both CD and UC patients, and IL23R was found to be a gene common to both IBD subphenotypes [39], with the coding variant Arg381Gln conferring protection against both CD and UC. IL23R represents one of the first genes, outside of the MHC, to be successfully associated to UC, and has opened up new avenues for potential therapeutic exploitation (see “Chapters 11 and 15” for more detail).

The first large-scale association study specifically targeting UC was published in 2008 by a UK group [14], who studied 10,886 non-synonymous SNPs and MHC tag SNPs. The screening cohort was composed of 905 UC patients recruited in the UK and 1,465 controls from the 1958 British Birth Cohort [40], and replication was performed in an independent cohort of similar size and origin. This scan resulted in the identification of three UC loci, namely, the MHC region (which confirmed previous findings); the gene encoding the extracellular matrix protein 1 (ECM1), expressed in the small and large intestine and involved in NF-κB activation [41]; and the macrophage stimulating gene MST1, previously associated to CD and UC [42] and confirmed as a CD locus [43].

The first true GWAS targeting the entire genome in UC was published in 2008 by a German group [37]. The genotyping of 355,262 SNPs (post-quality control) was performed in a screening cohort composed of 1,167 UC cases from Germany and Norway and 777 controls, while three replication cohorts included patients from Germany, the UK and Belgium, for a total of 1,855 UC and 3,091 controls. Again, association signals in the MHC region were detected, together with additional independent loci containing the anti-inflammatory interleukin 10 (IL10) gene and the ARPC2 gene, respectively. This group also reported an analysis of 50 Crohn’s disease risk loci in UC [44], which revealed variants in BSN, IL12B, NKX2-3, PTPN2, KIF21B, CCNY, HERC2, STAT3 and 10q21.2 (intergenic) loci to be relevant also to UC.

The second GWAS in UC was performed by a North American group [34]. Genotyping for the screening stage of this study was performed on 1,052 UC patients and 2,571 controls from Caucasians of European ancestry, while the replication phase was performed on cohorts from North America (768 UC patients and 721 controls) and Italy (619 UC and 394 controls). Confirming the association with MHC and IL23R, this GWAS also identified two new UC loci: 1p36 (RNF186, OTUD3, PLA2G2E) and 12q15 (IFNG, IL26, IL22).

The third UC GWAS was performed by the UK IBD Genetics Consortium [35], who genotyped a total of 4,682 UC patients and 10,235 controls and identified three new loci on 20q13, 16q22 and 7q31. Each of these loci contains candidate genes that are biologically relevant in the context of inflammation. Locus 20q13.12 maps to the 3′ untranslated region of gene hepatocyte nuclear factor 4α (HNF4A), which is involved in cell adherence, and rare variants have been associated to maturity-onset diabetes of the young [45, 46]. Locus 16q22 encodes several genes including CDH1, the gene encoding glycoprotein E-cadherin that is involved in cell adhesion in the intestinal epithelium and whose expression is reduced in the inflamed colon of UC patients [47]. The strongest candidate gene for locus 7q31 is LAMB1 encoding laminin β1 subunit. Laminins are present in the lamina propria of the intestines where they help in anchoring the epithelial layer, and expression of laminins is decreased in the colonic mucosa of UC patients [48]. While this study also replicated previously associated UC loci (including 1p36, the MHC, IL23R, MST1 and NKX2-3), further data from the UK IBDGC group implicated nine CD loci in UC including KIF21B, IL18RAP, IL12B, JAK2 and STAT3 [49].

The fourth UC GWAS (and the first executed in a non-Caucasian population) was performed by a Japanese group [36] on a total of 1,384 cases and 3,057 controls (index and replication cohorts), with three additional UC loci identified: FCGR2A, 13q12.13 (intergenic) and SLC26A3 (7q31). The FCGR2A gene encodes a member of the family of immunoglobulin Fc receptor genes that is expressed on platelets, macrophages and neutrophils where it is involved in the phagocytosis of IgG-coated particles [50]. The SLC26A3 gene is located in locus 7q31, which was previously identified by the UK group, but the association signal appears to somewhat differ in the two populations (stronger for LAMB1 and SLC26A3, respectively, in the UK and Japanese cohorts).

In 2010, the German group carried out an extension of their first GWAS [33], with the goal to increase coverage of the genome-wide variability by using a combination of a new genome-wide SNP genotyping chip (the Affymetrix 6.0) that allowed more SNPs to be studied and of novel imputation approaches. In addition to the known MHC signal, association to new loci, IL17REL and 7q22 (near SMURF1, KPNA7), was identified. IL17REL encodes interleukin 17 receptor E-like, a poorly characterized member of the IL17 receptor gene family with no known ligand, which may bind specific IL17 cytokines to drive TH17 inflammatory responses [51]. In an effort to further characterize the function of these loci in UC pathogenesis, the authors studied expression quantitative trait loci (eQTLs) by looking at expression signatures related to the associated SNPs at both loci. Differential immune gene expression was detected, with IL17REL variants modulating the expression of the IL17RE receptor, the cytokine CSF3 and the modulator of T cell response CD276, while alleles at SNP rs7809799 in the 7q22 locus modulated the expression of the cytokine IL1F10, the transcription regulator of B cell development FOXP1 and the MHC-associated gene BTN3A.

Early-Onset Ulcerative Colitis

In an effort to identify modest effect genes that had not been identified in previous adult-onset studies, a large collaborative GWAS of early-onset IBD was performed [15]. The rationale presented was that it should be easier to identify signals previously missed in (predominantly) adult-onset studies because of the clearer ascertainment of patients (early-onset IBD patients usually present extensive colitis) and stronger family history in early-onset IBD. All patients were diagnosed before turning 19 years old, were of European ancestry and were ascertained in various centres across the USA and one centre in Rome, Italy. The genotyping of the screening cohort (1,011 early-onset IBD patients including 317 UC patients) and replication of association signals led to the identification of two new IBD loci on chromosome 20q13 and 21q22. These loci were found as general IBD loci as they were found to be associated to both CD and UC. The 20q13 locus harbours several candidate genes, among which the authors highlighted TNFRSF6B because of the importance of the TNF pathway in IBD pathogenesis [52] and their detection of higher TNFRSF6B expression in colonic biopsies in relation to mucosal inflammation. The 21q22 locus is intergenic but the nearest gene codes for the proteasome assembly chaperone 1 (PSMG1), which showed a modestly increased expression in IBD patients compared to control.

A follow-up early-onset IBD GWAS was also published by the same group [38], including genotyping of adult-onset loci in their cohorts. While no new UC risk locus was identified in the study, this effort led to the association of two CD loci with UC: ICOSLG (21q22) and ORMDL3 (17q12). ICOSLG is the ligand of ICOS expressed on activated T cells [53], while ORMDL3 is involved in cellular Ca2+ homeostasis and appears to be a general inflammation and multi-diseases locus, as it was previously associated to asthma [54].

Meta-analyses: The Power of Many

The next phase in the evolution of association studies in UC was underlined by the activities of the IIBDGC. This consortium comprises several research groups from 16 different countries, who joined forces by sharing data and expertise in order to increase cohort size and improve statistical power to detect modest effect loci in large meta-analysis efforts [55].

The first meta-analysis in UC pooled GWAS data from the previously mentioned North American study, along with new North American and Swedish GWAS [32]. This first UC meta-analysis included a total of 2,693 UC patients and 6,791 controls for the discovery phase, as well as 2,009 UC patients and 1,580 controls for the replication phase. A total of 13 loci achieved genome-wide significance including four new loci: 1q21 (FCGR2A, previously identified in the Japanese GWAS), 2p16 (REL-PUS10), 17q12 (ORMDL3) and 5p15. Locus 2p16 contains the RNA chaperone gene PUS10 and the gene encoding C-Rel (REL), one of the NF-κB proteins, while for the 5p15 locus, the nearest gene CEP72 encodes a regulator of microtubule organization during mitosis [56]. Additionally, 14 loci that were previously either formally or suggestively identified were replicated in this study, including IL23R, IL12B, TNFSF15, IBD5 and IL10. Ten traditional CD loci were also identified in this UC meta-analysis including IRGM, STAT3, CCL2-CCL7 and 21q21. Following the publication of this first UC meta-analysis, the authors estimated that less than 10 % of UC genetic variance could be explained by the associated loci known at the time [32].

The GWAS meta-analyses of CD and UC that followed had a dramatic impact on our knowledge of IBD genetic risk factors and again are the result of the efforts of the IIBDGC. Specifically, at the beginning of 2011, there were 18 loci significantly associated with UC identified by a few different independent studies. The same year, an IIBDGC meta-analysis of all these studies, including a total of 16,000 UC cases and 32,000 controls, discovered 29 additional loci, bringing the number of known UC loci to 47 and the estimated heritability they explain to 16 %. In order to identify relevant causal gene(s) at each locus, the authors used literature mining (GRAIL), eQTL databases, 1000 Genomes sequencing data to search for correlated non-synonymous SNPs and physical proximity within the loci. Only a year later, the IIBDGC expanded the breadth of these analyses, reporting one of the largest meta-analyses ever performed, based on 75,000 IBD cases and controls and including data from 15 different GWAS and additional typing on the Immunochip, an array specifically designed to capture variation at 200 known risk loci for 12 common autoimmune diseases [6]. In this study, 71 new causative regions were identified, which brought the total tally to 163 independent IBD risk loci, far more than reported for any other complex disease. Of these, 110 appear to be relevant to both Crohn’s disease and ulcerative colitis, while 23 show risk effects that are UC specific (the remaining 30 are CD-only loci). A fundamental contribution of this study was also the realization that a large portion of IBD risk loci are shared with other complex immune-mediated diseases (particularly ankylosing spondylitis and psoriasis), primary immunodeficiencies and mycobacterial disease, pointing to an essential role for host factors involved in defence against infection in IBD.

In addition to identifying hundreds of regions associated with increased IBD risk, the most recent meta-analyses have also provided key information as to the potential biological pathways involved in UC pathogenesis. Among other methods, functional annotation of the identified UC genes allows them to be clustered into pathways and helps clarify the molecular origin of the disease. In the latest meta-analysis from IIBDGC, where the highest number of UC loci was identified, the most significantly enriched Gene Ontology [57] term was regulation of cytokine production, specifically IFN-γ, IL-12, TNF-α and IL10 signalling. T, B and NK cell activation was the next most significant, and strong enrichment was also seen for response to molecules of bacterial origin and for the JAK-STAT signalling pathway. These pathways add to those already suspected to be relevant from individual gene functions such as, for instance, transcriptional regulation (PRDM1, IRF5 and NKX2-3) and intestinal barrier functions (GNA12, HNF4A and LAMB1). A list of prioritized candidate UC-risk genes and their functional properties is reported in Fig. 6.1.

Fig. 6.1 UC risk genes grouped by function. The list of genes and corresponding biological functions is based on the prioritization at their respective risk loci in the most recent UC meta-analysis [6]

Many UC Genes Are General IBD Risk Factors

Overlapping phenotypes and aggregation of both CD and UC in some families have suggested that part of the genetic risk is common to both diseases. Many genetic studies have tested the association of Crohn’s disease loci to UC and vice versa (as mentioned earlier in this chapter). The large meta-analyses of CD and UC, respectively, by Franke et al. and Anderson et al. enabled one of the first systematic analyses of loci shared by these two diseases. Specifically, Anderson et al. identified common loci by comparing the results of these large GWAS studies in CD and UC: common loci were defined when genome-wide significant loci for one disease (P < 5 × 10⁻⁸) reached P < 1 × 10⁻⁴ in the other disease, and at least 28 shared loci were identified by this means [5]. Among those, many are involved in T cell differentiation (e.g. cytokines IL21, IL10, IFNG and cytokine receptor IL7R). Some of them are more specifically associated with the IL23R pathway (IL23R, JAK2, STAT3, IL12B and PTPN2), which is involved in the maintenance of TH17 cells and has been implicated in several other diseases (see “Chapters 7 and 11”). TH17 cells are thought to coordinate defence against specific pathogens and mediate inflammation [58], and the original identification of IL23R as an IBD risk factor shattered the paradigm of CD and UC being primarily TH1- and TH2-mediated diseases, respectively [39]. Genes involved in TNF signalling are also well represented among common IBD genes (TNFRSF9, TNFRSF14 and TNFSF15), and they encode proteins with various immune effects including systemic inflammation and activation of inflammatory transcription factor NF-κB. As mentioned, the latest meta-analysis has brought the number of common CD-UC shared loci to 110 [6], while 23 appear to be the risk loci with UC-specific genetic effects. 
Interestingly, however, most of these UC-specific genes show the same direction of effect in Crohn’s disease, suggesting that nearly all of the biological mechanisms involved in one disease have some role in the other. One intriguing exception is NOD2, which still represents the strongest causal gene in CD but shows significant protective effects in UC, an observation that may reflect biological differences between the two diseases.
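The shared-locus rule described above (genome-wide significance in one disease plus P < 1 × 10⁻⁴ in the other) can be written as a short decision function. This is a minimal sketch of that rule only, not the authors' analysis pipeline; the function name, the locus labels and the P values are hypothetical.

```python
GENOME_WIDE = 5e-8       # genome-wide significance threshold
OTHER_DISEASE = 1e-4     # threshold required in the second disease


def classify_locus(p_cd, p_uc):
    """Classify a locus from its association P values in CD and in UC."""
    cd_hit = p_cd < GENOME_WIDE
    uc_hit = p_uc < GENOME_WIDE
    # Shared: significant in one disease and at least suggestive in the other.
    if (cd_hit and p_uc < OTHER_DISEASE) or (uc_hit and p_cd < OTHER_DISEASE):
        return "shared"
    if cd_hit:
        return "CD only"
    if uc_hit:
        return "UC only"
    return "not significant"


if __name__ == "__main__":
    # Hypothetical loci with made-up P values (CD, UC).
    examples = {
        "locus_A": (3e-12, 2e-6),
        "locus_B": (1e-9, 0.20),
        "locus_C": (0.03, 4e-10),
        "locus_D": (1e-5, 1e-5),
    }
    for name, (p_cd, p_uc) in examples.items():
        print(name, "->", classify_locus(p_cd, p_uc))
```

Note that a locus such as the hypothetical locus_D, which is suggestive in both diseases but genome-wide significant in neither, is not counted as shared under this rule.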

Associated Loci Differ Between Ethnic Groups

With the exception of a Japanese study [36], all UC GWAS carried out to date have been performed on Caucasians of European descent. Considering the interethnic differences in genomic architectures that reflect genetic drift, mutations and evolutionary factors, it is likely that some risk variants may be relevant in some populations but not others. One of the ultimate goals of current genetic studies is their exploitation for the design of novel therapeutic approaches, and it is therefore crucial to understand what portion of the genetic risk is detectable and hence also relevant in individual populations. The increasing number of association studies performed in cohorts of individuals of non-European descent, some of them GWAS, allows for some initial information to be evaluated in this context. The risk gene IL23R, for example, might not be associated in all ethnic groups, as it was not detected in the Japanese GWAS of UC [36] or in a targeted association study based on a Korean case-control cohort [59]. This may be at least in part due to the fact that some European IL23R variants are less common or absent in the Japanese population, as in the case of the protective IL23R 381Q allele (1 % compared to 7 % in Caucasians). Evidence for similar interethnic differences exists for other IBD loci, for instance, NOD2 and ATG16L1 [60, 61, 62], and large-scale analyses are currently at the core of IIBDGC interethnic studies that aim to shed further light on this important question.

Many Risk Factors Are Still to Be Discovered

Surprisingly, the total number of 133 UC risk loci (a number by far higher than in most other complex diseases) is still estimated to cumulatively account only for a minor portion of disease variance, namely, 7.5 % [6]. It should be noted that several additional loci narrowly miss the genome-wide significance threshold (P < 5 × 10⁻⁸) and thus are not considered to be validated loci; however, this observation suggests that other risk loci are still to be discovered. It has been proposed that additional loci or causative rare variants may not be captured by GWAS platforms that primarily test common variants. Deep re-sequencing of risk loci is a way to identify these rare variants, and next-generation sequencing (NGS) technologies have greatly reduced per-sample costs and processing time, thus allowing such studies to be feasibly performed on adequate numbers of cases and controls. The first large-scale re-sequencing study performed in IBD was published in October 2011 [63]. Fifty-six CD loci were re-sequenced in two DNA pools from 350 CD cases and 350 healthy controls for variant discovery, and follow-up genotyping of 115 low-frequency SNPs (non-synonymous, nonsense or splice-site variants) was performed in 16,054 CD and 12,153 UC patients and 17,575 healthy controls. The study identified a protective splice-site variant in the CARD9 gene, and additional IBD susceptibility variants were also detected in IL23R, CUL2, PTPN22 and C1orf106. In total, the authors estimated that an additional 1–2 % of CD heritability (and likely less for UC) is explained by these rare variants. These findings appear to be very modest but they give an important insight into complex disease genetics, where common variants of modest effect and rare variants of higher penetrance coexist in the same genes. 
Additional UC-focussed re-sequencing efforts will likely allow for the identification of other rare risk variants that may help to both pinpoint exact causative genes (in multiple-gene loci) and improve our understanding of the functional consequences of genetic variation at specific UC risk loci.

The Future of UC Genetics

The last few years have seen tremendous progress and important discoveries in UC genetics; however, most of the known disease heritability is still to be explained. As mentioned, one favoured hypothesis is that the remaining heritability is to be found in new risk loci and in additional rare variants located in known and yet to be identified UC loci. One of the obstacles to these discoveries is the statistical power that will be required, given that current loci have required tens of thousands of samples. Additional independent GWAS and the pooling of these new GWAS into increasingly bigger meta-analyses will certainly lead to the identification of new loci. Re-sequencing studies that aim to identify rare causal variants will be essential companions to these GWAS, and the analysis of new markers such as copy number variations (CNVs) will hopefully provide additional insight into disease heritability. In addition, long-distance regulatory variants need to be more carefully evaluated given that some associated loci are gene deserts, as do individual SNPs for their potential role in altering epigenetic signatures in DNA methylation, histone acetylation, microRNA binding, etc. Genotyping cohorts of different ethnic backgrounds will be helpful in fine-mapping associated loci. Genetic variation patterns are often different in different populations; for example, African populations have more diversity and their linkage disequilibrium blocks are generally shorter than in populations of European ancestry, thus allowing for the identification of new or refined association signals.

Finally, there is great need for integrative genomic and systems biology approaches where multilayer data are collectively analyzed and exploited to decipher the mysteries of the genome. Together with the avalanche of data coming from several GWAS and their meta-analysis, the considerable challenge of having to understand the role of each individual genetic risk factor from hundreds of loci has emerged. In this endeavour, large-scale functional studies into the transcriptome and the proteome are now indispensable. The amount of work still ahead in UC genetics seems to be no less than what it was some years ago, but we certainly now have a very solid foundation of knowledge upon which we can build.