Heritability of Celiac Disease

Celiac disease (CD) can be considered as a model for common complex disorders in which the phenotypes result from a combination of environmental triggers, genetic predisposing factors, and their interactions. Although the gliadin fraction of wheat gluten and similar protein fractions of other grains, which are the primary and necessary environmental triggers for CD, have been well defined since the 1950s, the discovery of genetic risk factors is an ongoing process.

In contrast to Mendelian single-gene disorders, CD does not show a clear pattern of inheritance. The importance of genetics is indicated by familial clustering, shown by epidemiological studies comparing the prevalence of CD in related individuals to the prevalence in unrelated individuals. The reported prevalence in first-degree relatives of CD patients ranges from 2.8 to 22.5 %, with the higher prevalences reported in at-risk relatives undergoing routine intestinal biopsy rather than serological screening alone [1]. In contrast, the overall prevalence of CD in North America and Western Europe ranges from 0.5 to 2.9 % [1, 2]. The recurrence risk ratio for siblings (λs), the prevalence of CD in siblings divided by the prevalence in the general population, has been reported to be as high as 20–60 [3–5]. A large multicenter study in the USA among at-risk and not-at-risk groups found a prevalence of 4.5 % in first-degree relatives and 2.6 % in second-degree relatives, compared to 0.3 % in not-at-risk children and 0.9 % in not-at-risk adults [6].
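The definition of λs given above can be written out as a short calculation. The following is a minimal sketch; the function name and the input prevalences are illustrative assumptions and are not taken from a specific study.

```python
def recurrence_risk_ratio(prevalence_in_siblings: float,
                          population_prevalence: float) -> float:
    """Sibling recurrence risk ratio: sibling prevalence / population prevalence."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return prevalence_in_siblings / population_prevalence


# Illustrative (assumed) inputs: 10 % prevalence in siblings of patients
# and 0.5 % prevalence in the general population.
lambda_s = recurrence_risk_ratio(0.10, 0.005)
print(f"lambda_s = {lambda_s:.0f}")
```

Because the population prevalence is in the denominator, the same sibling prevalence yields a very different λs depending on which population estimate is used, which is one reason reported values span a wide range.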

In order to distinguish the role of genetic factors in familial clustering from environmental factors, twin studies are performed. In a large population-based twin study in Italy, the estimated case-wise concordance, a measure for disease risk if a co-twin is affected, was 83.3 % (95 % confidence interval 70.3–96.4 %) for monozygotic twin pairs and 16.7 % (3.6–29.8 %) for dizygotic twin pairs [7]. Since the proportion of affected co-twins in dizygotic twin pairs is in line with the reported prevalence of CD in siblings, the role of shared environmental factors, with the exception of exposure to gluten antigen, is thought to be limited [8]. Assuming a population prevalence between 0.1 and 1.1 %, the variance in CD prevalence attributable to genetic variance, the so-called heritability of CD, is estimated to be 57–87 % (95 % confidence intervals 32–100 %) [7].
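As an illustration of how a case-wise concordance is obtained from twin-pair counts, the sketch below uses one common definition: the proportion of affected twins whose co-twin is also affected, 2C / (2C + D), where C is the number of concordant pairs and D the number of discordant pairs. The pair counts are hypothetical and are not the counts of the Italian twin study.

```python
def casewise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Proportion of affected twins whose co-twin is also affected.

    Each concordant pair contributes two affected twins, each discordant
    pair contributes one, giving 2C / (2C + D).
    """
    affected_with_affected_cotwin = 2 * concordant_pairs
    all_affected = affected_with_affected_cotwin + discordant_pairs
    if all_affected == 0:
        raise ValueError("no affected twins in the sample")
    return affected_with_affected_cotwin / all_affected


# Hypothetical pair counts, for illustration only.
mz = casewise_concordance(concordant_pairs=15, discordant_pairs=10)
dz = casewise_concordance(concordant_pairs=3, discordant_pairs=24)
print(f"monozygotic: {mz:.1%}, dizygotic: {dz:.1%}")
```

A clearly higher concordance in monozygotic than in dizygotic pairs, as in the study described above, is what points to a genetic rather than a shared environmental contribution.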

The Immunogenetics of Celiac Disease

Identification of Susceptibility Genes for Celiac Disease

The above studies on CD prevalence contribute to estimations of the heritability, but do not provide information on which genes or how many genes are actually involved in disease development. Other approaches are needed to identify susceptibility genes: linkage analysis and genetic association analysis (both candidate genes and genome-wide) (see Text Box 5.1 Genetic Linkage and Association analysis).

The first identified genetic risk factor for CD, the human leukocyte antigen DQ (HLA-DQ) genotype, is the strongest known genetic risk factor. Since serological studies in the 1970s discovered the association between HLA and CD, many others have confirmed the strong linkage to HLA, specifically to HLA-DQ2 and HLA-DQ8. However, no other genetic associations were consistently found by linkage studies and candidate gene studies. The recent development of genome-wide association studies (GWAS) has led to the discovery of several additional susceptibility genes for CD. So far, GWAS have identified 57 associated single nucleotide polymorphisms (SNPs) located in 39 non-HLA regions, with most of the positional candidate genes having immunological functions.

Association with HLA Genotype

The major HLA class II molecules DP, DQ, and DR, also called Major Histocompatibility Complex II (MHC II) molecules, are cell-surface receptors on antigen-presenting cells involved in the presentation of exogenous peptide antigens to T-helper lymphocytes. The encoding genes are part of the HLA complex, a 4 Mb region on chromosome 6p21 that encompasses some 200 genes. This region corresponds with the CELIAC1 locus, a region consistently found to be associated with CD in both linkage and association studies.

The first reports on the association with HLA revealed a link between CD and positive serology for HLA class I B8-antigen and later HLA-DR3-antigen [9, 10]. The encoding alleles of the HLA-B gene and HLA-DR genes are strongly linked in the haplotype A1-B8-DR3-DQ2, which is present in approximately 10 % of Northern Europeans [11]. Subsequent studies pinpointed the association of CD to alleles encoding HLA-DQ2 molecules [12, 13].

The HLA-DQ molecule is a heterodimer consisting of an α chain and a β chain, encoded by HLA-DQA1 and HLA-DQB1. Further characterization of the association between HLA-DQ and CD showed that especially homozygosity for the HLA-DQ2.5 heterodimer (encoded by the alleles DQA1*0501 and DQB1*0201) and, to a lesser degree, heterozygosity for the HLA-DQ2.5 heterodimer combined with the HLA-DQ2.2 heterodimer (encoded by the alleles DQA1*0201 and DQB1*0202) were associated with a strongly increased susceptibility for CD [14, 15] (Fig. 5.1). Assuming a CD prevalence of 1 % in the general population, the absolute risk for CD is estimated at >7 % for this high-risk group [16].

Fig. 5.1

HLA-DQ heterodimers with coding HLA-genotypes and corresponding susceptibility for CD

Heterozygosity for the HLA-DQ2.5 heterodimer combined with another HLA-DQ heterodimer, or homozygosity or heterozygosity for the HLA-DQ8 heterodimer (encoded by the alleles DQA1*0301 and DQB1*0302), confers a more moderately increased risk for CD, with an absolute risk for CD estimated at 0.1–7 % [16]. Functional studies showed that these HLA-DQ2 molecules and, to a lesser degree, HLA-DQ8 molecules have a high affinity for gluten peptides and that gluten-reactive T lymphocytes from the small intestinal mucosa of CD patients preferentially recognize gluten peptides when presented by HLA-DQ2 or HLA-DQ8 [17–20]. Approximately 95 % of the CD patients carry the HLA-DQ2.5 genotype, and many of the other 5 % carry HLA-DQ8 [13, 21, 22]. There is a significant worldwide correlation between the combination of wheat consumption and frequency of HLA-DQ2 and HLA-DQ8, on the one hand, and the incidence of CD, on the other hand (estimated correlation coefficient R² = 0.4) [23]. This observation is in line with a CD model of genetically susceptible individuals in whom dietary gluten triggers intestinal inflammation.

Although the presence of either HLA-DQ2 or HLA-DQ8 can be considered necessary for the development of CD, neither is sufficient, since only some 3 % of the approximately 40 % of Caucasians who carry either HLA-DQ2 or HLA-DQ8 will actually develop CD [24]. This suggests that, in addition to wheat gluten consumption and HLA genotype, other environmental and genetic factors must be involved in CD etiology. A recent study on the HLA-complex in CD showed that this region, with its many genes in strong linkage disequilibrium (LD), might contain more susceptibility genes for CD [25].

Based on the assumption of a multiplicative model of genetic risk and an estimated recurrence risk for siblings stratified for HLA genotype (λsHLA) of 2.3–5.3, the contribution of HLA to the heritability of celiac disease is estimated at between 21 and 44 % [4, 5].
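One way to formalize the multiplicative model is to treat the overall sibling recurrence risk ratio as a product of locus-specific ratios, so that the share attributable to one locus is log(λ_locus) / log(λs). The sketch below follows that formalization. The λsHLA values are the range quoted above, whereas the overall λs of 30 is an assumed value chosen only for illustration, so the output is not expected to reproduce the cited 21–44 % exactly.

```python
import math


def locus_contribution(lambda_locus: float, lambda_total: float) -> float:
    """Share of familial risk attributed to one locus under a multiplicative model.

    Assumes lambda_total is the product of locus-specific ratios, so that
    contributions add up on the logarithmic scale.
    """
    if lambda_locus <= 0 or lambda_total <= 1:
        raise ValueError("need lambda_locus > 0 and lambda_total > 1")
    return math.log(lambda_locus) / math.log(lambda_total)


ASSUMED_LAMBDA_S = 30.0  # assumption for illustration, not a value from the text

for lambda_hla in (2.3, 5.3):  # range of lambda_sHLA quoted in the text
    share = locus_contribution(lambda_hla, ASSUMED_LAMBDA_S)
    print(f"lambda_sHLA = {lambda_hla}: HLA share = {share:.0%}")
```

The estimate is sensitive to the assumed overall λs: a larger overall ratio leaves a smaller share for HLA, and vice versa.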

Association with non-HLA Genes

With the completion of the human genome sequence, millions of SNPs have been identified. Using these SNPs as genetic markers, called tag SNPs, GWAS have helped to identify thousands of susceptibility variants for hundreds of complex diseases.

Two GWAS on CD and their follow-up studies revealed 26 non-HLA regions to be associated with CD [26–29]. Denser genotyping with the Immunochip, a custom-made array that covers common variants from 186 GWAS loci associated with 12 immune-mediated diseases, identified 13 additional non-HLA regions [30].

Thus, in addition to HLA, there are now 39 known non-HLA regions associated with CD, of which 36 have been genotyped with a high variant density. These regions contain 57 independently associated SNPs [30] (Fig. 5.2). The association signal can be refined to a single candidate gene for 19 of the regions (see Text Box 5.2. “From Genetic Markers to Candidate Genes and Pathways”). However, only three of the associated SNPs are linked to protein-altering variants located in exonic regions, and eight additional SNPs are localized upstream around the transcription start site (5′ untranslated region) of a specific gene or downstream around the 3′ untranslated region [30]. Although most SNPs are localized in nonprotein coding intergenic and intronic regions, the regions associated with CD are greatly enriched with regions involved in regulating the expression of one or more genes, so-called expression Quantitative Trait Loci (eQTLs) [29].

Fig. 5.2

Manhattan plot of the 39 non-HLA regions associated with CD and identified by Immunochip analysis. The vertical line represents the genome-wide significance threshold at P = 5 × 10⁻⁸. For each associated region the candidate genes are shown with the minor allele frequency (MAF) in the European control population and the odds ratio (OR) for the most significantly associated single nucleotide polymorphism (SNP) of each region. In three regions, the most significantly associated SNP is linked to a protein-altering variant (IRAK1, SH2B3, and MMEL1, in bold). Several SNPs are associated with a change in expression of one or more genes: ↑ for increased expression and ↓ for decreased expression (Kumar V et al., unpublished data). Candidate genes known to be involved in immunological pathways are highlighted

The candidate genes for CD identified by GWAS provide important clues to the disease pathogenesis, including the pathways that are deregulated in CD. Pathway enrichment analyses using susceptibility genes have shown that most susceptibility genes for celiac disease are involved in immune processes. These pathways concern both the adaptive immune response and the innate immune response. The adaptive immune response includes T-lymphocyte maturation and differentiation (e.g., the RUNX3, ETS1, IL2, IL21, IL12A, IL18R1, and IL18RAP genes), T- and B-lymphocyte activation and immune cell signaling (e.g., the CD28, CTLA4, ICOS, ICOSLG, PTPN2, SOCS1, SH2B3, UBASH3A, and FASLG genes), and chemokine-induced cell migration (e.g., the CCR1-3 and CCR4 genes). The innate immune response includes the NFκB-pathway (e.g., the UBE2E3 and TNFAIP3 genes) and the response to viral infections (e.g., the BACH2 and IRF4 genes) [29, 31] (see Fig. 5.2).

These results do indeed suggest that CD is a T lymphocyte-mediated immune disorder. Furthermore, enrichment of genes involved in natural killer (NK) cell activation and interferon-gamma production compared to other autoimmune diseases indicates involvement of this pathway in CD [23]. Altogether, celiac susceptibility genes appear to be involved in both adaptive and innate immunity. It has been suggested that the interplay between innate and adaptive immunity on exposure to environmental triggers is a main pathogenic factor in CD [32]. In addition to gluten, various infectious agents have been proposed as triggering environmental factors in genetically predisposed individuals. Moreover, it has been hypothesized that gut microbiota may play a role in CD pathogenesis (as discussed in depth in Chap. 7) [33, 34]. Hence, it might be relevant to analyze CD susceptibility genes in the context of interactions between microbes and the host immune response.

Current Challenges in the Search for Genetic Susceptibility Factors

Although GWAS have identified 57 non-HLA SNPs independently associated with CD, the exact causal variant in each region is still unknown. This can be explained by the fact that the analyzed SNPs are in fact tag SNPs, which are in linkage disequilibrium with more than one variant within a so-called LD block. Molecular functional analyses will be necessary to delineate the true causal variants and to understand the mechanism of how these variants affect CD. In addition, most of the associated SNPs are common and associated with a modest increase in CD risk (median odds ratio 1.17, range 1.10–1.70) or a modest decrease in risk (median odds ratio 0.88, range 0.71–0.91) [30]. Since the associated SNPs are tag SNPs, the true effect sizes of the causal variants may be underestimated.

The known non-HLA susceptibility regions, together with HLA, explain approximately 54 % of the heritability of CD [35]. So nearly 50 % of the heritability still needs to be explained. It is not clear whether this hidden heritability is due to thousands of common variants with smaller effect sizes or to individual mutations with strong effect sizes. Future studies performing whole genome sequencing in CD patients may provide answers to this question [36].

From the ENCODE project, it is now apparent that most of the human genome is transcribed to not only protein-coding transcripts but also to large numbers of noncoding RNA molecules of different size [37]. These noncoding RNAs include short noncoding RNAs such as microRNAs (miRNAs), small interfering RNAs (siRNAs), and piwi-interacting RNAs (piRNAs), as well as a new class of long noncoding RNAs (lncRNAs), which are larger than 200 nucleotides [38]. Interestingly, several CD-associated SNPs map to lncRNA regions [39], and it has been shown that disease-associated SNPs can alter lncRNA expression [40]. Hence, the identification of genetic variants associated with CD and that map to noncoding transcripts could help us not only to explain the hidden heritability of CD but also to better understand the disease mechanism.

Shared Immunogenetics with Immune-Related Diseases

CD is associated with several other autoimmune diseases and immune-mediated diseases. Patients with type 1 diabetes mellitus and autoimmune thyroiditis (Graves’ disease and Hashimoto’s disease) belong to the high-risk populations for CD, with estimated prevalences of 3–6 % in type 1 diabetes patients [1] and 3–8 % in autoimmune thyroiditis patients [41]. In addition, the prevalence of immune-related diseases, including type 1 diabetes and autoimmune thyroiditis, is increased in CD patients compared to the general population [42–45]. In different cohorts of CD patients, the overall prevalence of autoimmune diseases, excluding dermatitis herpetiformis, was approximately 20 %, compared to approximately 11 % in control groups [44–46]. In a retrospective study of CD patients, their cumulative risk for autoimmune disease increased from 8 % at 15 years of age to 33 % at 50 years of age. Type 1 diabetes comprised 29 % of the reported cases and autoimmune thyroiditis 26 %, while other diseases such as psoriasis (8 %), inflammatory bowel disease (7 %), and rheumatoid arthritis (3 %) were reported in fewer patients [45].

Risk factors for the development of another autoimmune disease were a positive family history for autoimmune disease (hazard ratio 2.4, 95 % confidence interval 1.7–3.3) and a diagnosis of CD before 36 years of age (hazard ratio 2.7, 95 % confidence interval 1.8–3.9). In contrast, a positive family history for CD was not associated with an increased risk for other autoimmune diseases in CD patients [45]. This suggests that some susceptibility genes for CD may be shared with other immune-related diseases.

Many immune-related diseases have been linked to the HLA region. The CD HLA risk haplotypes containing HLA-DQ2.5 and HLA-DQ8 also belong to the most susceptible HLA haplotypes for type 1 diabetes [47]. It is estimated that approximately one-third of the type 1 diabetes patients who are homozygous for HLA-DQ2.5 have CD-associated transglutaminase autoantibodies, compared to 1 % of the patients without HLA-DQ2 or HLA-DQ8. It is therefore likely that CD and type 1 diabetes share more risk factors [48].

A meta-analysis on genetic susceptibility regions discovered by GWAS for immune-related diseases (including CD and type 1 diabetes) showed that 44 % of the regions were associated with more than one immune-related disease. This confirms a widespread sharing of non-HLA susceptibility regions between immune-related diseases [49]. Of the non-HLA susceptibility regions for celiac disease, 30/39 (80 %) have been associated with other immune-related diseases (Fig. 5.3).

Fig. 5.3

Non-HLA susceptibility regions shared between celiac disease (CeD) and the following immune-related diseases (using a genome-wide significance threshold at P = 5 × 10⁻⁸): CrD Crohn’s disease, GD Graves’ disease, MS multiple sclerosis, PBC primary biliary cirrhosis, PS psoriasis, RA rheumatoid arthritis, SLE systemic lupus erythematosus, T1D type 1 diabetes mellitus, UC ulcerative colitis, VL vitiligo, AA alopecia areata, AS ankylosing spondylitis. Adapted from [59]

Analysis of pathways shared by candidate genes directly associated with two or more immune-related diseases showed that there are three major immunological pathways involved: T-lymphocyte differentiation, immune cell signaling, and the innate immune response [50]. CD susceptibility genes appear to be linked to pathways that are strongly involved in T lymphocyte-mediated autoimmune diseases, such as type 1 diabetes and autoimmune thyroiditis, but not in inflammatory bowel disease, for example [23].

In addition, some of the identified candidate genes appear to be specifically associated with one or a few immune-related diseases. Disease-specific genes could provide insight into specific aspects of the disease pathogenesis. For example, LPP (LIM domain-containing preferred translocation partner in lipoma), which is shared with vitiligo, is involved in cell adhesion and may have a structural role in the intestine [50].

The combined analysis of immune-related diseases in the future may add power to studies to discover more common genetic variants with smaller effect sizes and may contribute to insights into their shared etiological pathways.

Towards Clinical Applications

The ultimate aim of discovering causal genes and pathways involved in CD is to improve the accuracy of diagnosis and to contribute to risk stratification for determining the follow-up and treatment needed.

Thus far, HLA-DQ genotype is the strongest genetic factor linked to CD. HLA-genotyping with a tag SNP method, using six HLA-tagging SNPs, predicts HLA-DQ risk type with high accuracy and is a cost-effective method suited for large-scale use [51, 52].

HLA-DQ2 and HLA-DQ8 combined have a median sensitivity of 96 % [53]. Individuals who have neither HLA-DQ2 nor HLA-DQ8 are unlikely to have CD. HLA-DQ genotyping could therefore be used to exclude CD or make it unlikely in patients with an uncertain diagnosis. In addition, genotyping could be used as a first-line test in the screening of asymptomatic individuals with an increased risk for CD, for example, patients with type 1 diabetes or autoimmune thyroiditis, and the first-degree relatives of CD patients [53, 54]. For example, in siblings of children with CD, who have an overall risk for CD of 10 %, HLA-DQ genotyping was used to stratify ~40 % of the siblings into a group with a small residual risk of <1 % and another ~30 % into a group with a residual risk of 1–10 % [55].

In contrast, the specificity of HLA-DQ genotyping is rather low, with a combined median specificity for HLA-DQ2 and HLA-DQ8 of 54 % [53]. The presence of HLA-DQ2 or HLA-DQ8, in combination with the presence of CD-specific antibodies, could strengthen the diagnosis in patients in whom CD is clinically strongly suspected but in whom no intestinal biopsy will be performed [53]. However, even at a relatively high a priori chance for CD, the proportion of individuals with false-positive results is rather high. Thus, because of the low positive predictive value of HLA-DQ genotyping, a combination with additional risk factors may lead to better prediction of CD risk. A two-step approach could be applied: first excluding individuals without HLA-DQ2 and HLA-DQ8, and second classifying the remaining individuals into different risk groups based on their non-HLA genetic risk factors [16].
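The relation between sensitivity, specificity, a priori risk, and predictive values can be made explicit with Bayes' rule. The sketch below uses the median sensitivity (96 %) and specificity (54 %) of HLA-DQ genotyping quoted above, with a priori risks of 1 % and 10 % chosen as illustrative values. The function name is an assumption, and the results are only as precise as the rounded inputs.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prior: float) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    true_neg = specificity * (1.0 - prior)
    false_neg = (1.0 - sensitivity) * prior
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Median test characteristics of HLA-DQ2/DQ8 genotyping from the text.
SENSITIVITY, SPECIFICITY = 0.96, 0.54

for prior in (0.01, 0.10):  # illustrative a priori risks
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prior)
    print(f"prior {prior:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these inputs the negative predictive value stays above 99 % at both a priori risks, while the positive predictive value remains low, which is why the test is better suited to excluding CD than to confirming it.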

In a genetic risk model based on HLA-DQ genotype and ten non-HLA susceptibility SNPs, the presence of 13 or more risk alleles in an individual implied an odds ratio of 6.2 (95 % confidence interval 4.1–9.3) for CD, compared to individuals with 5 or fewer risk alleles [56]. An intermediate HLA-genotype risk combined with 13 non-HLA risk alleles led to an increased risk for CD (odds ratio 6.1) comparable to a high HLA-genotype risk combined with no non-HLA risk alleles (odds ratio 6.2) [56].
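The risk-allele count underlying such a model can be sketched as follows. Each of the ten non-HLA SNPs is coded as 0, 1, or 2 copies of the risk allele, and the total is compared with the cut-offs mentioned above (13 or more versus 5 or fewer). The genotype vectors, the function names, and the label "intermediate" for counts in between are assumptions made for illustration.

```python
def count_risk_alleles(genotypes: list[int]) -> int:
    """Sum the risk-allele copies (0, 1, or 2 per SNP) across all SNPs."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1, or 2 risk alleles")
    return sum(genotypes)


def risk_group(n_risk_alleles: int) -> str:
    """Classify by the cut-offs quoted in the text (>= 13 versus <= 5)."""
    if n_risk_alleles >= 13:
        return "high"
    if n_risk_alleles <= 5:
        return "reference"
    return "intermediate"  # label assumed for counts between the cut-offs


# Hypothetical genotypes at ten non-HLA SNPs for two individuals.
person_a = [2, 1, 2, 1, 2, 1, 1, 2, 1, 1]
person_b = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

for name, genotypes in (("A", person_a), ("B", person_b)):
    n = count_risk_alleles(genotypes)
    print(f"person {name}: {n} risk alleles, group = {risk_group(n)}")
```

An unweighted count treats every risk allele as equally important. Weighting each SNP by its effect size is a common refinement, but it is not shown here.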

The combination of HLA-DQ genotype and the 57 known non-HLA susceptibility SNPs in a genetic risk score leads to a further increase in accuracy of the genetic risk score, with 11 % of the individuals being reclassified to a more accurate risk group. This combination shows a moderate discriminative accuracy with an area under the receiver operating characteristic (ROC) curve of 0.854, corresponding with a chance of 85 % to classify a random CD patient correctly as having a higher risk for the disease than a random individual without CD (J. Romanos et al., unpublished data). However, there is still a large overlap in genetic risk scores between CD patients and healthy individuals, as shown by the percentage of patients and healthy controls classified at intermediate risk: 51 % of the patients and 40 % of the healthy controls. The high-risk category, consisting of the top 25 % of genetic risk scores, has a sensitivity of 43 % and a specificity of 93 %. The positive predictive value for the high-risk category is estimated to be 6 % at a CD prevalence of 1 % and 43 % for a high-risk population with an a priori risk of 10 %. In both situations, the negative predictive value is expected to be higher than 99 %. With a prospective cohort study, the positive and negative predictive values of a genetic prediction model can be estimated more accurately. For example, the PreventCD Study encompasses a European multicenter study among high-risk CD families, in which approximately 1000 newborns who tested positive for HLA-DQ2 and/or HLA-DQ8 will be genotyped in more detail [57].
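The interpretation of the area under the ROC curve as the chance that a random patient scores higher than a random individual without CD can be illustrated by counting pairs directly. In the sketch below the genetic risk scores are hypothetical, ties are counted as half, and the function name is an assumption.

```python
from itertools import product


def auc_from_scores(patient_scores: list[float],
                    control_scores: list[float]) -> float:
    """Fraction of patient-control pairs in which the patient scores higher.

    Ties count as half a pair. This pairwise fraction equals the area
    under the ROC curve.
    """
    if not patient_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for patient, control in product(patient_scores, control_scores):
        if patient > control:
            wins += 1.0
        elif patient == control:
            wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical genetic risk scores, for illustration only.
patients = [3.1, 2.4, 1.9, 2.8]
controls = [1.2, 2.0, 2.5, 0.8]

print(f"AUC = {auc_from_scores(patients, controls):.4f}")
```

A value of 0.5 would correspond to scores that do not separate the groups at all, and 1.0 to complete separation. Overlap between the two score distributions, as described above, is what keeps the value below 1.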

New insights into genetic risk factors, including their interaction with environmental factors, may contribute to further refining of the prediction models. Genetic data will probably need to be combined with other biomarkers in order to identify subgroups that can usefully guide follow-up and treatment [58].

An important aspect of genetic studies remains the discovery of causal genetic variants and new pathways, including the pathways shared with other immune-related diseases. These may eventually contribute to the identification of new therapeutic targets.