Introduction

Crohn’s disease (CD), one of the two major forms of inflammatory bowel disease (IBD), is characterised by abdominal pain, diarrhoea and weight loss due to chronically relapsing inflammation of the intestine. This multifactorial disease results from the combined action of genes and environment. The first identified susceptibility gene for CD was nucleotide-binding oligomerisation domain containing 2 (NOD2) [1–3]. Another gene discovered to contribute to CD susceptibility was discs large homolog 5 (DLG5) [4], although the association did not replicate unequivocally [5]. Recently, several genome-wide association studies analysed about 300,000 single-nucleotide polymorphisms (SNPs) in CD patients and controls [6–10]. Variants in DLG5 were not reported among the top-ranked SNPs but were reported to be significantly associated in at least one study [8]. Moreover, the primary association was gender- and age-specific [11, 12], which could explain divergent association findings and which was recently established by a large meta-analysis [13]. Furthermore, recent studies indicate that DLG5 might play a more important role in paediatric CD [12, 14]. Interestingly, the two susceptibility genes might contribute in a joint fashion to the overall disease risk: In two CD association studies [4, 15], a significant accumulation of the DLG5 R30Q variant was observed in CD patients carrying one of the three major NOD2 risk variants. This interaction could not be confirmed by all studies [16, 17]. However, this is in accordance with the complex nature of polygenic diseases, which are also influenced by factors such as gender, age and population origin that may vary between study samples. Thus, the statistical interaction observed in some studies may be a consequence of a physiological interaction, e.g. implication in the same physiologic pathway. While the function of NOD2 is well understood, the function of DLG5 and its implication in CD remain enigmatic. Further investigation of DLG5 and potential common features with NOD2 could therefore help to identify pathways dysregulated in CD and provide novel clues into the pathogenesis of this complex disease.

DLG5 is a member of the membrane-associated guanylate kinases (MAGUKs), which constitute a large family of scaffolding proteins. In general, MAGUKs are involved in coordinating cellular adhesion and signal transduction at sites of cell–cell contact [18]. Among the few aspects known about DLG5 is its localisation to adherens junctions at sites of cell–cell contact where it interacts with β-catenin and vinexin [19]. Thus, DLG5 is part of the connecting link between the membrane spanning proteins of the junction site and the actin cytoskeleton.

In 1998, Nakamura and colleagues identified DLG5 as a novel human homolog of the Drosophila gene lethal(l)-discs large (dlg) by EST database screening [20]. Mutant flies lacking l-dlg show disruption of cell–cell adhesion and a tumour-like overgrowth of the larval imaginal discs epithelium. Nakamura and colleagues termed the new human DLG protein P-dlg (DLG5) because it is predominantly expressed in the placenta and various gland tissues, such as the prostate. Previously, four other human MAGUKs were identified as homologs of Drosophila dlg, namely DLG1 [synapse associated proteins (SAP)-97], DLG2 [postsynaptic density (PSD)-93], DLG3 (SAP-102) and DLG4 (PSD-95), all of which are PSD and SAP with known physiologic function. However, they are predominantly expressed in the brain [18], which is in sharp contrast to the expression pattern of DLG5, implying that the function of DLG5 might differ from those of the DLG1–4 MAGUK proteins.

Therefore, we used sequence comparisons and phylogenetic profiling to analyse the DLG5 gene and its protein. The phylogenetic analyses show that the DLG5 protein is more closely related to caspase recruitment domain (CARD) containing MAGUKs, which activate the NFκB pathway, than to the other DLG proteins. Furthermore, we identified a CARD in DLG5. This provides new insight into the function of DLG5 and indicates that this CD susceptibility gene likely acts in major pathways of host defence like the other members of the CARD family.

Materials and methods

Bioinformatical analyses

Amino acid sequences from MAGUK and CARD proteins were collected from public databases (Uniprot, Ensembl). For multiple alignments, we used CLUSTALW [21] with default settings. Phylogenetic analyses with bootstrapping (100 replicates) were obtained using the maximum likelihood method with the Jones–Taylor–Thornton model and using neighbour joining. Phylogeny computations were done using the PHYLIP package [22]. The presented trees are consensus trees from the bootstrap calculations. Interaction data on CARD proteins were taken from the Human Protein Reference Database [23].
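The bootstrap step described above can be illustrated with a minimal sketch. It is not the PHYLIP implementation: the four toy sequences, the simple p-distance and all function names are assumptions introduced for illustration only. Alignment columns are resampled with replacement, and for each replicate the sequences are paired by the smallest summed distance, which for four sequences is the pairing that neighbour joining would select.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}

def p_distance(a, b):
    """Fraction of differing positions, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def quartet_split(alignment, names):
    """Return the pair containing the first sequence in the best-supported split."""
    a, b, c, d = names

    def dist(x, y):
        return p_distance(alignment[x], alignment[y])

    candidates = {
        (a, b): dist(a, b) + dist(c, d),
        (a, c): dist(a, c) + dist(b, d),
        (a, d): dist(a, d) + dist(b, c),
    }
    return min(candidates, key=candidates.get)

def bootstrap_support(alignment, names, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates that recover each split."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_replicates):
        split = quartet_split(bootstrap_alignment(alignment, rng), names)
        counts[split] = counts.get(split, 0) + 1
    return {split: n / n_replicates for split, n in counts.items()}

# Hypothetical toy alignment (not real MAGUK sequences).
TOY = {
    "s1": "MKVLAAGIVKELLD",
    "s2": "MKVLAAGIVRELLD",
    "s3": "MRILSTGLVKDLIE",
    "s4": "MRILSTGLVRDLIE",
}
NAMES = ("s1", "s2", "s3", "s4")
SUPPORT = bootstrap_support(TOY, NAMES, n_replicates=100, seed=0)
```

The resulting fractions play the role of the bootstrap values attached to tree branches; a consensus tree keeps the groupings that recur in most replicates.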

Domain analysis

For domain analysis, we used the hidden Markov models (HMM) derived from the Pfam database version 19.0 [24] and profiles derived from the ProDom database [25]. Hmmpfam was run with the parameter “--forward” to use the Forward instead of the Viterbi algorithm. Furthermore, we used context-aided domain annotation [26] to improve the localisation of domains. For the construction of specific HMM models for domains, we used the HMMER package, version 2.3 [27]. Domain-wise annotation was done using custom scripts available from the supplementary materials page. Coiled-coil prediction was performed using the programmes Coils, version 2.2 [28], and Marcoil [29]. The results from different profiles and coiled-coil analysis were unified; the data files are available on the supplementary materials web page.
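The difference between the Forward and the Viterbi algorithm can be illustrated on a toy model. The following sketch is not the HMMER implementation; the two-state model, its probabilities and the observation string are hypothetical values chosen for illustration. Viterbi scores only the single most probable state path, whereas Forward sums over all paths, which can make it more sensitive when many alternative alignments each contribute a little probability.

```python
def forward(obs, states, start, trans, emit):
    """Total probability of the observations, summed over all state paths."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())

def viterbi(obs, states, start, trans, emit):
    """Probability of the single most likely state path."""
    best = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        best = {
            s: emit[s][symbol] * max(best[p] * trans[p][s] for p in states)
            for s in states
        }
    return max(best.values())

# Hypothetical two-state model: H = hydrophobic residue, P = polar residue.
STATES = ("domain", "background")
START = {"domain": 0.5, "background": 0.5}
TRANS = {
    "domain": {"domain": 0.9, "background": 0.1},
    "background": {"domain": 0.2, "background": 0.8},
}
EMIT = {
    "domain": {"H": 0.7, "P": 0.3},
    "background": {"H": 0.4, "P": 0.6},
}
OBS = "HHPHPPH"

P_FORWARD = forward(OBS, STATES, START, TRANS, EMIT)
P_VITERBI = viterbi(OBS, STATES, START, TRANS, EMIT)
```

By construction the Forward probability is never smaller than the Viterbi probability, because the best path is one of the terms in the sum.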

Study sample

The Belgian study sample comprises 481 CD patients, all followed at the IBD clinic of the University Hospital Gasthuisberg, Leuven, a tertiary care referral centre. The control population consisted of 299 unrelated healthy controls without familial history of IBD or known immune-mediated disorders. Samples were stored in a coded, anonymised database existing since 1997. Ninety-nine percent of patients and controls were of Western European/Caucasian origin. Ethical approval was given by the Ethics Board of the University of Leuven. Informed consent was obtained from all participants. DNA was extracted from whole venous blood by a standard salting-out procedure and stored at −80°C. DLG5 exon 1 was sequenced in 65 patients from this sample for mutation screening, and subsequently, the entire sample was genotyped for identified polymorphisms.

Genotyping

Sequencing of DLG5 exon 1 in CD patients was performed with the BigDye® Terminator cycle sequencing kit (Applied Biosystems, Foster City, CA, USA) to identify nucleotide polymorphisms. Altogether, five primer pairs were used to sequence DLG5 exon 1, as well as upstream and downstream segments in overlapping parts (Supplementary Table 1). Subsequent large-scale genotyping of identified polymorphisms was performed by standard allelic discrimination method using TaqMan® SNP genotyping assays on an ABI 7900 sequence detector (Applied Biosystems).

Genetic analyses

Hardy–Weinberg equilibrium in cases and controls was tested using the exact test as implemented in the programme Haploview 3.2 [30] at a significance level of 0.05. Linkage disequilibrium between two markers, as well as inference of haplotypes, was also calculated with Haploview 3.2. Odds ratios were calculated as a measure of association using logistic regression. In the regression models, the outcome variable was the CD vs control status, and predictors were age, gender and each intronic SNP. The regression was done for the recessive (two copies of the SNP vs one or no copy) and additive (one or two copies of the SNP vs no copy) inheritance models. Interaction analyses between DLG5 and NOD2 SNPs or DLG5 SNPs and gender, respectively, were performed by introducing the product of these two variables as an additional variable in the regression model. NOD2 risk status was defined as being a carrier of at least one variant allele of the three known main polymorphisms R702W, G908R and 3020insC in NOD2.
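For orientation, an exact test of Hardy–Weinberg equilibrium can be sketched as follows. This is a minimal illustration and not the Haploview code; the function names and the genotype counts in the example are assumptions. Conditional on the observed allele counts, the probability of every possible number of heterozygotes is computed, and the p value is taken as the summed probability of all configurations that are no more likely than the observed one, which is one common definition of the exact p value.

```python
from math import exp, lgamma, log

def _log_prob(n_het, n_a, n):
    """Log probability of n_het heterozygotes given n_a copies of allele A in n individuals."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    return (
        n_het * log(2)
        + lgamma(n + 1) - lgamma(n_aa + 1) - lgamma(n_het + 1) - lgamma(n_bb + 1)
        + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1)
    )

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg p value from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    # Feasible heterozygote counts share the parity of the allele count.
    feasible = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: exp(_log_prob(h, n_a, n)) for h in feasible}
    total = sum(probs.values())
    observed = probs[n_ab]
    tail = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(1.0, tail / total)

# Hypothetical genotype counts (AA, AB, BB), not taken from the study sample.
P_BALANCED = hwe_exact_p(25, 50, 25)
P_NO_HETS = hwe_exact_p(50, 0, 50)
```

A marker would be flagged as deviating from equilibrium when this p value falls below the chosen significance level, here 0.05.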

Functional analysis of CARD exon expression

RNA from placental tissue was obtained as described by Krull et al. [31]. Reverse transcription was performed using SuperScript® II Reverse Transcriptase (Invitrogen, Carlsbad, CA, USA) with random hexamers. Whole-tissue RNA was isolated from colon biopsies using TRIZOL reagent (Life Technologies, Gaithersburg, MD, USA) according to the instructions by the manufacturer and total RNA was transcribed to cDNA. Biopsies from normal controls, as well as from involved areas from patients with active CD, have been used in this study. Analyses of biopsies were approved by the ethical committee of the University of Muenster. Amplification of DLG5 exon 1–5 product from cDNA was performed using gene-specific primers (Supplementary Table 1). PCR products were sequenced using the BigDye® Terminator cycle sequencing kit (Applied Biosystems) to verify amplification of DLG5 exon 1–5.

Results and discussion

DLG5 is evolutionarily related to TJP and CARD MAGUKs

Compared to other members of the MAGUK family, the function of DLG5 in the context of physiological pathways is poorly understood. Analysis of its closest evolutionary relatives might help to shed light on its function and role in CD. Therefore, as a first step, we evaluated the position of DLG5 in the phylogenetic tree of the MAGUK family. We collected amino acid sequences of known MAGUK proteins for vertebrate and non-vertebrate species from public databases (Ensembl, Uniprot). As scaffolding proteins, all MAGUKs possess several protein–protein interaction domains. By definition, they all contain one guanylate kinase (GK) domain which is homologous to the yeast GK. Apart from the subgroup of MAGUKs with inverted domain structure (MAGI), the domain arrangement is as follows: one or more PSD-95/Dlg/ZO1 (PDZ) domains, one Src-homology-3 (SH3) domain and one GK domain [18]. We identified the typical MAGUK domains in the amino acid sequences using the Pfam hidden Markov models [24]. Phylogenetic trees were computed based on the singular GK or SH3 domains using the neighbour joining and maximum likelihood tree-building algorithms. We assessed the significance of the phylogenetic branches using bootstrapping. We found compelling evidence that DLG5 does not group with DLG1–4 but localises in a completely different branch of the family tree together with TJP1–3 (tight junction protein 1–3) and CARD10, 11, 14 (CARD family member 10, 11, and 14); see Fig. 1 and Supplementary Figs. 1–4. Thus, despite its discovery as a human homolog of Drosophila discs large, DLG5 does not belong to the DLG MAGUK subfamily defined by DLG1–4.

Fig. 1

Phylogeny (left) and domain arrangements (right) of the MAGUK proteins. The tree is based on the sequences of the SH3 and GK domains which are present in all proteins of that family. The DLG5 protein was initially classified as a disc large homolog protein (DLG) based on the presence of multiple PDZ domains. However, a detailed phylogenetic analysis, as well as a comparison of the domain arrangements, shows that DLG5 is actually more closely related to other proteins with CARD. This is in accordance with our finding on the presence of a CARD domain in DLG5. For clarity, a maximum likelihood tree with 100 bootstraps for a small set of 7 human proteins is shown. However, the same results were also obtained with larger data sets and other phylogenetic methods (see “Materials and methods” and Supplementary Figs. 1–4 for details)

From a functional perspective, the occurrence of TJP1–3 MAGUKs together with DLG5 in the phylogenetic tree is not unexpected, as TJP1–3 are localised at tight junctions and DLG5 is localised at adherens junctions. Both junction complexes jointly constitute a barrier, which regulates paracellular permeability between epithelial cells. This functional relatedness may reflect the phylogenetic relatedness of these MAGUKs. In contrast, the evolutionary relatedness of DLG5 and the CARD10, 11, 14 MAGUKs is unexpected. Interestingly, these closest relatives of DLG5, CARD10, 11 and 14, belong to the same protein family as NOD2, thus providing first evidence that the two CD susceptibility genes, NOD2 and DLG5, are closely related and might be partners in an evolutionarily related system.

DLG5 contains a CARD

The evolutionary relatedness of the GK and SH3 domains in DLG5 with those in the CARD and TJP MAGUKs prompted us to look for further common features. One such feature might be specialised domains not common to all MAGUKs. The MAGUK TJP1 contains a C-terminal ZU5 domain. CARD10, 11 and 14 harbour a specialised N-terminal CARD. Therefore, we searched for unknown specialised domains in DLG5. DLG5 is a very large gene, and in successive publications, new N-terminal parts were found in DLG5 cDNA [20, 32]. Consequently, we searched the Genbank EST database [33] for DLG5 mRNA sequences, constructed a multiple sequence alignment with ClustalW [21], and picked the longest one for further analyses. The longest cDNA clone (AB011155 in Genbank) with 32 exons deposited in public databases is the one isolated by Nagase and colleagues as part of a project for sequencing entire cDNAs from the brain which correspond to relatively long transcripts [34].

The retrieved mRNA sequence is identical with other DLG5 sequences from public databases apart from an additional N-terminal sequence of 404 nucleotides. A blastn search (www.ncbi.nlm.nih.gov) against the human genome localised this additional sequence 57 kilobases upstream of the exon that was previously considered to contain the ATG start codon (exon 2 in Fig. 2a). In search of other potential exons, we manually scrutinised the genomic region between the two genes that flank DLG5 (POLR3A and KCNMA1). We considered blastn hits, conserved regions and HMMGENE predictions [35] and did not detect any additional exons (Supplementary Fig. 5). Therefore, the 404-nucleotide sequence is presumably the true first exon of the DLG5 gene.

Fig. 2

CARD coding exon in DLG5. a Exon–intron-structure of the DLG5 gene and domain structure of its protein product: The discovered CARD domain is coded by the novel first exon of the DLG5 gene. The asterisk denotes the position of the R30Q polymorphism. b DNA sequence of the first exon and of the DLG5 gene: The exonic region is as described for cDNA clone AB011155 (NCBI database, www.ncbi.nlm.nih.gov) and is printed in capital letters. SNPs discovered in CD patients are indicated in bold face; palindromic sequence is underlined. Amino acids are given beneath their coding triplets, amino acids corresponding to the predicted CARD are highlighted in grey

Most importantly, a search with hidden Markov models for protein domains from the Pfam database [24] revealed that the amino acid sequence derived from the first exon (defined by the additional 404 nucleotides in the AB011155 mRNA) contains a CARD (E-value = 0.0048, Supplementary Fig. 6) like the CARD10, 11 and 14 MAGUKs. We also identified a homologous CARD coding sequence upstream of the DLG5 gene in multiple other species, such as chimp, rat, mouse and zebrafish. Notably, the DLG5 homolog in zebrafish is more similar to the known CARD sequences from other proteins, as shown by the lower E-value (7.5e-6). Nevertheless, the CARD coding exon of DLG5 shows a high degree of conservation in these species (Supplementary Fig. 7).

To gain functional evidence for this in silico predicted domain, we amplified the new exon 1 together with DLG5 exons 2–5 using cDNA obtained from RNA isolated from human placenta and colon. Our data show that the new exon 1, coding for CARD, is expressed as part of DLG5 in these tissues (data not shown, see Supplementary Table 1 for primers). Thus, we have experimental proof that DLG5 mRNA is expressed in the gut, the organ primarily affected in CD, and harbours an additional first exon coding for a CARD domain. This establishes DLG5 as a member of the CARD protein family, which provides a more conclusive link to NOD2, its interacting partner in CD susceptibility, and further points towards involvement of both genes in a similar pathway relevant for intestinal inflammation.

DLG5 is a coiled-coil CARD protein

The CARD family can be divided into four groups based on their domain arrangement: the nucleotide-binding-domain (NBD) CARDs, the coiled-coil CARDs (CC-CARDs), the bipartite CARDs and the CARD-only proteins [36]. The presence of a CARD motif is the defining feature for this protein family and, thus, contained in all CARD proteins. Members of the first two groups possess a CARD domain, an oligomerisation domain (either the NBD or the coiled-coil domain) and a sensory domain that regulates oligomerisation. Upon sensory activation and subsequent oligomerisation, the NBD- and CC-CARDs recruit the bipartite-CARDs, via CARD–CARD interaction. The latter contain a CARD domain and a functional motif, e.g. a kinase domain, which becomes activated in the recruitment process. In general, the activated bipartite-CARD proteins directly mediate the assembly of components into caspase and NFκB signalling pathways. The CARD-only proteins contain no other domain than CARD and may act as regulators of the multi-domain CARDs. As such, the CARD proteins are implicated in host defence against infection, environmental stress or cellular damage [36].

Because the GK and SH3 domains of DLG5 were evolutionarily most closely related to those of the CC-CARD MAGUKs CARD10, 11 and 14, we checked whether DLG5 also harbours a coiled-coil motif using the prediction programmes Coils2 [28] and Marcoil [29]. Indeed, a coiled-coil stretch is predicted for DLG5 between the CARD domain and the PDZ-SH3-GK motif (Supplementary Fig. 8). Part of this coiled-coil motif was previously predicted by Wakabayashi and colleagues [19]. Thus, DLG5 is a new member of the CC-CARDs and likely to oligomerise via its coiled-coil domain to initiate signalling via its CARD.

CD susceptibility variant R30Q is at the beginning of the coiled-coil domain

Interestingly, the R30Q polymorphism (rs1248696, R140Q when including the CARD domain), which was reported to be associated with CD, is located at the beginning of the coiled-coil motif, which stretches from amino acid position ∼137 to 605 (Supplementary Fig. 8). Coiled-coil motifs usually contain a repeated seven amino acid residue pattern, the heptad repeats, with hydrophobic residues at positions a and d. In an aqueous environment, two such helices form a complex to shield the hydrophobic amino acids at their interface. The 30Q variant is located at position b of the first heptad, where amino acids are predicted to be part of the coiled-coil motif (Supplementary Fig. 8). Although the coiled-coil probability for the amino acids in the Q-containing heptad is only slightly higher than in the R-containing heptad, this is the first heptad of the coiled-coil motif. It remains conjectural that the correct positioning of the CARD domains in the oligomerisation process may be influenced by the variant, which in turn affects subsequent activation processes. Because DLG5 is located at adherens junctions at sites of cell–cell contact, the disruption or instability of the cellular junctions may trigger an activation and oligomerisation of the DLG5-CARD. These junction complexes form an epithelial seal which prevents bacteria from invading body tissue, rendering DLG5 a custodian of intercellular junctions. This function might be disturbed in CD.
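The heptad pattern described above can be made concrete with a small sketch. It is not the algorithm used by Coils or Marcoil, which score sequences against position-specific profiles; the hydrophobic residue set, the function names and the toy sequence are assumptions for illustration, and the toy sequence is not taken from DLG5. Each residue is assigned a register position a to g, and the fraction of a and d positions occupied by hydrophobic residues is reported.

```python
REGISTER = "abcdefg"
# Assumed set of hydrophobic residues; other definitions are possible.
HYDROPHOBIC = frozenset("AILMFVWY")

def heptad_positions(n_residues, first="a"):
    """Register letters for n_residues consecutive residues, starting at `first`."""
    offset = REGISTER.index(first)
    return "".join(REGISTER[(offset + i) % 7] for i in range(n_residues))

def core_hydrophobic_fraction(seq, first="a"):
    """Fraction of a and d positions that carry a hydrophobic residue."""
    registers = heptad_positions(len(seq), first)
    core = [res for res, reg in zip(seq, registers) if reg in "ad"]
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)

def best_register(seq):
    """Starting register that places the most hydrophobic residues at a and d."""
    return max(REGISTER, key=lambda reg: core_hydrophobic_fraction(seq, reg))

# Hypothetical idealised coiled-coil segment: three identical heptads.
TOY_SEGMENT = "LKELEDK" * 3
TOY_REGISTERS = heptad_positions(len(TOY_SEGMENT))
TOY_FRACTION = core_hydrophobic_fraction(TOY_SEGMENT)
```

In this scheme, a residue at position b, such as the one affected by the R30Q variant, lies outside the hydrophobic core formed by positions a and d.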

DLG5 in the phylogeny of the CARD family

In light of our finding that DLG5 is a new member of the CARD protein family, we investigated its evolutionary relationship to other family members. An alignment of the CARD domains from several human CARD proteins is shown in Fig. 3. Several amino acid positions are conserved among the family members, including in DLG5, supporting the identification of the CARD domain in the DLG5 protein. However, the overall variability of the CARD domains from different CARD proteins is very high. This high variability among the CARD proteins is reflected in the rather low bootstrap values in the deep nodes of the corresponding phylogenetic tree. The CARD containing MAGUKs CARD10, 11 and 14 are closely related to CARD9, another CC-CARD protein, and BCL10, with which they all interact (Fig. 3). While DLG5 is a CC-CARD protein from the MAGUK family, it does not belong to the subgroup defined by CARD9, 10, 11 and 14. Instead, the CARD domain of DLG5 is most closely related to the CARD domain of APAF1 (apoptotic protease activating factor 1). Although the bootstrap value for this group is moderate, APAF1 turned out to be a sister group of DLG5 in all phylogenies conducted, using different tree building algorithms as well as nucleotide sequences instead of amino acid sequences (data not shown). APAF1 is known to act as a cell stress sensor and becomes activated by cytochrome c that is released from mitochondria. Upon oligomerisation via its NBD domain, it recruits caspase 9 (CASP9) via CARD–CARD interaction. CASP9 in turn becomes activated in this process, which leads to induction of apoptosis [37]. It is uncertain, however, whether this implies an effector function in caspase regulation for DLG5.

Fig. 3

Phylogenetic tree (left), amino acid sequence alignment (right) of CARD domains and interaction profiles (middle) of CARD proteins. On the right, ClustalX alignment of 22 amino acid sequences from different human proteins. The alignments were taken from the PFAM entry for the CARD domain; the DLG5 CARD domain was aligned to the PFAM profile using ClustalW. The degree of conservation is indicated by bars below the alignment, and conserved amino acids are shaded differently. On the left, the deduced phylogenetic tree (neighbour joining method; see also Supplementary Fig. 9) with bootstrap values. In the middle, a profile of the interactions between the proteins (based on interactions found in the Human Protein Reference Database)

Remarkably, the CARD proteins are organised as a closely linked interaction network. Nearly every member of the CARD protein family has several interaction partners within the family, as shown by the interaction profile in Fig. 3. This indicates that the pathways mediated by the CARD family members, namely regulation of NFκB activation leading to transcription of pro-inflammatory genes, and regulation of caspase activation leading to inflammation by proteolytic activation of cytokines or apoptosis, are cross-linked. For example, the CD susceptibility factor NOD2 interacts with RIPK2 [38], which is a prerequisite for NFκB activation. Likewise, NOD2 interacts with NLRC4 [39], a CARD protein reported to be involved in apoptotic signalling and cytokine processing [40]. A similar multifaceted function is conceivable for DLG5. Functional experiments are required to identify interaction partners of DLG5 in the CARD family, which in turn will shed light on the pathways triggered by DLG5 in the CARD protein network (Fig. 4).

Fig. 4

Signalling in the CARD family: the Crohn’s connection. Schematic domain arrangements of CARD proteins are given. Asterisks indicate the positions of CD associated variants in NOD2 and DLG5. The CD susceptibility gene NOD2 belongs to the family of NBD-CARDs. The sensory domain of NOD2 is a leucine rich repeat (LRR) domain which senses bacterial products. Upon oligomerisation, NOD2 recruits the bipartite-CARD protein RIPK2 which becomes activated and by itself activates downstream components, which result in NFκB dependent gene transcription. The closest evolutionary relatives of the CD susceptibility gene DLG5 in the MAGUK family, CARD10, 11 and 14, belong to the family of CC-CARDs. They have not been implicated in CD so far. Their sensory MAGUK specific PDZ-SH3-GK domain arrangement becomes activated via T- or B-cell receptor stimulation and recruits upon oligomerisation BCL10, a bipartite-CARD that activates further signalling components to initiate NFκB dependent transcription of proinflammatory genes. The CARD of DLG5 is most closely related to the CARD of APAF1, a CARD protein that initiates apoptosis. Identification of interaction partners of DLG5 in the CARD family and subsequently activated CARD dependent pathways might help to clarify the function of DLG5 and its involvement in CD aetiology

No polymorphisms in DLG5 CARD domain of CD patients

The CARD domain is an obvious target for the search of genetic variants that confer risk to CD due to its involvement in apoptosis and immunity. The underlying genomic sequence in the DLG5 gene has not yet been investigated in this context. Therefore, we sequenced the CARD containing first exon of DLG5 and the adjacent region (630 nucleotides upstream and 340 nucleotides downstream of exon 1) in 65 CD patients from Belgium. There were no polymorphisms in the coding region. However, we detected two polymorphisms which are located in the first intron near exon 1 and reside within a palindromic sequence (Fig. 2b, Table 1). Both of these variants were previously described (rs4595491 and rs34954112). A study sample of 481 CD patients and 299 healthy controls from Belgium was genotyped for both intronic SNPs. Neither SNP was associated with CD, even when taking the NOD2 risk alleles into account, and there was no gender-dependent association for either SNP as previously described for the R30Q polymorphism [11, 12]. Linkage disequilibrium analysis of the R30Q polymorphism together with the two intronic SNPs in our study sample, as well as examination of the whole underlying genomic region using HapMap data, revealed that the CARD containing exon of DLG5 belongs to the haplotype block that harbours the complete DLG5 gene (Fig. 5). The intronic SNPs, however, are not located on the risk-associated haplotype harbouring the 30Q variant in our study sample but on the frequent, non-risk-conferring haplotypes, which explains the lack of association of these variants with CD.
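The pairwise linkage disequilibrium measure D′ used in this analysis can be sketched as follows. This is a minimal illustration, not the Haploview implementation; the function name and the example frequencies are assumptions and are not taken from the study data. D is the difference between the observed haplotype frequency and the frequency expected under independence, and D′ scales the absolute value of D by its maximum attainable value given the allele frequencies.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute D' from the haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one marker is monomorphic, so D' is not informative
    return abs(d) / d_max

# Hypothetical frequencies for illustration only.
COMPLETE_LD = d_prime(p_ab=0.30, p_a=0.30, p_b=0.50)
NO_LD = d_prime(p_ab=0.15, p_a=0.30, p_b=0.50)
```

A value of D′ close to 1 indicates that at least one of the four possible two-marker haplotypes is absent or nearly absent, which is consistent with markers lying in the same haplotype block.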

Table 1 Allele frequencies of polymorphisms in genomic region of new DLG5 exon 1 in CD patients and controls
Fig. 5

Linkage disequilibrium and haplotypes in 5′ region of DLG5 gene. a Pairwise linkage disequilibrium (left) between the R30Q polymorphism (rs1248696) in exon 3 of DLG5 gene and the two SNPs in the intron following exon 1 described in this paper (rs4595491 and rs34954112, Table 1) in the Belgian CD-control sample. D′ values for pairwise LD are represented by grey boxes (black = D′ > 0.8). Haplotypes (right) inferred from the three SNPs were computed with the EM algorithm as implemented in Haploview 3.2. Haplotype frequencies are given next to the haplotypes. b Linkage disequilibrium of the entire genomic region harbouring DLG5 obtained from HapMap data (www.hapmap.org, CEPH trios). The position of the three SNPs examined in our study sample is indicated

Conclusions

In summary, we have analysed the DLG5 protein and found that it harbours a CARD domain. This renders DLG5 a scaffolding protein that signals from adherens junction complexes at cell–cell contact sites into the tightly linked interaction network of the CARD protein family, which might help to explain its role in CD aetiology. The members of the CARD protein family are known to mediate regulation of NFκB activation and regulation of caspase activation in the context of apoptosis or inflammation with significant crosstalk between different pathways. Apart from DLG5 and NOD2, other genes in the CARD family, e.g. CARD8 [41] and NOD1 [42], were shown to be associated with CD. Thus, our finding that DLG5 is a further CD susceptibility gene of the CARD family corroborates that the CARD-mediated mechanisms of host defence are pivotal in CD aetiology. We could not identify any genetic variants in the CARD domain of DLG5 in CD patients or healthy controls. This finding strengthens the position of the R30Q polymorphism at the beginning of the coiled-coil motif as the polymorphism in DLG5 that carries risk relevant for CD. We have also clarified the evolutionary origin of DLG5, showing that, in fact, this protein is not closely related to other DLG proteins. Therefore, the name “DLG5” is misleading for this adherens junction CARD protein and we propose renaming it.