Introduction

Twin cohorts have long been used to study the heritability of disease. The premise of the classical twin study is that twin pairs share a similar environment and fixed genetic variability: monozygotic (MZ) twins share 100% of segregating genes and dizygotic (DZ) twins share on average 50% [1, 2]. This premise involves some oversimplification, owing to genetic recombination during meiosis, non-additive genetic influences, the potential for MZ twins to share a more similar environment than DZ twins, and the complex interplay between genes and environment [3,4,5,6]. Nevertheless, comparing concordance rates between MZ and DZ twin pairs still provides useful insight into the relative importance of genetic and environmental variance in the familial clustering of disease.

When evaluating the heritability of inflammatory bowel disease (IBD), Tysk et al.’s pioneering twin study suggested that Crohn’s disease (CD) was almost entirely heritable and that ulcerative colitis (UC) was also largely genetically determined [7]. However, re-evaluation of the Swedish twin cohorts [8, 9], together with twin cohorts from Germany [10], Denmark [11] and Norway [12], suggests that environmental factors are more important than previously thought, with lifetime concordance in MZ twin pairs falling well short of 100%. For CD, pair concordance in these cohorts ranged from 20% to 50% in MZ twins and from 0% to 10% in DZ twins. For UC, pair concordance ranged from 14.3% to 20% in MZ pairs, compared with 0% to 6.1% in DZ pairs.

Twin cohorts also enable environmental factors to be analysed with substantially smaller cohorts than non-twin retrospective studies. This is achieved by incorporating intra-pair concordance and zygosity into logistic regression models [13]. Twin cohorts also allow genetic variability to be controlled for when evaluating the exposome, including the microbiome, metabolome and epigenome.

The UK IBD Twin Registry was established to facilitate future IBD twin research. The primary aim of this study is to evaluate the pair concordance of CD and UC within a British twin cohort. The secondary aim is to assess environmental factors that may be associated with the onset of IBD.

Materials and Methods

Establishing the Registry

The UK IBD Twin Registry was established in 2014. Twin pairs were recruited through advertising on the Crohn’s and Colitis UK (CCUK) website, social media and newsletters. Additionally, a dormant twin dataset, which had itself been recruited through a CCUK mailshot, was retraced. Patient cohorts at the Chelsea and Westminster Hospital and the Royal Devon and Exeter Hospital were contacted by mailshot. Clinic posters, flyers and clinical referral were also used.

Inclusion Criteria

The inclusion criteria to join the UK IBD Twin Registry were as follows:

  • Member of a monozygotic (MZ) or dizygotic (DZ) twin pair

  • One or both twins diagnosed with IBD

  • Patient consent for the IBD twin registry to access medical records

  • Patient consent to be contacted about future studies

Data Collection

After joining the registry, each member was asked to complete a questionnaire about their health and environmental exposures. Medical records were requested from each participant’s primary care physician and gastroenterologist to validate the diagnosis. The questionnaire was designed jointly with the Danish IBD twin registry to facilitate future collaboration [14]. A more detailed overview of the data extracted from medical records and questionnaires is provided in the supplementary information.

Confirming IBD Diagnosis

At screening, a self-reported IBD diagnosis was considered sufficient for initial enrolment. Recruitment material specified the terms “Inflammatory Bowel Disease”, “Crohn’s Disease” and “Ulcerative Colitis”. The distinction between irritable bowel syndrome (IBS) and IBD was explained in participant information leaflets and on the project website. The diagnosis of IBD was then confirmed by review of medical records held by the primary care physician and the gastroenterologist. There was 100% agreement between participant-reported and clinician diagnoses.

Confirming Zygosity

To determine zygosity, twins were asked whether they were “identical” or “non-identical”, and self-reported zygosity was accepted. This method has been validated by the Danish twin registry, which in 2003 demonstrated that genotyping confirmed previously self-reported zygosity [15]. Eighty twin pairs within our cohort also elected to participate in our twin bioresource and were therefore genotyped; agreement between self-reported and genetically determined zygosity was 96.5% (unpublished data), further supporting the accuracy of self-report.

Environmental Factors

Participants with IBD were asked about environmental exposures at the time of their diagnosis. In concordant twin pairs, each twin was asked about exposures at the time of their own diagnosis. Unaffected co-twins were asked about exposures at the time of their affected twin’s diagnosis. For childhood infection, gastrointestinal infection before diagnosis, and parental concern about hygiene, germs and diet, participants were asked to rate their experience as “less than peers”, “equal to peers”, “more than peers” or “not known”.

Smoking history at the time of diagnosis was categorised as “smoker at diagnosis”, “ex-smoker at diagnosis” and “any smoking history at diagnosis”. Participants were asked whether, at the time of IBD onset, they smoked cannabis on average “daily”, “weekly” or as “occasional social use”. However, as numbers were small, these responses were pooled into a single “cannabis use” category for analysis.

Infant feeding methods were also assessed by participant recall. Exclusive breastfeeding was defined as breast milk being the sole source of nutrition until weaning.

Statistical Analysis

Pair concordance was calculated as the proportion of affected twin pairs in which both twins had the disease, and was tabulated both as a percentage of twin pairs and as a ratio of concordant to discordant pairs.
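As a minimal sketch of this arithmetic, the Python function below reports pair concordance both as a percentage and as a concordant:discordant ratio. The function name `pair_concordance` and the counts in the usage example are hypothetical illustrations, not values from Table 2.

```python
def pair_concordance(concordant: int, discordant: int) -> tuple[float, str]:
    """Return pair concordance as a percentage and as a concordant:discordant ratio.

    Pair concordance = concordant pairs / (concordant + discordant pairs),
    counting only pairs in which at least one twin is affected.
    """
    total = concordant + discordant
    if total == 0:
        raise ValueError("at least one affected pair is required")
    percentage = 100 * concordant / total
    ratio = f"{concordant}:{discordant}"
    return percentage, ratio


# Hypothetical counts for illustration only (not the registry's data).
pct, ratio = pair_concordance(concordant=5, discordant=15)
print(f"{pct:.1f}% ({ratio})")  # 25.0% (5:15)
```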

Environmental factors were analysed with logistic regression models adjusted for zygosity and concordance. Univariate and multivariable mixed-effects logistic regression analyses were performed using the generalised estimating equations (GEE) method in SAS PROC GENMOD, with a REPEATED statement for twins, a logit link and binomial distribution, stratified by zygosity. In the multivariable analysis, “ex-smoker” and “current smoker” were removed from the model and only “any smoking history”, which combines the other two categories, was retained, to avoid representing overlapping variables twice.
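For readers less familiar with GEE, a generic marginal logistic model of this kind can be written as below. This is a sketch under stated assumptions: the indexing (pair $i$, twin $j$), the covariate vector $\mathbf{x}_{ij}$ and the use of robust (empirical) standard errors are our notation and assumptions, and the text does not state which working correlation structure was specified in PROC GENMOD. With clusters of size two, a single within-pair correlation $\rho$ covers both the exchangeable and unstructured choices.

```latex
\begin{aligned}
\operatorname{logit}\Pr(Y_{ij}=1 \mid \mathbf{x}_{ij}) &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij},
  \qquad i = 1,\dots,n \text{ (twin pairs)},\; j \in \{1,2\} \text{ (twins)},\\
\operatorname{Corr}(Y_{i1}, Y_{i2}) &= \rho \quad \text{(working correlation within a pair)},\\
\mathrm{OR}_k &= \exp(\beta_k), \qquad
95\%\ \mathrm{CI} = \exp\bigl(\hat\beta_k \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_k)\bigr).
\end{aligned}
```

Here $Y_{ij} = 1$ if twin $j$ of pair $i$ has the disease. Stratification by zygosity could correspond to fitting the model separately for MZ and DZ pairs or to including a zygosity term; the text does not specify which.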

Univariate analysis was undertaken for IBD (total), CD and UC. Multivariable analysis was undertaken for CD and UC separately, to prevent the opposing effects of smoking on the two diseases from influencing the correction for multiple comparisons.

Twin pairs were included in the analysis only if both twins had completed the questionnaire and the IBD diagnosis had been validated from medical records. Missing data were recorded and distinguished from negative responses in the analysis.

Montreal classification of disease phenotype was only accepted from medical records.

Results

Demographics

Two hundred and forty-four participants were recruited into the registry. Of these, the diagnosis could be validated for all affected participants in ninety-one twin pairs, and these pairs were included in this analysis. The key demographic and clinical characteristics of participants are summarised in Table 1, with information for each twin pair presented in the supplementary material (Appendix 1).

Table 1 Demographics and clinical characteristics

Thirty-seven twin pairs were monozygotic (MZ) and fifty-four dizygotic (DZ). Forty-two pairs had one or both twins diagnosed with CD, and forty-nine had one or both twins diagnosed with UC. The median (IQR) age of all participants was 53.9 years (38.8–63.3): 52.1 years (34.5–61.7) for MZ and 54.8 years (40.1–63.7) for DZ twin pairs. Overall, 74.2% of participants were female (76% of MZ and 73% of DZ twins).

The median (IQR) follow-up of twin pairs, from the age at diagnosis of the first-affected twin to age at entry into the registry, was 21.2 years (7.9–31.5).

Clinical Characteristics

One hundred and twelve recruited participants had a diagnosis of IBD. Fifty-seven were diagnosed with CD and fifty-five with UC.

All affected twins included in the analysis had their diagnosis validated from gastroenterology clinic letters. Age at diagnosis and Montreal classification are summarised in Table 1; further clinical characteristics are presented pair by pair in Appendix 1.

For participants with CD, the median age of symptom onset was 22 years (18–28.5), with a median age of diagnosis of 26 years (22.5–36). Median follow-up from diagnosis of the first-affected twin to entry into the registry was 19.2 years (5.1–31.0).

For participants with UC, the median age of symptom onset was 25 years (22–29), with a median age of diagnosis of 30 years (24–38). Median follow-up from diagnosis to entry into the registry was 23.1 years (15.1–33.2).

Pair Concordance

Table 2 shows pair concordance, expressed as a percentage of total twin pairs and as a ratio of concordant to discordant pairs.

Table 2 Pair concordance for CD and UC

Concordance for CD was significantly higher in MZ than in DZ twin pairs (χ² = 15.5905, P < 0.001). Concordance for UC was numerically higher in MZ than in DZ pairs, although the difference did not reach statistical significance (χ² = 0.707, P = 0.40).
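The reported P values can be reproduced from the χ² statistics if, as we assume, each comparison was a Pearson χ² test on a 2×2 table (MZ/DZ by concordant/discordant) with one degree of freedom; the text does not state whether a continuity correction or an exact test was considered. The standard-library sketch below relies on the fact that a χ² variable with 1 df is the square of a standard normal variable, so its upper-tail probability is erfc(√(χ²/2)). The function names and the 2×2 counts in the example are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = MZ, DZ pairs; columns = concordant, discordant pairs.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    # If Z ~ N(0, 1) then Z**2 ~ chi2(1), so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Statistics reported in the text.
print(f"CD: P = {p_value_1df(15.5905):.1e}")  # well below 0.001
print(f"UC: P = {p_value_1df(0.707):.2f}")    # 0.40

# Hypothetical 2x2 counts for illustration only.
stat = chi2_2x2(20, 10, 10, 20)
print(f"chi2 = {stat:.2f}, P = {p_value_1df(stat):.3f}")
```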

For all concordant twin pairs, the median (IQR) time between the diagnoses of the two twins was 4 years (2–15).

For concordant twin pairs with CD, the median (IQR) time between the co-twins’ diagnoses was 3 years (2–15): 3 years (2–12.5) for MZ pairs compared with 13 years (10–16) for DZ pairs.

For concordant twin pairs with UC, the median (IQR) time between the co-twins’ diagnoses was 6 years (1.5–15): 4 years (2–8) for MZ pairs compared with 15 years (0–17) for DZ pairs.

Environmental Risk Factors

Univariate analyses of each environmental risk factor for CD and UC are presented in Appendices 2 and 3. The factors analysed, and subsequently included in the multivariable analysis, were gastrointestinal infection prior to diagnosis, childhood illness, parental concern regarding germs, smoking history, method of delivery, infant feeding method and self-reported diet. Multivariable analyses of environmental factors predictive of CD and UC onset are presented in Tables 3 and 4.

Table 3 Multivariable mixed effects logistic regression model showing significant independent predictors of likelihood of CD
Table 4 Multivariable mixed effects logistic regression model showing significant independent predictors of likelihood of UC

Smoking

A positive smoking history was predictive of CD incidence in multivariable analysis (OR 2.66, 95% CI 1.16–6.07, P = 0.02). However, smoking was not significantly protective against UC. Cannabis use approached significance as an independent risk factor for CD in multivariable analysis (OR 2.59, 95% CI 0.89–7.55, P = 0.082).
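Reported ORs, 95% CIs and P values can be cross-checked against one another, assuming the CIs are Wald intervals on the log-odds scale (usual for GEE output, but not stated in the text). The minimal sketch below recovers the standard error of log(OR) from the CI width and derives an approximate two-sided P value; because the inputs are rounded to two decimal places, agreement is only approximate. The function name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Approximate two-sided P value from an odds ratio and its 95% Wald CI.

    Assumes the CI was computed as exp(log(OR) +/- z_crit * SE) on the log-odds scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(OR)
    z = math.log(odds_ratio) / se                            # Wald statistic
    return math.erfc(abs(z) / math.sqrt(2))                  # two-sided normal tail


# Values reported in the text (multivariable models for CD).
reported = {
    "Smoking history, CD": (2.66, 1.16, 6.07, 0.02),
    "Cannabis use, CD": (2.59, 0.89, 7.55, 0.082),
}
for label, (or_, lo, hi, p_reported) in reported.items():
    print(f"{label}: approx. P = {p_from_or_ci(or_, lo, hi):.3f} (reported {p_reported})")
```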

Early Environment

Multivariable analysis showed no statistically significant association between delivery by Caesarean section and CD or UC incidence. When infant feeding was categorised as exclusive breastfeeding until weaning, exclusive formula feeding, or combined breast and formula feeding, there was no significant association with subsequent CD or UC onset. However, when re-analysed as any breastfeeding (including combined feeding) versus no breastfeeding, any breastfeeding was protective against UC onset in multivariable analysis (OR 0.48, 95% CI 0.25–0.93, P = 0.03) but was not significantly associated with CD onset.

Health Prior to Diagnosis

In multivariable analysis, recall of childhood illness relative to peers was not associated with subsequent onset of either CD or UC. Recall of parental concern regarding hygiene was also not associated with IBD onset. However, reporting fewer gastrointestinal infections in childhood than peers was protective against UC (OR 0.33, 95% CI 0.15–0.74, P = 0.007) but not against CD.

Diet

Self-classified diet before one’s own diagnosis or that of the affected co-twin was not associated with subsequent IBD onset, nor was self-reported consumption of ready-made meals before diagnosis.

Discussion

Our study demonstrates significantly greater concordance for CD in MZ than in DZ twin pairs. This is consistent with other twin cohorts and implies a heritable component to CD [9, 11, 12, 14, 16]. Our data also showed a trend towards higher MZ pair concordance in UC, although this did not reach statistical significance. Compared with other cohorts, all of our pair concordance rates lay at the upper end of the reported ranges. Unfortunately, in contrast with other European countries, the UK does not have a complete twin birth registry from which to identify individuals diagnosed with IBD. Recruitment therefore relied on advertising, which concordant twins may have been more likely to encounter. The methods by which the registry was formed may thus have introduced recruitment bias, which may have contributed to both the high concordance rates and the female predominance.

Our DZ concordance rate for UC is four-fold that typically reported for non-twin siblings [12, 16,17,18]. A tendency for UC to be shared more commonly between DZ twins than between non-twin siblings has been recognised previously [12]. Hence, the aetiology of UC may in part reflect an exposomal factor shared more commonly between twins than between non-twin siblings.

Our study confirms the recognised association between smoking and CD [19, 20], with approximately threefold increased odds. It is possible that the recognised protective effect of smoking in UC [20, 21] was not observed because the smoking variable in the multivariable analysis included ex-smokers. Cannabis use approached significance as an independent risk factor for CD. Patients with IBD are known to use cannabis for symptomatic relief [22, 23], and we cannot exclude that some participants were more inclined to smoke cannabis because they were already experiencing early IBD symptoms.

Within our dataset, there was no association between method of delivery and subsequent IBD onset. We found breastfeeding to be protective against UC but to have no bearing on future CD onset. Previous research has supported a protective role of breastfeeding against paediatric-onset IBD [24]. Evidence for a role in adult-onset IBD is conflicting: an early meta-analysis suggested a protective effect [25], but re-analysis including a well-conducted case–control study substantially reduced the apparent benefit [26]. Further conflicting studies have since been published, with the strongest protective effect seen in an Asian cohort, in which the risk of enteric infection among bottle-fed children may be higher [27]. Reported infant feeding methods also differed between co-twins in 13.2% of our twin pairs, further highlighting the challenges of recall-based studies.

Multivariable analysis showed that experiencing gastroenteritis less often than one’s peers was protective against UC but did not influence CD onset. Recent analysis of the Millennium study cohort also supports an association between infective gastroenteritis and subsequent UC [28]. However, it is difficult to determine whether infective gastroenteritis is truly a risk factor, or whether its recall instead reflects a UC prodrome before formal diagnosis.

Our cohort has an older median age than most, with a median follow-up of 21.2 years (7.9–31.5); it is therefore likely to capture lifetime concordance more completely than a younger cohort. However, such long follow-up is also a significant limitation for the study of premorbid environmental factors, as it increases the potential for recall bias.

The lack of a population-based twin registry also meant that pair, rather than proband, concordance had to be calculated. The use of pair concordance precludes estimation of heritability using Falconer’s method [1, 2], which in turn makes comparisons with other twin cohorts less robust.
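For reference, the quantities contrasted here can be written as follows, with $C$ the number of concordant pairs and $D$ the number of discordant pairs. The proband formula shown assumes complete ascertainment, so that every affected twin counts as a proband. Falconer’s formula operates on twin correlations in liability, $r_{MZ}$ and $r_{DZ}$, which are usually estimated from proband concordance together with population prevalence rather than from pair concordance directly.

```latex
\begin{aligned}
\text{pair concordance} &= \frac{C}{C + D}, \qquad
\text{proband concordance} = \frac{2C}{2C + D},\\
h^2 &= 2\,(r_{MZ} - r_{DZ}), \qquad
c^2 = 2\,r_{DZ} - r_{MZ}, \qquad
e^2 = 1 - r_{MZ}.
\end{aligned}
```

Here $h^2$, $c^2$ and $e^2$ denote the proportions of liability variance attributed to additive genetic, shared environmental and unique environmental effects, respectively.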

To conclude, analysis of the UK IBD Twin Registry supports a heritable component to IBD. DZ concordance rates for UC are higher than would be expected in non-twin siblings, consistent with other twin studies. Smoking is the environmental factor most strongly implicated in CD onset. In UC, breastfeeding was protective, and recall of frequent gastroenteritis was associated with subsequent disease. The UK IBD Twin Registry also provides a valuable research resource, with samples for a bioresource collected from the majority of participants.