Introduction

Crohn’s disease (CD) is a chronic inflammatory bowel disease (IBD). Patients with long-standing CD may relapse and develop gastrointestinal complications, such as fistula, stenosis, or perforation, necessitating surgical resection [1]. Repeated surgeries or hospitalizations reduce a patient’s quality of life. Because CD is presently incurable, effective maintenance therapies can substantially improve long-term prognosis after remission induction.

Anti-tumor necrosis factor (anti-TNF) therapies have significantly changed the CD therapeutic strategy because of their high effectiveness for induction and maintenance therapy [2]. Infliximab (IFX), the first anti-TNF therapy approved for the treatment of CD patients, is a chimeric monoclonal antibody that has shown promise in inducing and maintaining remission in IBD and other immune-mediated disorders (mainly psoriasis, rheumatoid arthritis, and ankylosing spondylitis) [3]. Despite the introduction of newer biologics (adalimumab, vedolizumab, ustekinumab and risankizumab), IFX remains widely prescribed because of its well-characterized safety profile and long history of use in clinical practice [4].

However, 5%-13% of IBD patients on anti-TNF treatment need to discontinue it because of a loss of response (LOR) in the first year of therapy, which can cause complications leading to surgery [5]. The causes of LOR are largely unknown; however, the immunogenicity of IFX, which induces anti-drug antibody production in some patients, is a plausible explanation [6, 7]. Recognizing patients at an increased risk of immunogenicity and early discontinuation of IFX will greatly affect the choice of therapy for CD patients. Thus, several studies have aimed to identify prognostic factors for IFX treatment failure, including clinical, biochemical, and genomic factors [8,9,10,11]. Among these potential factors, the HLA-DQA1*05 allele (rs2097432) has recently been recognized as a risk factor for immunogenicity and early discontinuation of IFX and is being tested for clinical use in Caucasian CD patients [12, 13]. On the other hand, the allele frequency of rs2097432 differs substantially between Japanese and Caucasians (approximately 10% vs. 30%), and the association of rs2097432 with IFX treatment has not been validated in Asian populations, including the Japanese. Further clinical validation is critical to broaden the implementation of rs2097432 in clinical practice, particularly in CD patients of non-Caucasian descent.

In addition, unbiased genome-wide association studies (GWAS) have revealed relationships between genetic background and clinical response to several drugs, such as that of rs2097432 with the long-term effect of IFX. Recent pharmacogenomics research in Asian patients with IBD showed a link between a genetic polymorphism of Nudix hydrolase 15 (NUDT15) and the severe adverse effects of thiopurines [14]. Before these findings, NUDT15’s role in thiopurine metabolism was unknown, which suggests the usefulness of unbiased approaches in this field. Performing a GWAS to identify novel predictors of IFX failure in Japanese patients, which has not been reported before, is also an urgent issue.

This study primarily aimed to explore the association between rs2097432 and cumulative IFX discontinuation-free rates in Japanese patients with CD. Furthermore, we conducted an unbiased genome-wide survival analysis to identify the novel genetic factors associated with cumulative IFX discontinuation-free rates.

Materials and methods

Study design

This was a single-center retrospective, observational cohort study. The study protocol was reviewed and approved by the Tohoku University Hospital Ethics Committee (2020-1-608). All the patients provided written informed consent. The research followed the Japanese Ministry of Health, Labor, and Welfare ethical guidelines for medical and health studies in humans.

Subjects

From August 2002 to December 2020, we enrolled consecutive, self-reported Japanese CD patients who had a history of treatment with IFX (Remicade®, Mitsubishi-Tanabe Pharma, Tokyo, Japan) at Tohoku University Hospital. We excluded patients who did not receive the scheduled maintenance treatment within 8 weeks due to either primary non-response or intolerance to the agents. Primary non-response was defined as a case where IFX was stopped in the induction phase (first three administrations) due to a lack of or unsatisfactory agent response. Intolerance was defined as a case where IFX was stopped because of an adverse event.

CD was diagnosed based on endoscopic, radiological, and/or histological findings, in patients who presented with specific features as proposed by the Japanese Ministry of Health, Labor, and Welfare, such as longitudinal ulcer, a cobblestone appearance, and noncaseous epithelioid cell granuloma.

This study included 189 patients with CD. All the patients were receiving a biologic (IFX, adalimumab, vedolizumab, ustekinumab and risankizumab) for the first time (biologics-naive patients). Clinical data of all enrolled patients were obtained from medical records. The time from the introduction to the discontinuation of IFX because of LOR (IFX persistence) was calculated. Patients were followed from the date of IFX treatment initiation to its discontinuation due to LOR or the end of their follow-up. The reason for IFX discontinuation as specified by the treating physician was recorded. The discontinuation of IFX due to LOR was defined as the withdrawal of IFX due to loss of efficacy as determined by biochemical, clinical, and endoscopic data or the need for abdominal surgery related to CD progression.

Protocol of IFX administration

IFX was administered to patients with moderate to severe CD, who had an active luminal or perianal disease. There was no indication for IFX in patients with CD who had severe stenosis or internal fistulas; these complications were first treated surgically. During the induction phase of IFX treatment, 5 mg/kg of the drug was administered at weeks 0, 2, and 6. Then, every 8 weeks, 5 or 10 mg/kg of IFX was administered as maintenance therapy.

Clinical factors investigated in this analysis

The clinical factors investigated were: gender, age at diagnosis (<21 or ≥21 years), duration of the disease at the start of IFX therapy (<7 or ≥7 years), body mass index (BMI) at the start of IFX therapy (<19 or ≥19), disease location (ileal, ileocolonic, or colonic), disease behavior (inflammation, stenosis, or fistula), presence of perianal disease (perianal fistulas and abscess, anal ulcers and stenosis), history of intestinal resection, smoking at the start of IFX, concomitant elemental diet (<900 or ≥900 kcal/day), concomitant thiopurine use, serum albumin levels at the start of IFX therapy (<3.7 or ≥3.7 g/dl), and C-reactive protein (CRP) levels at the start of IFX therapy (<0.6 or ≥0.6 mg/dl). Continuous variables including age at diagnosis, duration of the disease at the start of IFX therapy, BMI at the start of IFX therapy, serum albumin levels at the start of IFX therapy and CRP levels at the start of IFX therapy were all non-normally distributed (P-values of Shapiro–Wilk test <0.05). Therefore, median values for those variables were employed as the cut-off values for the survival analysis.
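As a minimal illustration of the median split described above, the following Python sketch dichotomizes a continuous baseline variable at its sample median. The albumin values and the function name median_split are hypothetical and are not taken from the study data.

```python
from statistics import median

def median_split(values):
    """Dichotomize a continuous variable at its sample median.

    Returns (cutoff, labels), where each label is "low" (< cutoff)
    or "high" (>= cutoff), mirroring the "<x or >=x" grouping in the text.
    """
    cutoff = median(values)
    labels = ["low" if v < cutoff else "high" for v in values]
    return cutoff, labels

# Hypothetical baseline serum albumin values (g/dl), for illustration only.
albumin = [3.1, 3.5, 3.7, 3.9, 4.2]
cutoff, groups = median_split(albumin)
```

With an odd number of observations the median is itself an observed value, so that patient falls into the "high" (at or above the cut-off) group.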

Genotyping and quality control

Genomic DNA was isolated from peripheral blood leukocytes by standard phenol-chloroform extraction and precipitation, using the PAXgene DNA Kit (BD Bioscience, Franklin Lakes, NJ, USA) or the NA1000 Automated Nucleic Acid Extraction Machine (Kurabo, Osaka, Japan). The Japonica Array V1 (Thermo Fisher, Tokyo, Japan), a single nucleotide polymorphism (SNP) array designed specifically for Japanese individuals [15], was employed to conduct GWAS genotyping. For genotype calling, the Affymetrix Power Tools (version 2.10.2.2; Thermo Fisher Scientific, Waltham, MA, USA) were used. The quality control criteria, as recommended by Affymetrix, were a sample call rate of >0.97 and a dish quality control of >0.82. The SNPs were categorized by cluster separation using the SNPolisher package (version 1.5.2; Thermo Fisher Scientific, Waltham, MA, USA). A subsequent analysis was conducted on 643,411 SNPs categorized as “recommended.” Identity by descent probabilities (PI_HAT) were estimated using PLINK 1.90 software [16], and cryptic relatives were detected by the maximum unrelated set identification (IMUS) method implemented in PRIMUS (version 1.8.0) using a minimum PI_HAT value of 0.1. As part of quality control, samples of cryptic relatives (PI_HAT > 0.5) and those with genotyping rates <0.97 or call rates <0.97 were excluded from further analysis.

The corresponding SNP and sample quality-controlled genotype data of 643,496 SNPs from 189 cases were employed for further investigation. The CrossMap program [17] was employed to transform the genomic coordinates of this data from hg19 to GRCh38 to match the imputation panel’s genomic coordinates.

Imputation

For quality control before imputation, SNPs with Hardy–Weinberg equilibrium (HWE) P-value < 1E−5 were excluded, and 613,834 SNPs on autosomal chromosomes were included for further analysis. The imputation panel was an in-house constructed haplotype panel comprising the haplotypes of 12,343 individuals from diverse populations, including 2493 individuals from the International 1000 Genomes Project and 9850 individuals from biobanks of the National Center Biobank Network. BEAGLE’s conform-gt program removed variants whose alleles did not match those in the reference panel. Subsequently, we used default parameters to run an imputation in BEAGLE 5.2. SNPs with call rate <0.97, minor allele frequency (MAF) <0.05, HWE P-value < 1E−6, or information metric (INFO score) <0.5 were excluded. After exclusion, the genotyped or imputed data of 5,700,569 SNPs including rs2097432 from 189 cases were used for the genetic analysis.
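The post-imputation filters can be sketched per SNP as follows. This is an illustrative Python sketch only: the HWE P-value here comes from a 1-degree-of-freedom chi-square goodness-of-fit test, a simplification that may differ from the test used by the actual software, and the genotype counts, call rate, and INFO score in the example are hypothetical.

```python
from math import erfc, sqrt

def maf_and_hwe_p(n_aa, n_ab, n_bb):
    """Return (MAF, HWE P-value) from genotype counts.

    The P-value uses a chi-square goodness-of-fit test with 1 degree of
    freedom (an assumed simplification, not necessarily the study's test).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return min(p, q), p_value

def passes_post_imputation_qc(n_aa, n_ab, n_bb, call_rate, info):
    """Apply the thresholds stated in the text: call rate, MAF, HWE, INFO."""
    maf, hwe_p = maf_and_hwe_p(n_aa, n_ab, n_bb)
    return call_rate >= 0.97 and maf >= 0.05 and hwe_p >= 1e-6 and info >= 0.5

# Hypothetical SNP: 120 / 60 / 9 genotype counts in 189 samples.
maf, hwe_p = maf_and_hwe_p(120, 60, 9)
kept = passes_post_imputation_qc(120, 60, 9, call_rate=1.0, info=0.9)
# A monomorphic SNP (MAF = 0) fails the MAF filter.
dropped = passes_post_imputation_qc(189, 0, 0, call_rate=1.0, info=0.9)
```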

Principal component analysis

Outliers in the sample were detected by principal component analysis of linkage disequilibrium (LD)-independent SNPs using the PLINK 1.90 software. LD pruning was carried out in PLINK with the following option: --indep-pairwise 50 5 0.1. The samples were nearly homogeneous, as shown by the plot of the top two principal components (PC1 and PC2) for each sample (Supplementary Fig. 1).

Statistical analysis

The study design is summarized in Fig. 1. In the univariate analysis of the cumulative discontinuation-free rates of IFX for the clinical factors, the log-rank test was used. A Cox proportional hazards model was used in the multivariate analysis of the cumulative discontinuation-free rates for the clinical factors. A P-value < 0.05 was considered statistically significant.
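For readers less familiar with cumulative discontinuation-free rates, the underlying Kaplan–Meier product-limit estimate can be sketched in a few lines of Python. The follow-up times and event indicators below are hypothetical; an event of 1 stands for IFX discontinuation due to LOR and 0 for censoring.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the discontinuation-free probability.

    times  : follow-up duration for each patient
    events : 1 if IFX was discontinued due to LOR at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (years), for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate depends on the number still at risk at each event time rather than on the total cohort size.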

Fig. 1: Flowchart of this study design.

IFX Infliximab, CD Crohn’s disease, HWE Hardy–Weinberg equilibrium, MAF minor allele frequency, SNPs single-nucleotide polymorphisms.

In the analysis of the genetic factors for IFX persistence, we first examined the association between HLA-DQA1*05 (rs2097432) and the cumulative discontinuation-free rates of IFX. The cumulative discontinuation-free rates of rs2097432 variant (C allele) carriers and non-carriers were compared using the log-rank test and the Cox proportional hazards model. We added baseline serum albumin levels and disease location as covariates in the Cox proportional hazards model because these factors showed some association with IFX persistence (P-value < 0.1), and the first two principal components generated from the genetic data (described in the Principal component analysis section of the Methods) as covariates to avoid the confounding effect of population structure. Differences between the two groups in the continuous variables in Supplementary Table 1, including age at diagnosis, BMI at the start of the biologics, disease duration at the start of IFX therapy, serum albumin levels at baseline and CRP levels at baseline, were examined using the Mann–Whitney U test. Differences in the categorical variables in Supplementary Table 1 were evaluated using a two-sided Fisher exact test.
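The two-group log-rank comparison used here (for example, carriers versus non-carriers) can be sketched as below. This is a minimal Python illustration with hypothetical data and a hypothetical function name; it is not the routine used in the study, which relied on R.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, P-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still under observation just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Chi-square survival function with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical groups: A discontinues early, B late (all events observed).
chi2_diff, p_diff = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
# Identical groups give no evidence of a difference.
chi2_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```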

We then performed an unbiased GWAS for IFX persistence with 5,700,568 SNPs (462,475 genotyped SNPs and 5,238,093 imputed SNPs). All SNPs used for the GWAS had a call rate ≥0.97, MAF ≥ 0.05, HWE P-value ≥ 1E−6, and INFO score ≥0.5. Using the R package gwasurvivr (version 1.12.0, https://github.com/suchestoncampbelllab/gwasurvivr), a Cox proportional hazards model adjusted for baseline serum albumin levels, disease location, and the first two principal components was fitted for each SNP in the GWAS. The R software (version 4.1.3, http://www.r-project.org/) was used for all the statistical analyses.
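To make the per-SNP survival model more concrete, the following Python sketch fits a Cox proportional hazards model with a single covariate (for example, carrier status coded 0/1) by Newton–Raphson on the partial likelihood. It is a simplified illustration under stated assumptions: one covariate only, no adjustment covariates, Breslow handling of ties, and hypothetical data. It does not reproduce the gwasurvivr implementation.

```python
from math import exp

def cox_score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood."""
    score = info = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at time t_i.
        risk = [(x_j, exp(beta * x_j)) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for _, w in risk)
        s1 = sum(x_j * w for x_j, w in risk)
        s2 = sum(x_j * x_j * w for x_j, w in risk)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit_cox_single_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Return (beta, hazard ratio) for one covariate via Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_information(beta, times, events, x)
        if info == 0:
            break  # no information about beta (e.g. covariate is constant)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, exp(beta)

# Hypothetical data: x = 1 marks carriers, who tend to discontinue earlier.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
carrier = [1, 1, 0, 1, 0, 0]
beta, hazard_ratio = fit_cox_single_covariate(times, events, carrier)
final_score, _ = cox_score_and_information(beta, times, events, carrier)
```

A hazard ratio above 1 for the carrier group corresponds to earlier discontinuation; in a GWAS the same model is refitted once per SNP with the genotype dosage as the covariate of interest.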

In the analysis of rs2097432, a P-value < 0.05 was considered significant. In the GWAS, SNPs with P-values < 5E−8 were considered genome-wide significant and SNPs with P-values < 1E−6 were considered candidates. We used the “clump” procedure in PLINK 1.90 software to group these candidate SNPs into independent candidate loci based on LD information. As LD parameters, we used r2 > 0.1 (--clump-r2 0.1) and a window of 250 kb from the lead SNP (--clump-kb 250). The LocusZoom application [18] was used to generate regional association plots around tag SNPs.
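The idea behind LD-based clumping can be illustrated with a simplified greedy sketch in Python. It is not PLINK's implementation and may differ from it in detail; the SNP identifiers, positions, P-values, and pairwise r2 values are hypothetical.

```python
def clump(snps, r2, p_threshold=1e-6, r2_threshold=0.1, window_bp=250_000):
    """Greedy clumping of candidate SNPs into loci.

    snps : list of (snp_id, chromosome, position, p_value)
    r2   : dict mapping frozenset({id1, id2}) to the pairwise LD r2;
           pairs that are absent are treated as r2 = 0
    Returns a list of (lead_snp_id, [member ids including the lead]).
    """
    candidates = sorted((s for s in snps if s[3] < p_threshold), key=lambda s: s[3])
    assigned = set()
    loci = []
    for lead in candidates:
        if lead[0] in assigned:
            continue  # already absorbed into a stronger signal's clump
        members = [lead[0]]
        assigned.add(lead[0])
        for other in candidates:
            if other[0] in assigned:
                continue
            in_window = other[1] == lead[1] and abs(other[2] - lead[2]) <= window_bp
            in_ld = r2.get(frozenset((lead[0], other[0])), 0.0) > r2_threshold
            if in_window and in_ld:
                members.append(other[0])
                assigned.add(other[0])
        loci.append((lead[0], members))
    return loci

# Hypothetical summary statistics, for illustration only.
snps = [
    ("snpA", 5, 1_000, 1e-8),
    ("snpB", 5, 50_000, 5e-7),
    ("snpC", 5, 900_000, 2e-7),
    ("snpD", 7, 1_000, 0.01),
]
ld = {frozenset(("snpA", "snpB")): 0.8}
loci = clump(snps, ld)
```

Here snpB is merged into the locus led by snpA, snpC forms its own locus because it lies outside the 250 kb window, and snpD is ignored because it does not meet the candidate threshold.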

Pathway analysis

Pathway analysis with MAGMA [19] was performed using P-value and genomic locations of each SNP in GWAS results. MAGMA first computed the gene-level P-values employing the weighted sum of the related statistics for SNP sites in the region (25 kbp upstream and downstream), considering the local LD structures. Thereafter, a pathway-level statistical inference was performed based on a multiple linear–principal components regression model using biologically functional databases, such as Reactome and KEGG. In total, 1373 pathways were investigated and pathways with P-values < 3.64E−5 (0.05/1373) were considered as significant by applying Bonferroni correction to correct for multiple testing. Pathways with P-values < 0.05 were considered as candidates.

Results

Patients’ characteristics and baseline data

Table 1 summarizes the patients’ baseline characteristics. The median age at initial CD diagnosis and disease duration at the start of IFX therapy were 21.0 and 7.5 years, respectively. The recognized disease locations were as follows: 26 ileal, 24 colonic, and 139 ileocolonic. For disease behavior, 63, 87, and 39 patients had inflammation, stenosis, and fistula types, respectively. There were 138 patients (73.0%) with perianal lesions, and 121 patients (64.0%) had a history of intestinal resection. Thirty-one patients (16.4%) were treated with a concomitant thiopurine.

Table 1 Baseline characteristics and associations of cumulative discontinuation-free rates of infliximab therapy in the study.

At 1, 3, and 5 years, the cumulative discontinuation-free rates of IFX therapy were 93.7%, 83.5%, and 79.7%, respectively (Supplementary Fig. 2).

Clinical factors associated with IFX persistence

The univariate and multivariate analysis results assessing the relationship between clinical factors and cumulative discontinuation-free rates of IFX therapy are shown in Table 1. In the univariate analysis, disease location and baseline serum albumin levels <3.7 g/dL were identified as risk factors for early discontinuation. Multivariate analysis revealed baseline serum albumin levels <3.7 g/dL as an independent risk factor for early discontinuation (hazard ratio: HR = 1.98 and P-value = 0.033). Conversely, surgical history, concomitant thiopurine, smoking status, and baseline levels of CRP were not related to early discontinuation. A Kaplan–Meier curve for baseline serum albumin levels is shown in Supplementary Fig. 3.

Genetic factors associated with IFX persistence

HLA-DQA1*05 (rs2097432) was significantly associated with IFX persistence

Next, we analyzed the association between rs2097432 and the cumulative discontinuation-free rate of IFX. In our cohort, the MAF for allele C was 10.6%, and the frequencies of each genotype (T/T, T/C, and C/C) were 150 (79.4%), 38 (20.1%), and 1 (0.5%), respectively. There was no significant difference in the patients’ background between rs2097432 carriers (T/C or C/C) and non-carriers (T/T) (Supplementary Table 1). The rs2097432 carriers had a significantly increased risk of earlier discontinuation of IFX in the log-rank test (P-value = 0.019). A Kaplan–Meier curve for rs2097432 is shown in Fig. 2. Importantly, this association remained significant after adjustment for baseline serum albumin level, disease location, and concomitant thiopurine treatment (HR = 2.23 and P-value = 0.026) (Table 2).
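As a consistency check, the reported minor allele frequency follows from the genotype counts above, counting one C allele per T/C heterozygote and two per C/C homozygote among the 2 × 189 alleles in the cohort:

```latex
\[
\mathrm{MAF}_{C} = \frac{38 \times 1 + 1 \times 2}{2 \times 189} = \frac{40}{378} \approx 0.106
\]
```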

Fig. 2: Kaplan–Meier curve of discontinuation of infliximab therapy according to the HLA-DQA1*05 (rs2097432) genotype (T/T vs. T/C or C/C).

Vertical lines indicate censored cases.

Table 2 Association between HLA-DQA1*05 (rs2097432) genotype (T/T vs. T/C or C/C) and discontinuation-free rates in multivariate analysis.

GWAS identified novel significant candidate SNPs associated with IFX persistence

An unbiased GWAS was performed to identify novel genetic risk factors for early discontinuation of IFX. The Manhattan plot of the GWAS is shown in Fig. 3. The genomic inflation factor was 1.08. In the GWAS, 14 SNPs were identified as candidate SNPs (P-value < 1E−6) and classified into five loci (Table 3). One locus, tagged by rs73277969 (HR = 6.04 and P-value = 7.93E−9), located upstream of the PPAR-gamma coactivator 1B (PPARGC1B) gene, achieved genome-wide significance. This signal remained significant in the analysis conditioned on rs2097432 (HR = 5.61 and P-value = 2.90E−8). The regional plot and Kaplan–Meier curve of rs73277969 are shown in Fig. 4 and Supplementary Fig. 4, respectively.
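The text does not state how the genomic inflation factor was computed. Under the conventional definition (the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null), it can be sketched in Python as follows. The P-values in the example are a hypothetical uniform grid, not study results.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided P-values.

    Each P-value is converted back to a 1-df chi-square statistic; lambda
    is the observed median divided by the null median (about 0.455).
    P-values must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Hypothetical P-values spread uniformly over (0, 1): no inflation expected.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(uniform_p)
```

Values close to 1 indicate little systematic inflation of the test statistics, for example from residual population structure.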

Fig. 3: Manhattan plot of 5,700,568 single-nucleotide polymorphisms from the genome-wide association analysis of infliximab discontinuation-free survival time.

Single-nucleotide polymorphisms are plotted according to chromosomal location, with −log10(P) calculated using the Cox proportional hazards model. The solid line indicates the genome-wide significance level (P-value = 5E−8). The dashed line indicates the threshold for candidate SNPs (P-value = 1E−6).

Table 3 Candidate SNPs and genes associated with infliximab discontinuation-free rates identified by GWAS (P ≤ 1E−6).
Fig. 4: Results of genome-wide association analysis of discontinuation-free survival rates of Infliximab therapy.

LocusZoom plot of P-values around the top-associated single-nucleotide polymorphism (SNP) from the discontinuation-free survival analysis. The top-associated SNP, rs73277969, is displayed as a purple diamond, while the other SNPs are displayed as circles. The color represents the degree of linkage disequilibrium (r2) with the lead SNP.

Pathway analysis suggested associations of platelet-derived growth factor (PDGF) signaling and Fc-gamma receptor (FCGR) activation signaling with IFX persistence

To determine the spectrum of pathways involved in the genes linked with IFX persistence in Japanese CD patients, we conducted a pathway-level association analysis using MAGMA. Our pathway-level analysis demonstrated that signaling by PDGF (P-value = 8.56E−5) and FCGR activation signaling (P-value = 5.80E−4) were the top two pathways, although these pathways did not reach significance after Bonferroni correction. The findings of the pathway-level analysis with a P-value < 0.05 are summarized in Supplementary Table 2.
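The Bonferroni threshold and the classification of the two top pathways can be verified with a short calculation. The Python sketch below uses only the figures reported in the text (1373 pathways, alpha = 0.05, and the two pathway P-values); the function and variable names are illustrative.

```python
ALPHA = 0.05
N_PATHWAYS = 1373  # number of pathways tested, as stated in the Methods

# Bonferroni-corrected significance threshold (about 3.64E-5).
bonferroni_threshold = ALPHA / N_PATHWAYS

def classify_pathway(p_value):
    """Label a pathway using the thresholds defined in the Methods."""
    if p_value < bonferroni_threshold:
        return "significant"
    if p_value < ALPHA:
        return "candidate"
    return "not associated"

pdgf_status = classify_pathway(8.56e-5)   # signaling by PDGF
fcgr_status = classify_pathway(5.80e-4)   # FCGR activation signaling
```

Both pathways fall between the corrected threshold and 0.05, which is why they are reported as candidates rather than as significant findings.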

Discussion

This study demonstrated that serum albumin level is associated with IFX persistence in biologics-naive Japanese CD patients. Importantly, we replicated the significant association of HLA-DQA1*05 (rs2097432) with IFX persistence in Japanese CD patients, which is the first such report in an Asian population. Furthermore, we conducted an unbiased genome-wide survival analysis for IFX persistence. Based on the GWAS results, a genome-wide significant susceptibility SNP for IFX persistence was identified, and an association of the PDGF signaling and FCGR activation pathways with IFX persistence was suggested.

Low serum albumin levels are one of the most established risk factors for early LOR and discontinuation of IFX therapy. Low serum albumin levels were found to increase the clearance of IFX from the gastrointestinal tract [20,21,22] and, thus, could reduce the concentration of IFX. Lower drug concentrations could generate large amounts of antidrug antibodies during the induction phase by inhibiting immunological tolerance [23]. These mechanisms could explain why IFX is discontinued earlier when albumin levels are lower at baseline. In our cohort, concomitant thiopurine treatment was not protective against the early discontinuation of IFX, which is not consistent with previous reports [24, 25]. The percentage of CD patients receiving concomitant thiopurine therapy was lower in our cohort (16% in our cohort vs. >50% in previous reports), which may have made it difficult to detect a protective effect of concomitant thiopurine therapy.

It has been repeatedly reported that carriers of rs2097432 are more likely to lose response and discontinue IFX in populations of European ancestry [12, 13]. The authors of those studies showed that rs2097432 is linked with the generation of antidrug antibodies to IFX, leading to early LOR and discontinuation of IFX [12, 13]. However, evidence for this variant has been scarce in Asian populations, such as the Japanese. In general, allele frequencies and phenotype-genotype relationships of genetic variants vary significantly across populations. Considering its high impact on clinical practice, replicating the association of rs2097432 in different populations is very important. Our results support the robust utility of rs2097432 for predicting IFX persistence regardless of ancestry.

From the GWAS, we found that rs73277969, located upstream of PPARGC1B, was significantly associated with IFX persistence, and its effect was independent of rs2097432. PPARGC1B is a master regulator of mitochondrial biogenesis, oxidative metabolism, and antioxidant defense and is broadly expressed in the cytosol, nucleus and mitochondria [26, 27]. PPARGC1B attenuates the inflammatory cytokines IL-6 and IL-12, inhibiting macrophage-mediated inflammation [28]. Furthermore, PPARGC1B is highly expressed in the intestinal epithelium of the crypts and villus axis [29], and miR-378a-3p, which is located in PPARGC1B intron 1, modulates IL-33 expression, which can lead to the onset of IBD [30]. These results suggest that rs73277969 could be a plausible candidate SNP to predict IFX persistence in Japanese patients with CD, although the impact of rs73277969 on PPARGC1B is uncertain. Replication studies for this SNP in external cohorts from different populations will be required.

In addition, we conducted a pathway analysis to integrate the effects of genetic factors on IFX persistence. The findings indicated a relationship between the genetic background of PDGF and FCGR activation signaling and IFX persistence. Both PDGF and FCGR are closely linked to inflammatory responses, such as those involving TNF-alpha. PDGF was increased exclusively in active IBD and was associated with its clinical and endoscopic activity [31]. A tissue culture model revealed that pro-inflammatory cytokines, including TNF-alpha and IL-1β, activated PDGF signaling in colonic smooth muscle cells, leading to chronic intestinal inflammation and fibrosis [32]. FCGR signaling, which causes apoptosis or antibody-dependent cell-mediated cytotoxicity (ADCC) via membrane-associated forms of TNF and FCGR, may play a more important role in IBD [33]. One of the suggested mechanisms underlying the effects of IFX is ADCC against transmembrane TNF-alpha-expressing cells [34, 35]. Several SNPs of FCGR3A and FCGR2A have been reported to be linked with the response to IFX therapy [34, 36]. Our pathway-level analysis supports the importance of these signaling pathways for IFX therapy and the significance of our GWAS results.

This study had several limitations. First, this was a retrospective, single-center study. Second, the decision to discontinue IFX (and thus to classify a patient as failing to respond to IFX) was made clinically by the treating physician rather than according to pre-determined criteria (such as clinical indexes, endoscopic findings, or laboratory parameters). Third, the allele information of rs2097432 was obtained by imputation rather than direct genotyping. Finally, we did not conduct any replication to validate our GWAS results. Despite these limitations, our research has certain strengths, such as the long-term follow-up information available for review in our patient cohort. In addition, we confirmed the relationship of rs2097432 with IFX persistence and identified novel susceptibility SNPs for IFX persistence by performing a genome-wide survival analysis, the first such report in an Asian population.

In summary, HLA-DQA1*05 (rs2097432) is a significant predictor of IFX persistence, irrespective of the population. Upstream variants of PPARGC1B are novel candidate predictors for IFX persistence.