Introduction

Primary biliary cholangitis (PBC) is a chronic autoimmune liver disease characterized by T lymphocyte-mediated injury to intrahepatic bile duct epithelial cells, which may lead to cholestasis and eventually to cirrhosis and liver failure [1]. PBC is a relatively rare disease, with a prevalence of 19–402 per million [2]. In China, a 2019 meta-analysis estimated a prevalence of 191.18 per million [3]. Ursodeoxycholic acid (UDCA) is one of the primary drugs used to treat PBC [1]. PBC is a complex disease caused by multiple minor-effect genes and environmental factors, and it shows genetic heterogeneity, phenotypic complexity and ethnic differences [4]. The natural history of PBC can be broadly divided into four stages. The first is the preclinical stage, characterized by positivity for antimitochondrial antibodies (AMA) without significant abnormalities in biochemical markers. The second is the asymptomatic stage, in which biochemical abnormalities are the main manifestation and there are no noticeable clinical symptoms. The third is the symptomatic stage, in which patients experience clinical symptoms such as fatigue and pruritus. The fourth is the decompensated stage, during which patients develop clinical manifestations such as gastrointestinal bleeding, ascites and hepatic encephalopathy [5]. Patients who present with fatigue and/or pruritus at the onset of PBC form a distinct group: they are predominantly female and younger, have more aggressive disease, respond less well to UDCA and are more likely to progress to cirrhosis and other complications [6]. AMA, the characteristic serological marker of PBC, can be detected in 90–95% of patients with PBC [7]. Antinuclear antibodies (ANA) are also found in PBC patients.
Notably, antibodies targeting sp100 (associated with a multiple nuclear dots ANA pattern) and gp210 (associated with a rim-like membranous ANA pattern) have been observed in approximately 20% of all PBC patients and in 40%–50% of AMA-negative PBC patients [8]. In addition, antibodies against SS-A/Ro-52 and centromere show high specificity for PBC and can aid the diagnosis of AMA-negative PBC [9]. To date, more than 20 PBC susceptibility genes have been identified through genome-wide association studies (GWAS), including HLA-DRB1*0801, HLA-DQB1, IL12A, IL12RB2, TNFSF15, IRF5, SOCS1, ARID3A, CD80, CTLA-4, SPIB, IRF8, PLC-L2, IKZF3 and PRKCB [10,11,12,13,14,15].

GTF2I is located at chromosome 7q11.23 and encodes a 112 kDa phosphoprotein, general transcription factor II-I (TFII-I) [16]. Inversions, microdeletions and other structural variations involving GTF2I have been closely associated with Williams–Beuren syndrome (WBS) [17]. Our group has previously demonstrated associations between GTF2I and Sjögren’s syndrome (SS), systemic sclerosis (SSc) and systemic lupus erythematosus (SLE) in the Chinese Han population [18,19,20]. Associations with rheumatoid arthritis (RA), SLE and SSc have also been reported [12, 21,22,23]. Because both PBC and SS are considered forms of autoimmune epithelitis [24], we aimed to investigate the possible association between GTF2I and PBC and the underlying mechanism.

Methods

Patients and healthy controls

A consecutive series of 466 patients with PBC and 694 geographically and ethnically matched healthy controls (HC) were recruited at Peking Union Medical College Hospital. All subjects were unrelated Han Chinese. PBC was diagnosed according to the 2009 American Association for the Study of Liver Diseases (AASLD) practice guideline [25, 26], which requires at least two of the following three criteria: positive AMA serology; persistently elevated cholestatic liver biochemistry (specifically, increased ALP levels); and histological evidence of non-suppurative inflammation of the interlobular bile ducts. The medical records of each patient were reviewed. Patients with other autoimmune diseases, including SLE, RA, type 1 diabetes mellitus, autoimmune hepatitis, primary sclerosing cholangitis, overlap syndrome or SS, were excluded.
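The two-of-three diagnostic rule and the exclusion criteria above can be expressed as a small screening function. This is a minimal illustrative sketch: the `PatientFindings` record, its field names and the comorbidity labels are hypothetical and do not correspond to the study's case report forms.

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical labels for the comorbidities listed as exclusion criteria.
EXCLUDED_COMORBIDITIES = {
    "SLE", "RA", "type 1 diabetes mellitus", "autoimmune hepatitis",
    "primary sclerosing cholangitis", "overlap syndrome", "SS",
}


@dataclass
class PatientFindings:
    """Hypothetical record of the three diagnostic findings for one patient."""
    ama_positive: bool              # positive AMA serology
    persistent_alp_elevation: bool  # persistently elevated cholestatic biochemistry (ALP)
    bile_duct_histology: bool       # non-suppurative inflammation of interlobular bile ducts


def meets_pbc_criteria(p: PatientFindings) -> bool:
    """True when at least two of the three criteria are met."""
    n_met = sum([p.ama_positive, p.persistent_alp_elevation, p.bile_duct_histology])
    return n_met >= 2


def eligible_case(p: PatientFindings, comorbidities: Set[str]) -> bool:
    """PBC by the two-of-three rule and no excluded autoimmune comorbidity."""
    return meets_pbc_criteria(p) and not (comorbidities & EXCLUDED_COMORBIDITIES)


if __name__ == "__main__":
    print(meets_pbc_criteria(PatientFindings(True, True, False)))   # True
    print(eligible_case(PatientFindings(True, True, True), {"SS"}))  # False
```

For example, a patient with positive AMA and persistently elevated ALP but no biopsy qualifies, whereas a patient with histological findings alone does not.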

The Ethics Committee of Peking Union Medical College Hospital approved the study protocol (JS-2156).

Genotyping

Genomic DNA was extracted from blood samples using a Tiangen DNA kit (Tiangen, Beijing, China) and stored at −80 °C following standard procedures. Single nucleotide polymorphisms (SNPs) were genotyped using the Sequenom MassARRAY System (Sequenom iPLEX assay, San Diego, CA) according to the manufacturer’s instructions.

DNA samples were amplified by multiplex PCR, and the PCR products were used for locus-specific single-base extension reactions. The final products were desalted and transferred to a 384-element SpectroCHIP array. Alleles were detected by matrix-assisted laser desorption ionization time-of-flight mass spectrometry.

Isolation of CD19+ B cells

Peripheral blood mononuclear cells (PBMCs) from 3 PBC patients and 3 HC were prepared by standard Ficoll-Hypaque density gradient centrifugation (Pharmacia Biotech, Sweden). CD19+ B cells were then isolated using magnetic bead kits (Becton, Dickinson and Company, USA) according to the manufacturer’s instructions.

Chromatin immunoprecipitation sequencing (ChIP-seq)

ChIP-seq was performed according to the manufacturer’s instructions. Briefly, cells were cross-linked with formaldehyde at a final concentration of 1% and quenched with glycine. Cells were lysed with lysis buffer (0.2% SDS; 10 mM Tris–HCl, pH 8.0; 10 mM EDTA, pH 8.0; protease inhibitor cocktail) and sonicated into fragments of approximately 300–500 bp (Bioruptor, Diagenode). Dynabeads Protein A was washed twice with ChIP buffer (10 mM Tris–HCl, pH 7.5; 140 mM NaCl; 1 mM EDTA; 0.5 mM EGTA; 1% Triton X-100; 0.1% SDS; 0.1% Na-deoxycholate; protease inhibitor cocktail) and incubated with antibody at 4 °C for 2–3 h. The fragmented chromatin was transferred to the bead–antibody complex tubes and rotated at 4 °C overnight. The beads were washed once with low-salt buffer (10 mM Tris–HCl, pH 7.5; 250 mM NaCl; 1 mM EDTA; 0.5 mM EGTA; 1% Triton X-100; 0.1% SDS; 0.1% Na-deoxycholate; protease inhibitor cocktail) and twice with high-salt buffer (10 mM Tris–HCl, pH 7.5; 500 mM NaCl; 1 mM EDTA; 0.5 mM EGTA; 1% Triton X-100; 0.1% SDS; 0.1% Na-deoxycholate; protease inhibitor cocktail). After crosslink reversal, DNA was processed with the End Repair/dA-Tailing Module (NEB, E7442L) and the Ligation Module (NEB, E7445L) following the user manual. The ChIP library was amplified by approximately 11 cycles of PCR using Q5 master mix (NEB, M0492L), size-selected, purified and sequenced on an Illumina sequencing platform.

Statistical analysis

Statistical analyses were performed using PLINK 1.9, GraphPad Prism 5 and R 4.2.3. Hardy–Weinberg equilibrium (HWE) was assessed for each SNP using the chi-squared test, and SNPs that deviated significantly from HWE in HC (p < 0.05) were excluded from subsequent analysis. Allele and genotype distributions were compared between cases and controls using the chi-squared test, with a two-tailed significance threshold of 0.05. Odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated, and p values were corrected for multiple comparisons using the Bonferroni method (Pc = P × n, where n is the number of SNPs tested).
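The HWE filter and the Bonferroni correction described above can be sketched as follows. This is an illustrative sketch, not the PLINK implementation: it uses the classical 1-df chi-squared goodness-of-fit test (PLINK also offers an exact HWE test), and it caps corrected p values at 1, which the formula Pc = P × n leaves implicit. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-squared goodness-of-fit test for HWE (1 df). Returns (chi2, p)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0              # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def passes_hwe(n_aa: int, n_ab: int, n_bb: int, alpha: float = 0.05) -> bool:
    """Keep a SNP only if the control genotypes do not deviate from HWE (p >= alpha)."""
    return hwe_chi2(n_aa, n_ab, n_bb)[1] >= alpha


def bonferroni(p_values: List[float]) -> List[float]:
    """Pc = P * n, with n the number of tested SNPs, capped at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]
```

Applied to control genotype counts only, `passes_hwe` reproduces the filtering step; `bonferroni` is then applied to the allelic p values of the SNPs that remain.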

Raw sequencing data quality was assessed using FastQC and MultiQC. FASTQ files were aligned to the hg19 reference genome using the Burrows-Wheeler Aligner (BWA), and the mapped BAM files were used for peak calling with MACS2. Peaks were annotated to their nearest genes using PAVIS and ChIPseeker. Enrichment analyses of Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were performed using clusterProfiler. Known motif searches and de novo motif discovery were performed using MEME-ChIP. Gene expression was queried in the Human Cell Atlas (www.humancellatlas.org). Additionally, GTF2I ChIP-seq data from the hematopoietic cell line K562 (GSE176987, GSE177691) were retrieved from ENCODE [27]. Using the Genomic HyperBrowser, we assessed the overlap and hierarchical clustering between our data and the ENCODE datasets. Overlap was evaluated by segment-segment analysis with either 1,000 or 10,000 Monte Carlo randomizations, which randomize segment positions while preserving the empirical distributions of segment and inter-segment lengths. For hierarchical clustering in the Genomic HyperBrowser, we obtained pairwise overlap enrichment values for each pair of samples and defined the distance between samples as the inverse of these values. Cooperative TFs were identified with the Genomic HyperBrowser, using the “UCSC tfbs conserved” dataset as the source of potential TF binding sites [28]. Finally, the study overview (Fig. 1) was created with BioRender.com (2023).
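The Monte Carlo overlap test described above can be illustrated with a simplified single-region sketch: segment lengths and inter-segment gap lengths are each shuffled, so their empirical distributions are kept while positions are randomized, and the observed overlap is compared with the randomized overlaps. This is an approximation of the idea under stated assumptions, not the Genomic HyperBrowser's code; its actual test statistic, chromosome handling and randomization details may differ, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Total bp shared by two lists of internally non-overlapping intervals."""
    a, b = sorted(a), sorted(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def randomize_segments(segments: List[Interval], region_len: int,
                       rng: random.Random) -> List[Interval]:
    """Shuffle segment lengths and gap lengths, then lay them out along [0, region_len)."""
    segments = sorted(segments)
    seg_lens = [e - s for s, e in segments]
    # gaps: before the first segment, between segments, after the last one
    bounds = [0] + [x for s, e in segments for x in (s, e)] + [region_len]
    gap_lens = [bounds[k + 1] - bounds[k] for k in range(0, len(bounds), 2)]
    rng.shuffle(seg_lens)
    rng.shuffle(gap_lens)
    out, pos = [], 0
    for gap, length in zip(gap_lens, seg_lens):  # the leftover gap closes the region
        pos += gap
        out.append((pos, pos + length))
        pos += length
    return out


def mc_overlap_pvalue(query, reference, region_len, n_rand=1000, seed=0):
    """Observed overlap and empirical one-sided p value (add-one estimator)."""
    rng = random.Random(seed)
    observed = overlap_bp(query, reference)
    hits = sum(
        overlap_bp(randomize_segments(query, region_len, rng), reference) >= observed
        for _ in range(n_rand)
    )
    return observed, (hits + 1) / (n_rand + 1)
```

With 1,000 randomizations the smallest attainable p value under this estimator is 1/1,001, so the resolution of the test is set by the number of randomizations.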

Results

Clinical features of the participants

Figure 1 depicts the workflow of our study, and the clinical data collected for each patient are summarized in Table 1. A total of 466 patients with PBC and 694 HC were recruited for SNP analysis. The mean age was 55.70 ± 10.84 years in the PBC group and 49.40 ± 11.00 years in the HC group. The female-to-male ratio was 9.98 in the PBC group and 10.57 in the control group.

Fig. 1

The overview of our study. a Identification of GTF2I as a PBC susceptibility gene. b ChIP-seq of GTF2I in PBC and HC. c Identification of the target gene IL21R.

Table 1 Summary of clinical data for PBC subjects and controls

SNP analysis of PBC and HC

Among the three SNPs genotyped, rs800879 in NCF1 deviated from HWE in the controls (p < 0.05) and was excluded from further analysis.

The allele and genotype frequency distributions of the remaining two SNPs are presented in Table 2 and Supplementary Table 1. The frequency of the rs117026326 T allele was significantly higher in PBC patients than in HC (20.26% vs 13.89%, OR 1.56, 95% CI 1.27–2.96, Pc = 1.09E–04, Table 2). The genotype distributions of rs117026326 also differed significantly between PBC patients and controls (Supplementary Fig. 1, Supplementary Table 1). Logistic regression analyses under additive, dominant and recessive genetic models yielded similar results, with the strongest association for rs117026326 observed under the dominant model (OR 1.64, 95% CI 1.28–2.13, Pc = 2.31E–04; Supplementary Table 2). The frequency of the rs73366469 T allele was significantly lower in PBC patients than in controls (77.01% vs 83.43%, OR 1.49, 95% CI 1.22–1.85, Pc = 3.22E–04, Table 2). Likewise, the genotype distributions of rs73366469 differed significantly between PBC patients and controls (Fig. 1, Supplementary Fig. 1, Supplementary Table 1). Logistic regression analyses under the three genetic models yielded similar results, with the strongest association for rs73366469 observed under the additive model (OR 1.74, 95% CI 1.24–2.44, Pc = 3.00E–03; Supplementary Table 2).
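The allelic odds ratio and the coding of genotypes under the additive, dominant and recessive models can be sketched as below. This is a simplified, unadjusted illustration: the confidence interval uses Woolf's log method, which may differ from the method behind the values in Table 2, and the per-allele OR of the additive model requires fitting a logistic regression, which is not shown. The function names are hypothetical and any counts passed to them in testing are made up.

```python
import math
from typing import Sequence, Tuple


def allelic_or(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Allelic OR (alt vs ref allele, cases vs controls) with a Woolf 95% CI."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Woolf: standard error of log(OR) from the four allele counts
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def encode(dosage: int, model: str) -> int:
    """Code a genotype (number of risk alleles: 0, 1 or 2) under a genetic model."""
    if model == "additive":
        return dosage             # 0 / 1 / 2: per-allele effect
    if model == "dominant":
        return int(dosage >= 1)   # carriers vs non-carriers
    if model == "recessive":
        return int(dosage == 2)   # risk-allele homozygotes vs the rest
    raise ValueError(f"unknown genetic model: {model}")


def model_or(case_counts: Sequence[int], ctrl_counts: Sequence[int], model: str) -> float:
    """Unadjusted OR for a binary coding; counts are indexed by dosage (0, 1, 2)."""
    if model not in ("dominant", "recessive"):
        raise ValueError("the additive model needs a fitted logistic regression")

    def exposed(counts: Sequence[int]) -> int:
        return sum(c for dosage, c in enumerate(counts) if encode(dosage, model))

    case_exp, ctrl_exp = exposed(case_counts), exposed(ctrl_counts)
    case_unexp = sum(case_counts) - case_exp
    ctrl_unexp = sum(ctrl_counts) - ctrl_exp
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
```

For a single binary predictor such as the dominant or recessive coding, the unadjusted logistic-regression OR equals the cross-product ratio of the corresponding 2×2 table, which is why `model_or` does not fit a model.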

Table 2 Allele distributions in the subjects
Table 3 Information of the five de novo motifs discovered by MEME-ChIP

ChIP-seq analysis of GTF2I

Compared with HC, PBC patients showed a higher proportion of binding sites located in upstream regions (11.3% vs 2.3%) and 5’ UTRs (6.1% vs 0.4%) of genes (Supplementary Fig. 2). Annotation of differential peaks further revealed that 65.34% of the sites with significantly increased binding in PBC compared with HC were located in upstream and promoter regions (Fig. 2). In contrast, 73.45% of the sites with significantly decreased binding in PBC compared with HC were located in distal intergenic regions (Supplementary Fig. 3).

Fig. 2

ChIPseeker annotation of differential peaks with significantly increased binding in PBC compared with HC. In total, 65.34% of these binding sites were located in upstream and promoter regions.

We next performed GO enrichment and KEGG pathway analyses on genes with upstream or 5ʹ UTR binding sites that showed significantly increased binding in PBC compared with HC. For upstream sites, the top three molecular function (MF) terms were DNA-binding transcription factor binding, N-methyltransferase activity and transcription coactivator activity (Supplementary Fig. 4). For 5ʹ UTR sites, the top three biological process (BP) terms were G1/S transition of mitotic cell cycle, ribonucleoprotein complex biogenesis and regulation of protein stability; the top three cellular component (CC) terms were spliceosomal complex, ubiquitin ligase complex and ubiquitin ligase complex; and the top three MF terms were ubiquitin-like protein transferase activity, ubiquitin-protein transferase activity and ubiquitin protein ligase binding (Supplementary Fig. 5). KEGG analysis of upstream sites identified Epstein-Barr virus infection, lipid and atherosclerosis, and regulation of actin cytoskeleton as the top three significant pathways (Supplementary Fig. 6). For 5ʹ UTR sites, ubiquitin-mediated proteolysis was the only significant KEGG pathway (p = 1.62E–04).
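Over-representation analyses such as the GO and KEGG enrichment run with clusterProfiler rest on a one-sided hypergeometric test, typically followed by Benjamini–Hochberg adjustment (the clusterProfiler default; the adjustment used here is not stated in the text, so this is an assumption). A minimal standard-library sketch of both steps, with hypothetical function names:

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(universe N, annotated K, drawn n)."""
    upper = min(K, n)
    # comb() returns 0 when n - i > N - K, so impossible terms drop out
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Here `N` is the background gene universe, `K` the number of universe genes annotated to a term, `n` the number of query genes (for example, genes with increased upstream binding) and `k` the number of query genes annotated to that term.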

Comparing the sites with significantly increased binding in PBC against the autoimmune disease genes in the NHGRI-EBI Catalog of human genome-wide association studies [29] identified several susceptibility genes: IL21R for PBC; BLK for SLE; ICAM1, ICOSLG and IFNAR1 for ankylosing spondylitis, psoriasis, ulcerative colitis, Crohn’s disease and primary sclerosing cholangitis; and MAPK1 for ulcerative colitis and Crohn’s disease.

Peaks with significantly increased binding in PBC were distributed throughout the genome. Closer analysis of the IL21R region revealed four GTF2I peaks with increased binding, three in the upstream region and one in the 5' UTR (Supplementary Fig. 7), suggesting that GTF2I may bind to the IL21R promoter and regulate IL21R expression. In the Human Cell Atlas database, expression of GTF2I and IL21R varied across cell types: GTF2I was more highly expressed in naïve B cells, memory B cells and plasmacytoid dendritic cells than in other cells, while IL21R was more highly expressed in naïve B cells and non-classical monocytes (Supplementary Fig. 8).

Motif analysis by MEME-ChIP identified five significant motifs (E-value ≤ 0.05) (Fig. 3, Table 3). Comparison with JASPAR2022 CORE non-redundant v2 revealed that the most significant motif (motif ID: TTTHTTKTWTTTWTT, E-value 2.20E–90) had 67 matches to known motifs, including CDF5, DOF3.6, SoxC, DOF5.8, FoxM and DOF5.1. In addition, the motif AGRRRGAGAAGGARR (E-value 4.70E–50) had 15 matches, including RAMOSA1, BPC1, BPC6, PRDM1, BPC5, Spi1 and Trl.
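The motif IDs above are consensus strings in IUPAC degenerate nucleotide code (for example, H = A/C/T, K = G/T, W = A/T, R = A/G). The sketch below converts a consensus into a regular expression and reports exact matches on both strands. It is only a rough illustration: MEME-ChIP motifs are position weight matrices, and PWM-based scanning (for example, FIMO in the MEME Suite) is needed to score matches properly. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes, as used in the motif IDs
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def _pattern(consensus: str) -> "re.Pattern":
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?={body})")  # lookahead so overlapping hits are reported


def find_motif(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """(start, strand) of exact consensus matches on the forward and reverse strands."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in _pattern(consensus).finditer(seq)]
    hits += [(m.start(), "-") for m in _pattern(reverse_complement(consensus)).finditer(seq)]
    return sorted(hits)
```

A reverse-strand hit is reported at the position where the reverse complement of the consensus matches the forward sequence.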

Fig. 3

Five de novo motifs discovered by MEME-ChIP. a motif id: TTTHTTKTWTTTWTT, E-value 2.20E–90, b motif id: AGRRRGAGAAGGARR, E-value 4.70E–50, c motif id: CATCTGTRAAATGGG, E-value 2.10E–43, d motif id: GCCGCCGCCGCCKCC, E-value 5.50E–40, e motif id: TACTCRGGAGGCTGA, E-value 9.00E–08

Genomic HyperBrowser analysis

Our ChIP-seq peaks overlapped significantly with both GSE176987 and GSE177691 (p = 0.020 for each). Hierarchical clustering, used to compare the two samples from the same cell group (GSE176987 and GSE177691) with our data, showed that samples within the same cell group were more similar to each other than to samples from a different group (Supplementary Fig. 9). A Venn diagram of the genes with upstream or 5' UTR peaks in our data, GSE176987 and GSE177691 (Supplementary Fig. 9) identified 854 overlapping genes among the three datasets, including four autoimmune disease susceptibility genes: IL21R, BLK, ICAM1 and IFNAR1. Using “UCSC tfbs conserved” as the source of TF occurrences, we identified 256 transcription factors (TFs) targeting our regions of interest. The top ten cooperative TFs were PAX5, ELK1, SP1, AP2, PAX4, NRF2, CETS1P54, EGR1, MAZR and NGFIC.
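The clustering step, with the distance between two samples defined as the inverse of their pairwise overlap enrichment, can be sketched with a naive average-linkage (UPGMA) procedure. Average linkage is an assumption, since the linkage used by the Genomic HyperBrowser is not stated, and the enrichment values below are invented for illustration, not taken from this study. The function names are hypothetical.

```python
from itertools import combinations

# Invented example values (NOT from this study): pairwise overlap enrichment.
EXAMPLE_ENRICHMENT = {
    frozenset({"GSE176987", "GSE177691"}): 8.0,
    frozenset({"ours", "GSE176987"}): 2.0,
    frozenset({"ours", "GSE177691"}): 2.5,
}


def inverse_distance(enrichment):
    """Distance between two samples = 1 / pairwise overlap enrichment."""
    return {pair: 1.0 / value for pair, value in enrichment.items()}


def average_linkage(names, dist):
    """Repeatedly merge the two closest clusters; return (left, right, height) merges."""
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        # UPGMA: mean distance over all cross-cluster pairs
        pairs = [frozenset((a, b)) for a in c1 for b in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    history = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return history


if __name__ == "__main__":
    names = ["ours", "GSE176987", "GSE177691"]
    for left, right, height in average_linkage(names, inverse_distance(EXAMPLE_ENRICHMENT)):
        print(left, "+", right, "at distance", round(height, 3))
```

With these made-up values the two K562 datasets merge first because their high mutual enrichment translates into the smallest distance, which is the pattern of within-group similarity the clustering is meant to reveal.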

Discussion

This study is the first to confirm an association between GTF2I and PBC in the Chinese Han population. Furthermore, ChIP-seq of CD19+ B cells from PBC patients identified IL21R as a downstream target gene of GTF2I. Notably, our ChIP-seq results overlapped significantly with those obtained by the ENCODE project.

GTF2I is located in a complex genomic region that contains multiple copy number variations (CNVs) and repetitive sequences. Interestingly, 1000 Genomes Project data show that the C allele of rs117026326 has a very low frequency in European and African American populations (2%) compared with East Asian populations (China and Japan) (9%) and the northern Chinese Han population (12%). Our previous studies have revealed associations between GTF2I and multiple autoimmune diseases in the Chinese Han population, including SS, SSc and SLE [18,19,20], and similar associations have been established between GTF2I and RA, SLE and SSc in East Asian populations [12, 21,22,23]. GTF2I may therefore be a shared susceptibility gene for autoimmune diseases in East Asians. Notably, Bruton's tyrosine kinase (BTK) can phosphorylate tyrosine residues of GTF2I in the B cell receptor (BCR) signaling pathway, after which phosphorylated GTF2I translocates into the nucleus to regulate the expression of related genes, including immunoglobulin heavy chains [30].

IL21R encodes the IL21-specific α-chain of the heterodimeric receptor for IL21, a class I cytokine; the functional receptor consists of this α-chain and the common γ-chain (CD132). IL21R is expressed on CD4+ T cells, B cells, NK cells and dendritic cells, with the highest levels on activated B cells. Binding of IL21 to its receptor activates multiple downstream signaling molecules, including STAT1 and STAT3, which play crucial roles in the proliferation and differentiation of T cells, B cells and NK cells [31,32,33]. Multiple SNPs in the IL21 and IL21R loci have been significantly associated with PBC, and hepatic expression of IL21 and IL21R is significantly higher in patients with PBC than in those with chronic hepatitis B or autoimmune hepatitis or in healthy controls [9]. Furthermore, the severity of inflammation correlates positively with the numbers of IL21+ and IL21R+ cells. Follicular helper T (Tfh) cells, which express CXCR5, ICOS, PD-1, Bcl-6 and IL-21, play an important role in regulating germinal center formation, immune responses and B cell activation, and contribute to the pathogenesis of PBC [34]. In particular, one study reported elevated serum and intrahepatic IL-21 levels in patients with PBC, and IL-21 promoted B cell proliferation, STAT3 phosphorylation and AMA production in vitro [35].

In conclusion, GTF2I was identified as a susceptibility gene for PBC in the Chinese Han population. Subsequent functional analysis identified IL21R as a target gene regulated by GTF2I. Further experiments are required to elucidate the specific roles of GTF2I and IL21R in the pathogenesis of PBC.