Introduction

Behcet’s disease (BD) is a multi-systemic vasculitis characterised by oral aphthous ulcers, genital ulcers and ocular lesions. It also affects the arteries, veins, joints, gastrointestinal tract and nervous system, leading to decreased quality of life or even death [1]. The prevalence and clinical manifestations of BD vary by both region and gender, with an approximate prevalence of 14/100,000 in China. Severe microvascular and neurological involvement is more common in male patients.

Recent studies have begun to trace the pathogenesis of BD. Tissue damage in BD relies heavily on T cell imbalance, including Th1/Th17 cell expansion and depressed Treg regulation, which leads to cytokine activation, recruitment of immune cells (predominantly NK cells and monocytes) and neutrophil hyperfunction [2, 3]. Previous tuberculosis infection is an independent risk factor for the illness, which points to an infectious aetiology in BD [4]. Recurrent exposure to heat shock proteins synthesised by mycobacteria or other microorganisms could be responsible for stronger lymphoproliferative responses and cross-reactivity, and this may also result in increased expression of vascular endothelial growth factor, which induces endothelial destruction, angiogenesis, thrombophlebitis and vasculitis [5].

Genetic predisposition is a key force in the onset of BD. In particular, HLA-B51 is the strongest susceptibility locus, with a carrier frequency of 55–63% [6]. HLA alleles including HLA-B51, HLA-A26 and HLA-C0704 were shown to be related to BD uveitis in a genome-wide association study (GWAS) [7]. Concerning non-HLA regions, the ERAP1 rs17482078 polymorphism can affect peptide binding, making it a preferential risk factor for HLA-B51–positive patients [8]. The low-frequency missense mutations IL-23R p.Gly149Arg in a Japanese cohort and IL-23R p.Arg381Gln in a Turkish cohort were identified as protective factors for BD [9]. The IL-10 rs1800872 allele is linked to decreased IL-10 production in BD-prone individuals [10]. Activated by IL-23/IL-12, STAT4 plays a vital role in the differentiation of T cells from the naïve phenotype to Th1/Th17 phenotypes. The STAT4 rs897200 risk allele (homozygote AA) is associated with higher STAT4 expression, which enhances IL-17 transcription and expression, resulting in increased clinical severity in patients with BD [11]. A Turkish GWAS reinforced the role of FUT2 variants (rs281377, rs602662, rs492602, rs681343, rs601338, rs632111) in BD susceptibility. FUT2 encodes an α-(1,2) fucosyltransferase that modulates H-antigen secretion in intestinal mucosa [12], implying the involvement of a latent glycosylation pathway in BD episodes.

Nevertheless, few studies have concentrated on gene signatures for BD diagnostics, and deeper insight into the correlations between these markers and immunological pathways is needed. In the present study, we aim to evaluate genes as differential biomarkers for identifying the clinical phenotypes of BD, to assess the participation of these genes in biological pathways, especially angiogenesis and glycosylation pathways, and to characterise distinctive immune cell infiltration signatures using integrated bioinformatic methods.

Materials and methods

Dataset acquisition and normalisation

Three gene expression datasets were retrieved from the ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) databases using the keywords ‘Behcet’s Syndrome’ or ‘Behcet’s Disease’. The accession numbers were E-MTAB-2713, GSE17114, and GSE61933.

For dataset E-MTAB-2713, the ‘oligo’ R package was employed to pre-process the raw data generated on the A-AFFY-168 platform (Affymetrix GeneChip Human Gene 1.1 ST Array [HuGene-1_1-st-v1]) [13]. The dataset comprises the following transcriptomic profiles: (i) CD4+ T cells from 6 patients with BD, 68 healthy controls (HCs), and 58 patients with systemic lupus erythematosus (SLE), 78 patients with inflammatory bowel disease (IBD) and 53 patients with ANCA-associated vasculitis (AAV) as disease controls (DCs); (ii) CD14+ monocytes from 13 patients with BD, 78 HCs, 59 patients with SLE, 87 patients with IBD and 72 patients with AAV; and (iii) CD16+ neutrophils from 13 patients with BD, 85 HCs, 43 patients with SLE, 86 patients with IBD and 60 patients with AAV. Subsequently, the robust multi-array average algorithm was applied for background correction, normalisation and summarisation [14]. The R package ‘arrayQualityMetrics’ was used for quality control [15], and ‘oligo’ was applied again, via the paCalls function, to retain probes with P < 0.05 in at least three samples. Here, P is the probability that the expression level of a probe is the same as that of the background; a smaller value indicates a more significant difference between probe and background, i.e., a greater likelihood that the probe is expressed. All patients with BD, HCs and DCs (SLE/IBD/AAV) from dataset E-MTAB-2713 were included in the differential expression analysis, functional annotation, weighted correlation network construction and diagnostic machine learning models.
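The probe-filtering rule described above can be illustrated with a minimal sketch. It is written in Python rather than R purely for illustration, it does not call the ‘oligo’ package, and the function name `filter_probes`, the probe IDs and the P values are all hypothetical. It assumes that probes meeting the criterion are the ones retained.

```python
def filter_probes(p_values, alpha=0.05, min_samples=3):
    """Keep probes whose detection P is below alpha in at least min_samples samples.

    p_values maps a probe ID to its per-sample detection P values
    (a hypothetical stand-in for a matrix of detection calls).
    """
    kept = []
    for probe, values in p_values.items():
        n_detected = sum(1 for p in values if p < alpha)
        if n_detected >= min_samples:
            kept.append(probe)
    return kept


# Toy data: three made-up probes measured in four samples.
toy = {
    "probe_a": [0.01, 0.02, 0.03, 0.40],    # below 0.05 in 3 samples -> kept
    "probe_b": [0.01, 0.20, 0.30, 0.40],    # below 0.05 in 1 sample  -> dropped
    "probe_c": [0.049, 0.001, 0.06, 0.04],  # below 0.05 in 3 samples -> kept
}
kept_probes = filter_probes(toy)
print(kept_probes)
```

A probe is kept as soon as three samples show evidence of expression above background, so a transcript expressed only in a small subgroup (for example the BD samples) is not discarded.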

For GSE17114, generated on the [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array (GPL570), mRNA was isolated from the peripheral blood mononuclear cells of 14 HCs and 15 patients with BD. Patients with BD were divided into isolated mucocutaneous manifestations (MB), ocular involvement (OB) and large vein thrombosis (VB) subtypes according to the major clinical manifestations described by Oğuz et al. and presented in previous articles [16, 17]. For GSE61399, also based on the GPL570 platform, gene expression profiles were obtained from the CD14+ monocytes of nine HCs and eight patients with BD.

Differentially expressed gene (DEG) and enrichment analysis

In the E-MTAB-2713 dataset, we conducted DEG analysis using the R ‘limma’ package with thresholds of adjusted P < 0.05 and |log fold change (FC)| > 0.5 [18]. We also performed Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses to identify the biological functions and signalling pathways in which DEGs were significantly involved, with cut-off criteria of P < 0.05 and FDR < 0.05; the results were visualised using ‘clusterProfiler’ and ‘enrichplot’ [19]. Thereafter, DEGs involved in angiogenesis and glycosylation pathways were extracted on the basis of GO/KEGG databases and published reports [20, 21] and then validated for BD phenotype classification using the GSE17114 dataset.
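The DEG thresholds above can be sketched as a small filter. This is a Python illustration, not the ‘limma’ workflow itself; it assumes Benjamini-Hochberg adjustment (the text does not name the adjustment method), and the gene names, log FC values and P values are made up.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, p_cut=0.05, lfc_cut=0.5):
    """Return genes with adjusted P < p_cut and |log FC| > lfc_cut."""
    adj = bh_adjust(p_values)
    return [g for g, lfc, q in zip(genes, log_fc, adj)
            if q < p_cut and abs(lfc) > lfc_cut]


# Hypothetical input: four genes with made-up statistics.
genes = ["G1", "G2", "G3", "G4"]
log_fc = [1.2, -0.8, 0.3, 0.9]
p_vals = [0.001, 0.010, 0.020, 0.400]
degs = select_degs(genes, log_fc, p_vals)
print(degs)
```

In this toy input, G3 fails the fold-change cut-off and G4 fails the adjusted P cut-off, so only G1 and G2 pass both criteria.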

Identification of hub genes and construction of a protein–protein interaction (PPI) network

To identify clinical trait-specific module genes (MGs) and establish a co-expression sub-network in CD14+ monocytes and CD16+ neutrophils in the E-MTAB-2713 dataset, we employed the R package ‘WGCNA’. The soft-thresholding power (softPower) was chosen to obtain a good scale-free fit index (R²) and mean connectivity, together with a suitable minimum number of gene modules in the dynamic tree cut step [22]. We then picked the gene modules most relevant to patients with BD according to the gene significance and module membership calculated in the weighted correlation network analysis (WGCNA) process. In parallel, we constructed a PPI network of the MGs using the STRING database with a threshold score of 0.150 [23]. Using the MCODE plug-in in Cytoscape with cut-offs of degree = 2, node score = 0.2, k-core = 2 and max. depth = 100 [24], we visualised sub-networks and performed GO and KEGG analyses to identify immunological pathways among subnet genes.
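To make the role of the soft-thresholding power concrete, the following Python sketch computes an unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^β, and the resulting connectivity of each gene. It is not the ‘WGCNA’ package; the unsigned network type, the value β = 6 and the three expression profiles are assumptions chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_adjacency(profiles, beta):
    """Unsigned adjacency: |correlation| raised to the power beta."""
    names = list(profiles)
    adj = {}
    for g in names:
        adj[g] = {}
        for h in names:
            if g == h:
                adj[g][h] = 1.0
            else:
                adj[g][h] = abs(pearson(profiles[g], profiles[h])) ** beta
    return adj


def connectivity(adj):
    """Sum of a gene's adjacencies to all other genes."""
    return {g: sum(v for h, v in row.items() if h != g)
            for g, row in adj.items()}


# Hypothetical expression profiles across four samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.1],
    "gene_c": [4.0, 1.0, 3.5, 2.0],
}
adj = soft_adjacency(profiles, beta=6)
k = connectivity(adj)
print(k)
```

Raising correlations to a power keeps strong co-expression (gene_a and gene_b) close to 1 while pushing weak correlations towards 0, which is why the choice of power changes both the mean connectivity and how closely the network approaches a scale-free topology.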

Random forest (RF) and neural network (NN) classification model establishment

DEGs from the E-MTAB-2713 dataset were input into an RF classifier built with the ‘randomForest’ package in R [25], initially with 500 decision trees. After calculating the error rate across the decision trees, we selected the optimal number of trees on the basis of the minimum error rate and stability. We obtained the importance value of each gene variable via the MeanDecreaseGini method; genes with importance value ≥ 1 were identified as paramount genes (PGs) of BD and carried forward for further model construction and validation.
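The two selection steps above (choosing the tree number at the minimum error rate, then keeping genes with importance ≥ 1) can be sketched as follows. This Python snippet does not train a forest; it assumes that an error-rate curve and per-gene importance values have already been produced by the classifier, and every number and gene label in it is hypothetical.

```python
def best_tree_count(error_by_trees):
    """Number of trees (1-based) at which the error rate is lowest.

    error_by_trees[i] is the error rate of a forest with i + 1 trees.
    If several counts tie, the smallest one is returned.
    """
    best_index = min(range(len(error_by_trees)),
                     key=lambda i: error_by_trees[i])
    return best_index + 1


def paramount_genes(importance, cutoff=1.0):
    """Genes with importance >= cutoff, most important first."""
    selected = [g for g, value in importance.items() if value >= cutoff]
    return sorted(selected, key=lambda g: importance[g], reverse=True)


# Hypothetical error curve for forests of 1 to 6 trees.
errors = [0.30, 0.22, 0.15, 0.12, 0.12, 0.14]
n_trees = best_tree_count(errors)

# Hypothetical Gini-based importance values.
importance = {"gene_1": 2.4, "gene_2": 0.7, "gene_3": 1.0, "gene_4": 1.9}
pgs = paramount_genes(importance)
print(n_trees, pgs)
```

The importance cut-off is inclusive, matching the "≥ 1" criterion in the text, so a gene with an importance of exactly 1.0 is retained.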

E-MTAB-2713 was selected as the training set for the artificial NN model. Min-max normalisation based on the median expression of PGs from the RF classifier was performed in advance. Afterwards, we created an NN model using ‘neuralnet’ with the normalised data as the input layer [26]. The number of neurons in the hidden layer was selected manually, guided by the rule of thumb of two thirds of the number of input-layer neurons plus 1.5-fold the number of output-layer neurons [27].
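As a worked illustration of the two preparatory steps, the Python sketch below applies plain min-max scaling to one gene and evaluates the hidden-layer rule of thumb under one literal reading of it, hidden = (2/3) × inputs + 1.5 × outputs. Both the reading of the rule and the example sizes (six input genes, one output node) are assumptions, and the expression values are made up.

```python
def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant gene carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hidden_nodes(n_input, n_output):
    """Rule-of-thumb hidden-layer size (an assumed reading of the rule)."""
    return (2 * n_input) / 3 + 1.5 * n_output


# Hypothetical expression of one gene across three samples.
scaled = min_max([2.0, 4.0, 6.0])

# Hypothetical network: six input genes and a single output node.
suggested = hidden_nodes(6, 1)
print(scaled, suggested)
```

The rule returns a guide value rather than an integer, so the final layer size still has to be picked by hand from nearby whole numbers.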

Diagnostic prediction and validation of classification model

Two independent GEO datasets (GSE17114 and GSE61933) were used to verify the aforementioned NN model. The R ‘pROC’ package was employed to compute the area under the curve (AUC) as a measure of classification performance [28]. Additionally, PGs were applied to discern BD phenotypes in GSE17114.
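For readers unfamiliar with the AUC, the following Python sketch computes it from its rank-based definition: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. It is an illustration only and does not reproduce the ‘pROC’ package; the scores and labels are invented.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise case/control comparisons.

    labels use 1 for cases (e.g. BD) and 0 for controls.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for three cases and three controls.
scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)
print(round(value, 4))
```

In this toy example, eight of the nine case-control pairs are ordered correctly, giving an AUC of 8/9. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.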

Immunocyte infiltration evaluation using single sample gene set enrichment analysis (ssGSEA) algorithm

We applied ssGSEA to assess the infiltration of 28 immune cell types in BD samples from the E-MTAB-2713 dataset [29, 30]. Using the ‘GSVA’, ‘limma’ and ‘GSEABase’ packages, we sought distinctive immune cell profiles for patients with BD and HCs, and the differing immunological infiltration patterns among patients with BD, DCs and HCs were presented via ‘vioplot’ in RStudio.
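The idea behind a single-sample enrichment score can be sketched in a few lines of Python. This is a simplified rank-based random-walk statistic in the spirit of ssGSEA, not the ‘GSVA’ implementation, which applies further steps such as score normalisation that are omitted here. The weighting exponent of 0.25, the gene names and the expression values are assumptions for illustration.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict mapping gene -> expression value in one sample.
    gene_set: genes forming the signature of one immune cell type.
    """
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    members = set(gene_set) & set(ordered)
    if not members or len(members) == n:
        raise ValueError("gene set must partly overlap the sample's genes")
    # The top gene has rank n, the bottom gene rank 1; weight = rank ** alpha.
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    total_in = sum(weights[g] for g in members)
    n_out = n - len(members)
    p_in = p_out = score = 0.0
    for g in ordered:
        if g in members:
            p_in += weights[g] / total_in
        else:
            p_out += 1.0 / n_out
        # Accumulate the gap between the two running distributions.
        score += p_in - p_out
    return score


# Hypothetical sample with six genes, g1 highest and g6 lowest.
sample = {"g1": 6.0, "g2": 5.0, "g3": 4.0, "g4": 3.0, "g5": 2.0, "g6": 1.0}
high = ssgsea_score(sample, ["g1", "g2"])  # signature genes highly expressed
low = ssgsea_score(sample, ["g5", "g6"])   # signature genes lowly expressed
print(high > 0, low < 0)
```

A signature whose genes sit near the top of the sample's expression ranking yields a positive score, and one whose genes sit near the bottom yields a negative score. Scores computed per sample in this way can then be compared between groups.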

Results

Screening for DEGs and GO/KEGG annotation illuminate the involvement of angiogenesis and glycosylation pathways in the pathogenesis of BD

The flow chart in Figure S1 outlines the procedure for exploring distinct biological pathways, immune cell infiltration and diagnostic gene markers. In CD4+ T cells, 0, 51, 146 and 151 genes were differentially expressed in BD compared with the AAV, HC, IBD and SLE groups, respectively. We also investigated the pathogenesis of BD by applying functional and pathway analyses. ‘Positive regulation of cytokine production’ and ‘response to virus’ in biological process (BP) and ‘carbohydrate binding’ and ‘sialic acid binding’ in molecular function pointed to an immunopathogenic background and glycosylation involvement for DEGs (Fig. 1A). In CD14+ monocytes, 31, 5, 22 and 66 DEGs were identified in BD relative to the AAV, HC, IBD and SLE groups, respectively. Similarly, ‘defence response to virus’ in BP was linked to an infection-associated pathogenesis of BD (Fig. 1B). In CD16+ neutrophils, 89 genes differentiated patients with BD from HCs, whereas 34, 75 and 247 DEGs differentiated BD from AAV, IBD and SLE, respectively (Table S1). Angiogenesis regulation was likewise indicated by enrichment of ‘negative regulation of cell migration’ and ‘negative regulation of blood vessel endothelial cell migration’ (Fig. 1C). Altogether, DEGs prominently participating in glycosylation and angiogenesis pathways were identified in BD and plotted using the ‘ggpubr’ package (Fig. 1D). A few disease-associated KEGG annotations, including ‘measles’, ‘hepatitis C’, ‘influenza A’, ‘NOD−like receptor signalling pathway’, ‘Epstein−Barr virus infection’ and ‘antigen processing and presentation’, indicated that pathogen recognition and innate immune responses were activated in patients with BD relative to HCs or DCs, consistent with the GO functions (Table S2).

Fig. 1

Differentially expressed genes (DEGs) and Kyoto Encyclopedia of Genes and Genomes annotations. A DEGs in CD4+ T cells discriminating Behcet’s disease (BD) from inflammatory bowel disease (IBD). B DEGs in CD14+ monocytes discriminating BD from systemic lupus erythematosus. C DEGs in CD16+ neutrophils discriminating BD from IBD. D DEGs involved in angiogenesis (ATP2B4, MYOF, NRP1) and glycosylation (GXYLT1, ENG, CD69, GAA, SIGLEC7, SIGLEC9, SIGLEC16), presented via the vioplot package in R

Identification of clinical trait-specific module genes (MGs)

We focused on CD14+ monocytes and CD16+ neutrophils to search for clinical trait-specific MGs using WGCNA co-expression network analysis (Figure S2 and Table S3). When combining DEG expression profiles with clinical traits, we took the modules most strongly associated with the clinical manifestations of BD as significant modules, and their MGs were used for subsequent PPI and MCODE subnet analyses (Figure S3). Furthermore, we performed GO/KEGG analyses in each MCODE subnet of CD14+ monocytes, uncovering ‘defence response to virus’, ‘cytokine-mediated signalling pathway’, ‘regulation of innate immune response’, ‘lymphocyte/leukocyte mediated immunity’, ‘cell killing’, ‘natural killer cell-mediated immunity’ and ‘positive regulation of leukocyte adhesion to vascular endothelial cell’ as pathways associated with BD pathogenesis. In CD16+ neutrophils, ‘defence response to virus’, ‘cytokine-mediated signalling pathway’, ‘lymphocyte-mediated immunity’ and ‘leukocyte migration involved in inflammatory response’ were identified in GO analyses (Figure S3). In brief, these results point to a pathogen-triggered, lymphocyte-mediated, vessel-inflamed and innate immune system-dominant character of BD pathogenesis. Surprisingly, additional enrichment of B cell-mediated immunity, mature B cell differentiation and immunoglobulin-mediated immune response suggests the possible participation of acquired immunity in the development of BD.

Moreover, the enrichment of ‘Epstein−Barr virus infection’, ‘TNF signalling pathway’, ‘NF−kappa B signalling pathway’, ‘measles’, ‘hepatitis C’, ‘influenza A’, ‘NOD-like receptor signalling pathway’, ‘natural killer cell-mediated cytotoxicity’, ‘Th1 and Th2 cell differentiation’ and ‘Th17 cell differentiation’ in KEGG analysis supported our assumption regarding the aetiology of BD (data not shown).

Random forest (RF) and neural network (NN) analyses to establish disease classification model

Regarding the DEG set of CD14+ monocytes, we identified six PGs (P2RY2, DDHD1, SLC6A12, RPS29, USMG5, MS4A3) that distinguished BD and AAV samples using an appropriate MeanDecreaseGini index, and these were used to construct the subsequent NN. Likewise, five PGs (EPHX1, PKP2, FCMR, EIF4B, HORMAD1) differentiated BD and HC samples, nine PGs (NEXN, SNORD30, SNORD59B, SNORD28, ANO5, OTUD1, LOC399900, WDR74, RNU11) discerned BD and IBD samples and six PGs (KLRF1, SLC7A8, DKFZP434L187, FCGR1B, IFI6, IFITM3) distinguished BD and SLE samples. Heat maps generated with R ‘pheatmap’ and k-means clustering showed that PGs could separate BD samples from HC, IBD and SLE samples (Fig. 2A–D).

Fig. 2

Random forest construction using differentially expressed genes. A–H The upper graph presents the minimum value of the cross-validation error rate. The X-axis presents decision tree numbers, and the Y-axis presents the error of cross-validation. The lower graph illustrates the expression heat map of paramount genes with importance value ≥ 1 between patients with Behcet’s disease (BD) and healthy controls

As for CD16+ neutrophils, the PGs selected as candidates for constructing the NN model could preferentially distinguish BD samples from AAV (IRAK3, COMMD6, FAM58BP, MALAT1, IRF7, COPZ1), HC (MIR326, CSTF3, MIR15A, TCEANC2), IBD (FAM58BP, ATP5J2, USF1, SNORD59B, MIR15A, FAM212B) and SLE samples (COMMD6, C10orf12, PCBP2, MTERF2) (Fig. 2E–H).

Afterwards, we input PGs from the RF classifier into the NN model and manually set the number of neurons in the hidden layer to between 3 and 6, with E-MTAB-2713 as the training set. The visualisation of the predicted weights and NN model is presented in Fig. 3. PGs from CD14+ monocytes differentiated BD with an accuracy of 0.962; similarly, the BD prediction accuracy of PGs from CD16+ neutrophils was as high as 0.929 (Table 1).

Fig. 3

Establishment of the neural network. A–H The upper panel presents the disease classification model containing input, hidden and output layers, and the thickness of the connecting lines represents the scores and weight for paramount genes. The lower panel presents the ROC curve and areas under the curve for the training dataset of E-MTAB-2713

Table 1 Prediction accuracy and AUCs differentiating patients with Behcet’s disease (BD) from control subjects (for training group E-MTAB-2713)

Diagnostic performance of paramount genes (PGs) and validation in BD clinical subtypes

Receiver operating characteristic (ROC) curves with AUCs were generated via the ‘pROC’ package in both the training and validation groups (Table 1). AUCs exceeding 0.900 for BD diagnosis in the training group suggested outstanding diagnostic performance for all NN models (Fig. 3). We also used two other datasets, namely GSE17114 and GSE61399, to further verify the aforementioned diagnostic models (Figure S4). More specifically, PGs constituting the NN model were further used to discern clinical subtypes in GSE17114, comparing MB versus non-MB, OB versus non-OB and VB versus non-VB. The PGs from CD14+ monocytes that constituted the NN model differentiating patients with BD from HCs clearly distinguished MB from non-MB samples and VB from non-VB samples, with AUCs of 0.8750 and 0.9091, respectively (Fig. 4A–B and Table 2). PGs from CD16+ neutrophils in the NN model also discriminated BD clinical manifestations, with AUCs exceeding 0.70 (Fig. 4B and Table 2).

Fig. 4

Validation performance of diagnostic genes for determining clinical subtypes of Behcet’s disease (BD) in the GSE17114 dataset. A Paramount genes in CD14+ monocytes. B Paramount genes in CD16+ neutrophils. C, D Pathogenic genes involved in glycosylation and angiogenesis

Table 2 Diagnostic values differentiating the clinical subtypes of Behcet’s disease (for validation group GSE17114)

Importantly, enriched genes associated with angiogenesis (ATP2B4, MYOF, NRP1) and glycosylation (GXYLT1, ENG, CD69, GAA, SIGLEC7, SIGLEC9, SIGLEC16) pathways in E-MTAB-2713 also demonstrated great discriminative ability for BD clinical subtypes in the GSE17114 dataset (Fig. 4C–D and Table 2).

Landscape of immune cell infiltration

We calculated the relative immunocyte spectrum in both patients with BD and controls using the total gene expression data of CD14+ monocytes and CD16+ neutrophils. In CD14+ monocytes, chemokine receptor (CCR), para-inflammatory, plasmacytoid dendritic cell (pDC), Th cell, Tfh cell and type I IFN responses were significantly more prominent in BD than in AAV (Fig. 5A). Immature dendritic cell (iDC) and T cell co-inhibition scores were higher in HC samples than in BD samples, whereas pDC infiltration was increased in BD (Fig. 5B). B cell, CCR, NK cell, pDC, Tfh cell and type I IFN responses were markedly elevated in BD versus IBD (Fig. 5C). Elevated NK cell, Th cell, Tfh cell, Th2 cell and type I/II IFN responses distinguished BD from SLE (Fig. 5D).

Fig. 5

Single sample gene set enrichment analysis and immune cell infiltration. A–D The relative immunocyte spectrum in patients with Behcet’s disease and control subjects based on the genes expressed in CD14+ monocytes. E–H Immunocyte scenarios based on genes in CD16+ neutrophils

In CD16+ neutrophils, we observed hyperactivation of NK cell, para-inflammation, Th cell, Tfh cell and type I IFN responses in BD versus AAV (Fig. 5E). Concerning differences in immune cell infiltration between patients with BD and HCs, B cell, neutrophil, NK cell and type II IFN responses were promoted in BD samples, whereas HC samples were typified, as expected, by antigen presenting cell (APC) co-stimulation, checkpoint pathways, T cell co-stimulation and increased Th2 cell counts (Fig. 5F). Compared with IBD, neutrophil and NK cell counts, T cell co-inhibition and type II IFN responses were prominently increased in BD, highlighting the immunological differences between the two diseases (Fig. 5G). Next, we observed that B cell, NK cell and Tfh cell counts were increased in BD, whereas SLE was typified by APC co-stimulation, CCR responses, checkpoint pathways, para-inflammation, T cell co-stimulation and type I IFN responses (Fig. 5H). When applied to the GSE17114 dataset, significant distributions of Th1 cells, activated dendritic cells and mast cells were noted in the MB, OB and VB groups.

Discussion

In the present study, we depicted an infection-induced, glycosylation-involved, angiogenesis-promoted, vascular inflammation-related pathogenesis of BD based on biological/pathway enrichment analyses of DEGs in CD4+ T cells, CD14+ monocytes and CD16+ neutrophils in the E-MTAB-2713 dataset. MGs from the WGCNA co-expression network, which were subsequently used to construct a PPI network and MCODE subnets and to perform GO/KEGG enrichment, revealed the participation of NK, Th1, Th2 and Th17 cells in the pathogenesis of BD from another angle. To determine which DEGs could preferentially discriminate patients with BD from both HCs and DCs, we used RF classifiers and NN models to screen gene markers for BD differential diagnosis. In the verification phase, we found, unexpectedly, that the identified PGs performed well in the clinical sub-typing of patients with BD with respect to MB, OB and VB features. Finally, ssGSEA unveiled the diverse landscape of immune cells between patients with BD and controls, providing evidence of dysregulated immune tolerance and T cell activation attributable to immature dendritic cell suppression as well as T cell co-inhibition in BD. Moreover, the promoted CCR, para-inflammation, pDC and type I IFN responses indicated a pro-inflammatory status and the likely activation of autoreactive T cells. Hence, elevated counts of Th, Tfh, Th2 and NK cells support the previous hypothesis of a core pathogenic role of T cells in the aetiology of BD.

In CD14+ monocytes, EIF4B and HORMAD1 were identified as DEGs differentiating patients with BD from HCs and patients with IBD/SLE (Figure S5), and these results were confirmed in the NN model in both the training and validation sets. EIF4B is known to initiate protein translation, facilitate either pre-initiation complex docking or 5′-to-3′ scanning to the first codon, and promote cell survival and proliferation [31]. Imbalanced EIF4B protein expression is related to Alzheimer’s disease, lymphoma, leukaemia and hepatocellular carcinoma, and EIF4B also modulates anti-viral immunity through IFN-stimulated genes in innate responses [32]. An in vivo study revealed that EIF4B conditional knockout mice were prone to viral infection, severe lung inflammation and impaired NK cytotoxicity during influenza A virus infection [32]. RNA-seq demonstrated that EIF4B deficiency led to disrupted T cell signalling and differentiation [33], supporting the crucial role of translational control in viral pathogenesis. HORMAD1 is aberrantly expressed in multiple cancers, leading to perturbed genomic stability and DNA damage repair [34]. We observed higher EIF4B expression and lower HORMAD1 expression in patients with BD relative to HCs, implying the significance of translational control and anti-infective activity in BD. In addition, EIF4B and HORMAD1 were validated in GSE17114 for discriminating MB and VB.

We observed enrichment of SERPING1 in relation to ‘complement activation’, ‘lectin pathway’, ‘regulation of humoral immune response’ and ‘fibrinolysis’, distinguishing patients with BD from those with AAV and SLE. This might reflect hyperactivation of the complement system as well as abnormal activation of the coagulation pathway, consistent with the frequent occurrence of deep vein thrombosis in the lower extremities and cerebral venous sinus as well as arterial thrombosis. RNA-seq analysis also revealed upregulation of SERPING1 in SLE, with an effect on fibrin degradation and hence coagulation [35]. Prior research indicated that downregulated SERPING1 could inhibit complement cascades via C3, exacerbating primary Sjögren’s Syndrome (pSS) [36].

OAS2 is an IFN-induced, dsRNA-activated anti-viral enzyme involved in innate anti-viral responses, and it plays an important role in lupus nephritis [37], rheumatoid arthritis (RA) [38] and pSS [39]. This coincides with our findings of the pathogenic involvement of viral infection in GO annotation as well as ‘NOD-like receptor signalling’ for pathogen recognition. IRF7, encoding interferon regulatory factor 7, is involved in the pro-inflammatory stage. The gene carries risk alleles of SLE, and it might affect the phenotypes of SLE through altered DNA methylation [40, 41]. In the GO annotation of our study, increased GBP5 was enriched in the pathway for positive regulation of NLRP3 inflammasome complex assembly triggered by infection and in IL-18 production, which contrasts with the report that selectively increased GBP5 levels in the synovial tissue of patients with RA blunt pro-inflammatory cytokine expression [42]. Taken together, the aforementioned genes could underlie the infection-primed pathogenesis of BD.

Concerning CD16+ neutrophils, all DEGs in both patients with BD and controls are represented as a Venn plot in Figure S6. STAT2 is induced by type I IFN and initiates IFN signalling, and its expression and function are augmented in SLE [43]. A recent study observed STAT2 and its subsequent pro-inflammatory effect in the surrounding and central areas of skin lesions in pemphigus vulgaris [44], which echoes the mucocutaneous manifestation of BD. MOV10, which has helicase activity against RNA viruses, is also an IFN-inducible gene [45]. The OASL gene similarly responds to viral infection through induction by IFN signatures [46]. We noticed enhanced STAT2, MOV10 and OASL expression in patients with BD relative to those with SLE, and GO/KEGG annotation unveiled ‘NOD-like receptor signalling’, ‘cytokine-mediated signalling’ and ‘viral infection’ pathways in patients with BD, indicating intense inflammation to combat potential infection. TCEANC2, located in the PARK10 region, is suggested to be involved in RNA processing, and it is a major locus for Parkinson’s disease [47]. In addition, it might be one of the reliable biomarkers for differentiating the MB, VB and OB phenotypes of BD.

CD69 is an immunoregulatory receptor on T cells and a C-type lectin. Selective deficiency of CD69 can exacerbate tissue damage, accelerate Th17 cell differentiation, suppress pro-inflammatory responses and increase the risk of autoimmune and chronic inflammatory diseases [48]. Our findings of decreased CD69 expression and its outstanding discriminative accuracy for BD clinical subgroups corroborated previous findings and potentially explained the expansion of Th17 cells in the peripheral blood of patients with BD.

Given that BD is considered a T cell-reliant disease, alterations in the T cell balance, namely Th cell expansion and Treg dysregulation, contribute to the deterioration of BD, in which inflammatory damage could contribute to the recruitment and activation of multiple immunocytes and cytokines. Our study revealed increases in Th, Tfh and Th2 cell counts. Previous research indicated that the proportions of Th1, Th2 and Th17 cells were significantly higher in patients with BD than in HCs, and the Th17/Th1 ratio was significantly higher in patients with ocular involvement or folliculitis than in those without corresponding symptoms [49]. The percentage of Th17 cells expands in the active stage of the disease and declines in remission [50], suggesting that the scale of Th cells is closely related to BD features and activity. Geri et al. [51] found that IL-21 generated a Th17/Treg imbalance and inflammation in BD: CD4+ IL-17A+ Th17 cell counts are increased and CD4+ Foxp3+ Treg counts are decreased in patients with active BD. Meanwhile, Ahmadi and colleagues [52] confirmed that CD4+ IL-17A+ Th17 cell counts were higher in patients with BD than in healthy people, whereas CD4+ CD25+ CD127 Treg cells were decreased, resulting in a significantly increased Th17/Treg ratio in BD. Expression of the Tfh cell surface markers CXCR5, PD-1 and ICOS on circulating CD4+ T lymphocytes was increased in the CD4+TCRβ+ population in patients with BD in comparison with the findings in HCs [53], supporting our observations.

NK cells are prominent immunocytes that mediate immune regulation through cytolysis in auto-inflammatory diseases, and their cytotoxicity and degranulation exacerbate BD episodes. An increased NK1 cell/NK2 cell ratio leads to the dominance of IFN-γ secreted by CD16+ NK1 cells, which inhibits the moderating effects of NK17 and NK2 cells in mucocutaneous BD. Simultaneously, it reflects disease activation and relapse, pointing to an apparent interaction between NK cells and IFN [54]. The NKG2D+ lymphocyte frequency is closely associated with the BD activity score [55], and monitoring it helps clinicians discriminate the disease stage of patients with BD with greater than 90% specificity. Our research should facilitate a deeper understanding of hereditary factors as pivotal drivers of NK infiltration in BD pathogenesis.

Dendritic cells present antigens to activated T cells. Among dendritic cells, pDCs can secrete IFN-α in innate immunity, and they play a pathogenic role in autoimmune diseases such as SLE by producing large amounts of IFN. Meanwhile, immature dendritic cells promote immune tolerance [56]. It has been reported that CD123+CXCL16+ pDCs are positively correlated with IFN-α in BD and are involved in Th1-type immune responses [57]. In the current ssGSEA analysis, downregulation of iDCs and T cell co-inhibition together with upregulation of pDCs revealed impaired immunological tolerance, pointing to an autoreactive pathogenesis in Behcet’s disease.

Intriguingly, our results support a broad infectious aetiology related to susceptibility genes in BD. Previously, HSV-1 DNA levels were found to be higher in patients with BD and gastrointestinal involvement than in those with Crohn’s disease, but anti-viral drugs against HSV were not effective treatments [5]. Ileal destruction in a patient with BD who received cyclosporine A for eye involvement was demonstrated to be related to CMV infection rather than to active manifestations of gastrointestinal involvement [58]. A history of tuberculosis is an independent risk factor for BD [4]. Of note, rather than causing active infections, bacteria or viruses are speculated to alter the immune response to the pathogens themselves and activate autoreactive T cells to produce cytokines that enable tissue-damaging neutrophil and macrophage infiltration in genetically susceptible individuals with BD.

Systemic inflammation, immunocyte infiltration and vascular thrombosis and damage constantly occur in the pathogenesis of BD, and a graphical abstract of BD pathogenesis is provided in Figure S7. Leukocytes synthesise angiogenic factors that in turn amplify inflammation by recruiting immunocytes, affecting angiogenesis in autoimmune diseases. The linkage of haemostasis and angiogenesis is now understood in terms of the release of angiogenic factors by platelets, which stimulate neovascularisation [59]. Glycosylation is the major post-translational modification contributing to cellular maturation and function, and glycosyltransferases produce diverse glycoproteins. Aberrant glycosylation is regarded as a mechanism that causes tumour heterogeneity, and glyco-gene signatures have acceptable prognostic value in the stratification of pancreatic ductal adenocarcinoma [21]. Inspired by these findings, we retrieved genes regulating angiogenesis and glycosyltransferases from published reports to identify potential gene signatures for heterogeneous phenotypes of BD. Importantly, pathway genes aggregated in angiogenesis (ATP2B4, MYOF, NRP1) and glycosylation (GXYLT1, ENG, CD69, GAA, SIGLEC7, SIGLEC9, SIGLEC16) showed strong discriminative value in subgroup classification for BD.

However, our study has several limitations. First, we did not filter out probes without gene symbols at the beginning of the bioinformatic analyses; thus, non-coding RNAs were included in our disease classification models, which might hinder further validation of these PGs on polymerase chain reaction platforms in different BD cohorts. Second, there is a potential bias in our methodology owing to the small number of patients with BD and the large number of DCs/HCs in dataset E-MTAB-2713.

In conclusion, our research identified potential gene signatures carried by CD14+ monocytes and CD16+ neutrophils in training and verification datasets with suitable prediction accuracy for deciphering BD phenotypes. Pivotal pathogenic characteristics of pathogen infection, glycosylation and angiogenesis are concurrently associated with the incidence of BD.