Introduction

Systemic lupus erythematosus (SLE) is a chronic multi-system autoimmune disease with a wide spectrum of manifestations ranging from minor cutaneous involvement to severe major organ damage [1]. While the exact etiology of SLE remains largely elusive, growing evidence highlights the potential role of both intrinsic (host genetic) and extrinsic (viral infection) risk factors [2, 3]. SLE is mainly considered a classical B cell–mediated autoimmune disease. Indeed, abnormal central and peripheral B cell tolerance and a large number of self-reactive B cells, which can produce a variety of autoantibodies, have been reported in SLE patients [4]. B cells can facilitate autoimmunity by secreting autoantibodies, by presenting autoantigens to T cells, by interacting with T cells and dendritic cells, and most likely by secreting pro-inflammatory cytokines [5,6,7,8]. Autoreactive B cells target several self-antigens at different cellular locations by producing autoantibodies [5, 9]. Binding of autoantibodies to their target antigens leads to the activation of complement and inflammatory cascades [10, 11]. It has been shown that SLE patients fail to eliminate autoreactive B cells during B cell development [12], which raises several questions: what are the origins of autoreactive B cells, how do these cells develop, how are they normally controlled in healthy individuals, how are they activated in disease, and what role do risk factors play in this phenomenon? Characterizing the gene expression patterns and pathways active in B lymphocytes in SLE would help to answer these questions and could inform patient management. In many diseases, the expression of certain gene sets (modules) is highly correlated, and such modules can be identified by weighted gene co-expression network analysis.
These modules comprise key gene groups involved in the significant pathways that may constitute the major parts of the pathogenesis route. As such, the present study aimed to investigate modules containing highly co-expressed genes in B lymphocytes that are implicated in the pathogenesis of SLE. Through this analysis, we report groups of correlated genes that may act as major players in SLE. We also identified the key genes in the selected modules and explored their biological functions in various pathways.

Materials and methods

Gene expression datasets, preprocessing, and integration

The Gene Expression Omnibus database (ncbi.nlm.nih.gov/geo/) was searched to find studies reporting gene expression levels in SLE subjects. To find the relevant studies, keywords including “lupus,” “systemic lupus erythematosus,” and “SLE” were initially applied. Next, studies containing high-throughput microarray analysis of B cell samples from human subjects were selected. Studies performed on non-human samples or cell lines, as well as non-coding RNA profiling by array, were excluded. Two independent investigators searched and collected datasets. Details of the search strategy are shown as a flowchart in Fig. 1. A total of three microarray gene expression datasets, namely GSE10325, GSE30153, and GSE4588, were downloaded. While the first dataset was collected with Affymetrix Human Genome HG-U133A, the other two were collected with Affymetrix Human Genome HG-U133_Plus_2. SLE samples were prepared from purified B lymphocytes sorted from PBMCs. The patients had not been treated with immunosuppressive drugs, high-dose immunocytotoxic therapies, or steroids. Almost all patient samples in GSE10325 had systemic lupus erythematosus disease activity index (SLEDAI) scores of more than 4. This index was not reported for the SLE samples of GSE4588. The SLEDAI for the patient samples of GSE30153 was reported to be between 0 and 4. The samples and genes with too many missing entries were filtered using the goodSamples function, and the weighted gene co-expression network analysis (WGCNA) was performed with the WGCNA package in R. Probes with an unknown gene symbol were filtered out and, in order to merge the probes, the probe-level expression profiles were converted to gene-level expression using the collapseRows function [13, 14]. Finally, the gene symbols from the different datasets were intersected, and the genes common to all included datasets were used for further analysis [15].
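The probe-to-gene collapsing and gene intersection steps can be illustrated with a minimal sketch in plain Python. It is not the WGCNA implementation: the function names collapse_probes and common_genes are hypothetical, the rule of keeping the probe with the highest mean expression is an assumption (the text does not state which collapsing method was used), and the toy expression values are invented.

```python
from statistics import mean

def collapse_probes(probe_expr, probe_to_gene):
    """Keep, for each gene symbol, the probe with the highest mean expression.

    Assumption: a 'max mean' rule; the paper does not state the method used.
    """
    best = {}  # gene symbol -> (mean expression, expression values)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if not gene:  # drop probes without a known gene symbol
            continue
        m = mean(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, values)
    return {gene: values for gene, (_, values) in best.items()}

def common_genes(*datasets):
    """Return the sorted gene symbols present in every gene-level dataset."""
    shared = set(datasets[0])
    for dataset in datasets[1:]:
        shared &= set(dataset)
    return sorted(shared)

# Toy data (invented values): two probes map to LCK, one has no symbol.
ds1 = collapse_probes(
    {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 5.0]},
    {"p1": "LCK", "p2": "LCK", "p3": ""},
)
ds2 = {"LCK": [2.0, 2.5], "CD14": [7.0, 6.0]}
shared = common_genes(ds1, ds2)
```

In this toy case only LCK survives the intersection, because CD14 is absent from the first dataset.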

Fig. 1 Flowchart of the search strategy

Construction of the weighted gene co-expression network

The WGCNA package was employed to construct a weighted gene co-expression network for the integrated SLE datasets [16]. A Pearson correlation matrix containing the pairwise correlations between the expression profiles of all genes was initially calculated. The soft-thresholding power (β) was determined by calculating the scale-free topology fit index for different powers, and the adjacency matrix was computed with the selected power. In the next step, the adjacency matrix was transformed into a topological overlap matrix (TOM), which combines, for each pair of genes, their direct connection strength with the connections they share with all other genes in the network. The TOM was then converted into a dissimilarity matrix, and the co-expressed genes were grouped in a hierarchical cluster tree. Afterward, a dynamic tree cut algorithm was used to generate modules. The module eigengenes were calculated to summarize the gene expression profile of each module. Finally, the mergeCloseModules function was used to merge similar modules into a single module [17].
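The chain from correlation to adjacency to topological overlap can be made concrete with a small sketch in plain Python. It assumes an unsigned network, where the adjacency is the absolute Pearson correlation raised to the power β (the text does not state the network type), and it uses the standard topological overlap formula rather than calling the WGCNA package. The function names and the toy expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency a_ij = |cor_ij|^beta, zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def tom(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            t[i][j] = t[j][i] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j])
    return t

# Toy data: three genes (rows) measured in four samples (invented values).
expr = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.1, 5.9, 8.2],
        [4.0, 1.0, 3.0, 2.0]]
adj = adjacency(expr, beta=10)
overlap = tom(adj)
dissimilarity = [[1.0 - v for v in row] for row in overlap]  # input to clustering
```

Genes 0 and 1 are almost perfectly correlated and therefore end up with a topological overlap close to 1 (dissimilarity close to 0), whereas gene 2 is weakly correlated with both and stays far from them in the clustering.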

Finding candidate modules and genes

The biologically meaningful modules were identified by submitting the genes of each module to the Search Tool for the Retrieval of Interacting Genes (STRING) database [18]. The highly interconnected genes in each module were then used to construct a protein–protein interaction network (PPIN) [19]. The PPIN was visualized using Gephi 0.9.2 [20].
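Interconnectedness in a PPIN is commonly summarized by node degree, the number of distinct interaction partners of each protein. The sketch below shows one way to compute it from an edge list in plain Python. The function name node_degrees is hypothetical, and the edges are invented for illustration; they are not STRING output.

```python
from collections import Counter

def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops.
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degrees = Counter()
    for a, b in unique:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

# Invented example edges between genes named in this study.
edges = [("LCK", "CD3D"), ("LCK", "CD247"), ("CD3D", "LCK"),
         ("LCK", "CD3G"), ("CD3D", "CD3G")]
degrees = node_degrees(edges)
hub = degrees.most_common(1)[0]
```

In this toy network LCK has the highest degree and would be drawn as the largest node in a degree-scaled layout.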

Enrichment analysis

To explore the biological process gene ontology (GO) and pathway terms, the g:Profiler web server (https://biit.cs.ut.ee/gprofiler/) was employed [21]. Terms with p values < 0.05 were considered significant, and the most relevant ones were chosen for further evaluation and interpretation.
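Over-representation of a term in a gene module is typically assessed with a one-sided hypergeometric test, which can be sketched in a few lines of plain Python. This is an illustration of the underlying statistic only: g:Profiler applies its own multiple-testing correction, which is not reproduced here, and the function name and all counts below are hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, annotated, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of background genes
    annotated:  background genes annotated to the term
    drawn:      genes in the module
    overlap:    module genes annotated to the term
    """
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(population, drawn)

# Hypothetical counts: a 50-gene module in a 10,529-gene background,
# with 10 of its genes among the 200 genes annotated to the term.
p = hypergeom_pvalue(population=10529, annotated=200, drawn=50, overlap=10)
significant = p < 0.05
```

With about one annotated gene expected by chance in a module of this size, observing ten gives a p value far below the 0.05 cutoff.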

Results

Data preprocessing and sample selection

Gene expression datasets were log-transformed and normalized. Samples were then evaluated for missing entries and zero-variance genes. No sample was eliminated after sample clustering. We extracted 38 SLE samples from the 3 microarray datasets, and a total of 10,529 common genes were used for further analysis.

Construction of weighted gene co-expression network and detection of co-expression modules

To construct the weighted gene co-expression network, the optimal soft-thresholding power was identified by calculating the scale-free topology fit index for various powers. A power of 10, the lowest power that yielded the highest scale-free topology fit index, was selected to produce a hierarchical clustering tree. The dynamic tree cut algorithm was next used to cut the dendrogram, which led to 30 gene modules tagged by different colors. After merging similar modules with a threshold of 0.15, 17 modules were identified (Fig. 2a). The eigengene adjacency heatmap indicated the relationships between the 17 co-expression modules (Fig. 2b). Three biologically meaningful modules, namely dark red, dark orange, and white, were then selected because their genes encode highly interconnected proteins. These modules were also positively correlated with one another (Fig. 2b). The genes involved in each module are listed in Table 1. The connected proteins in the PPIN of each module are also illustrated in Fig. 3a–c.
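The two selection rules described above can be written as a short sketch in plain Python. The fit index values are invented for illustration, the function names are hypothetical, and the merge rule assumes that the threshold of 0.15 is a cut height on eigengene dissimilarity (1 minus the correlation between module eigengenes), which the text does not state explicitly.

```python
def choose_power(fit_by_power):
    """Return the lowest power that reaches the highest fit index."""
    best_fit = max(fit_by_power.values())
    return min(power for power, fit in fit_by_power.items() if fit == best_fit)

def should_merge(eigengene_correlation, cut_height=0.15):
    """Merge two modules when their eigengene dissimilarity is below the cut height."""
    return 1.0 - eigengene_correlation < cut_height

# Invented scale-free topology fit indices for a range of candidate powers.
fit_by_power = {6: 0.71, 8: 0.83, 10: 0.90, 12: 0.90, 14: 0.88}
power = choose_power(fit_by_power)

merge_close = should_merge(0.90)    # dissimilarity about 0.10
merge_distant = should_merge(0.80)  # dissimilarity about 0.20
```

Under this reading, modules whose eigengenes correlate above roughly 0.85 are merged, which is how an initial set of modules can shrink to a smaller final set.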

Fig. 2 a Dendrogram of genes clustered based on a dissimilarity measure (1-TOM) with assigned module colors. The colored rows display the modules obtained by the dynamic tree cut method and after merging modules. b The eigengene adjacency heatmap of the determined modules. Correlations between modules are shown on a color scale from blue (negative correlation) to red (positive correlation)

Table 1 The genes involved in each module
Fig. 3 The PPINs of the a dark red, b dark orange, and c white modules. Node size and color indicate degree: nodes with a higher degree are larger and redder

Gene ontology and pathway enrichment analysis

The biological process gene ontology enrichment of each module was explored. The significantly enriched GO terms and the genes involved are listed in Table 2. The genes of the dark red module were enriched in immune-related terms, including Th17, Th1, and Th2 cell differentiation. The genes of the dark orange module were enriched in immune-, inflammation-, and cytokine-related terms. Finally, the white module was enriched in terms related to the immune response and the response to viral infection.

Table 2 The biological process terms enriched by each module

Pathway enrichment analysis was also carried out using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Reactome Pathway Database. Table 3 lists the pathways enriched in each module. The following terms were enriched in the dark red module: immune system, adaptive immune system, natural killer cell–mediated cytotoxicity, Th1 and Th2 cell differentiation, immunoregulatory interactions between a lymphoid and a non-lymphoid cell, primary immunodeficiency, translocation of ZAP-70 to immunological synapse, PD-1 signaling, and Th17 cell differentiation. The genes of the dark orange module were enriched in innate immune system, immune system, complement and coagulation cascades, regulation of TLR by endogenous ligand, Toll-like receptor cascades, diseases of immune system, and antimicrobial peptides. Finally, the white module was enriched in viral and antiviral pathways, including Epstein-Barr virus, measles, hepatitis C, influenza A, herpes simplex infection, immune system, cytokine signaling in immune system, interferon signaling, antiviral mechanism by IFN-stimulated genes, OAS antiviral response, ISG15 antiviral mechanism, and interferon alpha/beta signaling.

Table 3 The biological pathways enriched by each module

The signaling network involved in SLE pathogenesis

Figure 4 shows the proposed B cell signaling network involved in the pathogenesis of SLE. Analysis of gene co-expression in B cells indicates the involvement of vital pathways such as the JAK-STAT pathway, Toll-like receptor cascades, and the antiviral immune pathway in the innate immune response (PLAUR, TYROBP, FCN1, LYZ, CST3, CLU, CFP, FGL2, CD36, S100A8, FCER1G, S100A9, S100A12, ANPEP, CD14, FPR1, CLEC7A, SERPINA1, C5AR1, and CFD) as well as the IL-17 signaling pathway and BCR signaling in the adaptive immune response (FYN, PRKCQ, KLRB1, ITK, CD160, KLRF1, CD8A, CD3G, TRAT1, ICOS, CD3D, CD28, LCK, SH2D1A, and CD247). These results further support the role of B lymphocytes as a linker between innate and adaptive immune responses in SLE pathogenesis. Our findings also suggest that primary immunodeficiency-related genes, including the interleukin-7 (IL-7) receptor, ZAP70, and lymphocyte-specific protein tyrosine kinase (LCK), can be important in B cell dysregulation in SLE patients [22,23,24]. Therefore, genetic factors and probably mixed viral infections can be crucial in B cell dysregulation in these patients. Our results also indicate that viral infection and antiviral immune responses are important in the triggering and maintenance of SLE. Among viral infections, Epstein-Barr virus (EBV) appears to play an important role given that B lymphocytes are the main target cells for EBV infection. Elevated EBV-specific antibodies [25] and viral load [26] have previously been documented in SLE patients. However, the exact mechanisms by which EBV exerts its effects in these patients are yet to be defined. Measles is another viral infection that could also manipulate B lymphocytes [27]. While measles-related DNA has been reported in leukocytes, lymph nodes, and kidney and urine cell sediment from some SLE patients [28], these results were not confirmed by others [29].

Fig. 4 Proposed B cell signaling network as a linker between innate and adaptive immune responses that are involved in the pathogenesis of SLE. The genes identified in our modules are shown darker than the others

Discussion

SLE is a complex autoimmune disease characterized by the loss of tolerance to self-nuclear antigens and the production of autoantibodies, leading the immune system to attack self-components and healthy tissues. Like other immune-complex diseases, the exact pathogenesis of SLE is largely unknown. Although B lymphocytes play a crucial role in this regard, there are still many unanswered questions about the reasons behind self-tolerance defects and the exact role of B cells in disease pathogenesis. Answers to these questions could suggest novel therapeutic approaches that target active mechanisms in B cells. The expression of some gene sets (modules) is highly correlated in different diseases, and such modules can be identified by weighted gene co-expression network analysis. These modules comprise key gene groups involved in the significant pathways that may constitute the major parts of the pathogenesis route. In the present study, we included B cell samples from subjects who had not received medications that could affect the analysis. The highly co-expressed modules involved in the progression of lupus were identified by WGCNA, a powerful network analysis tool. Through this analysis, three modules of highly correlated genes with critical roles in the major pathogenesis pathways were determined. The modules showed different active pathways in the B lymphocytes of SLE patients, including primary immunodeficiency and the adaptive branch of the immune response (dark red), innate-like immune functions (dark orange), and viral infection and the antiviral immune response (white).

While the primary cause of B lymphocyte dysregulation in SLE patients is not fully understood, potential links between genetic diversity of the immune system and susceptibility to SLE have been identified using genome-wide association studies (GWAS) [30]. Recent advances have highlighted the important role of IL-7R in proliferation and in maintaining genomic integrity during B cell development [22, 31]. Furthermore, it has been reported that individuals with an IL-7R gene polymorphism showed lower numbers of naïve B cells and higher numbers of antibody-producing plasma cells [32]. As such, IL-7R may be considered a key player in lupus. Our results further highlighted a significant expression of ZAP-70 in these patients. The association of ZAP-70 with increased BCR signaling has been previously reported [33, 34]. It appears that polyclonal activation of B cells may underlie ZAP-70 dysregulation in SLE patients [35]. LCK is another candidate dysregulated gene in SLE. Although the role of LCK has not been well studied in B cells, the expression of LCK was previously reported in B cell subsets that are active in the innate immune response and are mainly responsible for producing nonimmune immunoglobulin [24]. It has also been reported that LCK plays a role in the transformation of B cells following EBV infection [36]. Dysregulation of LCK may therefore further highlight the potential role of EBV in SLE patients.

Our results also indicate dysregulation of genes involved in the initial triggering of the complement pathway, including CFD, CFP, FCN1, FPR1, and PLAUR, in SLE patients. Complement deficiencies can play an important role in both autoimmunity and infections. Decreased expression of complement receptor 1 on B cells [37] and of mannose-binding lectin [38] may also account for the defective immune response toward infectious microorganisms in these patients. Complement is also important in the negative selection of B cells and in tolerance to self-antigens, as shown in deficient mice [39]. Our results indicate that dysregulated complement, particularly in the alternative pathway, could be important in the context of viral infection and lupus development.

An interferon (IFN) signature has been previously documented in SLE patients as a general phenomenon [40]. It appears that the elevated levels of IFNs in these patients may be due to an aberrantly controlled chronic viral infection. Indeed, it was shown that EBV can induce IFN-α, perhaps via TLRs [41] or through regulating the function of IFN regulatory factors [42]. However, BXD2 mice (a useful animal model for studying autoimmune lupus) were shown to develop an IFNAR-dependent autoimmune disease in the absence of exogenous stimulation [43], and repeated stimulation of mouse plasmacytoid dendritic cells (pDC) with a TLR9 ligand can impair IFN-α production [44]. These findings further highlight the role of other cellular sources of type I IFNs. Recent studies suggest that B cells may also be an important source of type I IFN in SLE patients, which may promote the development of autoreactive B cells [45]. As indicated in Fig. 3, many viruses can also modulate IFN production (one of the main immune responses against viral infection) by targeting several genes such as TRAF-3, IRF-7, ISG-15, PKR, STAT-1, OAS, and RIG-I in different pathways. However, it is not clear to what extent viral infection, and perhaps mixed infections, are involved in IFN-α production as a general phenomenon in lupus. Given the association between IFN and B cells, B cells can facilitate autoimmunity through both innate and adaptive immune responses. Dysregulated B lymphocyte pathways may thus support the notion that B lymphocytes act as a linker between innate and adaptive immune responses in the pathogenesis of SLE.

Inducible expression of TLRs in B cells appears to be a key player in this scenario, as it may provide a link between the innate and adaptive immune systems. It has been recently recognized that murine B cells can express TLRs [46]. Although the functional relationship between the consequences of TLR and BCR ligation remains largely unclear, synergism between BCR and TLRs has been previously documented in the murine system [47]. A close relationship between endosomal TLR activation and the onset of SLE has been suggested, and excessive activation of TLRs through sustained secretion of pro-inflammatory cytokines and chemokines could disrupt immune homeostasis in the host, which can contribute to the development of many inflammatory and autoimmune diseases, such as SLE [48]. Although their clinical significance in SLE needs to be defined, CD14 [49] and CD36 [50] are considered active genes in diseases of the immune system and have functional roles in B lymphocytes. CD14 and CD36 could be important in both TLR cascades and dysregulation of B lymphocytes.

Based on our results, B cells could also modulate the acquired immune response not only via antibody production but also through several other mechanisms, including survival signaling such as the B cell receptor signaling pathway, chemokine and cytokine production, Th1 and Th2 cell differentiation, Th17 cell differentiation, the IL-17 signaling pathway, costimulatory cytokines, dysregulation of inhibitory molecules (PD-1 and CTLA-4), the inflammatory response pathway, leukocyte transendothelial migration, and antigen processing and presentation.

Several genes involved in FcγR activation, including CD247, CD3G, and Fyn, were dysregulated in B lymphocytes. FcγRIIB is the only FcR that acts as a critical negative regulator in immune complex-driven reactions. Indeed, partial restoration of FcγRIIB expression in B cells can rescue mice from developing an SLE-like phenotype [51]. Moreover, polymorphisms in FcγRIIB have been shown to be associated with susceptibility to SLE development [52].

Fyn also acts as a positive regulator of BCR signaling, which suggests a potential role in the loss of anergy. While the absence of Fyn is thought to be protective, deficiency of Lyn, a gene active in the negative selection of B cells, aggravates auto-/inflammatory diseases [53]. These findings further support the crucial role of the balance between Lyn and Fyn in lupus nephritis patients [54]. Our results further indicate the dysregulation of NF-κB in lupus. Given that NF-κB provides necessary mediators for the survival of immature B cells and for mature lymphocyte differentiation [55, 56], uncontrolled activation of NF-κB can induce autoimmune disease [57]. Moreover, B cells can be considered a potential source of cytokine production in SLE. Several cytokines and chemokines including IL-17, IL-10, IL-2, IFN-γ, CCL4, and CCL5 are thought to be involved in SLE pathogenesis, as previously reported [58]. Interactions between B and T cells appear to play an important role in the development of autoimmunity through different mechanisms such as the inflammatory response [59], T cell differentiation [60, 61], and antigen presentation [62].

Although we performed an integrative analysis by merging three datasets, this study has several limitations. The co-expressed genes were identified solely by analysis of microarray datasets, so further in vivo and in vitro studies should be performed to validate the potential genes associated with SLE. Although we merged several datasets, more samples could increase the reliability of the bioinformatics analysis. Moreover, we only considered gene expression levels in B cells; the same analysis should be carried out for T cells in future work. Nevertheless, this study sheds light on the possible pathogenesis mechanisms of SLE.

Conclusions

WGCNA further highlights the potential role of B lymphocytes as a linker between innate and adaptive immune responses in SLE pathogenesis. It appears that secondary (infections) and underlying (some genetic disorders) factors can dysregulate B lymphocytes, which may facilitate SLE development. As such, identifying novel biomarkers and pathways in lupus would be of importance.