Introduction

Systemic lupus erythematosus (SLE) is a complex autoimmune disease associated with a spectrum of symptoms [1]. Disease is characterized by a breakdown in immune tolerance which promotes the formation of autoreactive B and T cells, abnormal cytokine production, and the subsequent generation of autoantibodies against DNA- and RNA-based self-antigens [1, 2]. Overall, women are nine times more likely to develop SLE than men, with an ethnic disparity of increased frequency and severity of disease in women of African American and Asian ancestry relative to those of European ancestry [3]. As with many autoimmune diseases, there are genetic, environmental, and hormonal associations [4, 5]. The concordance rate of SLE in monozygotic twins is 30 to 50%, whereas the overall heritability of SLE within a family is estimated to be 66% [5,6,7]. For the majority of patients, disease is genetically complex, resulting from multiple genes or genetic variations, with a few monogenic exceptions such as C1q deficiency [6].

An explosion of genome-wide association studies (GWAS) and next-generation sequencing (NGS) studies have been performed over the last 10 years which have identified over 100 loci with a genome-wide significant association with SLE; top loci are summarized here [8], and the remainder can be found within the GWAS catalog [9]. The power of GWAS is the ability to identify SNPs, repeat sequences, or structural variations occurring within a population without bias [10]. Nearly all identified variants are located within noncoding regions of the genome with only a small portion within protein-coding regions (Fig. 1) [8, 11•]. Specifically, variants are enriched within super enhancers, stretch enhancers, and multiple enhancer variant loci especially near regions of major immune differentiation regulator binding sites or gene binding sites dependent on a specific signal [12]. A significant portion of the GWAS risk alleles demonstrate roles in B cell activation and signaling or in genes directly or indirectly related to the induction of type I interferons [8, 13, 14].

Fig. 1

Description of SLE GWAS-identified loci. a Functional class of identified non-HLA loci with genome-wide significance (p < 5 × 10⁻⁸) from SLE GWAS studies. Frequency indicates the percentage of the total significant loci. Most loci occur within various non-coding regions with a small percentage occurring within the ORF. b Frequency of HLA to non-HLA loci significantly associated with SLE. All data is extracted from the GWAS catalog [9]

Many of the identified polymorphisms occur within human leukocyte antigen (HLA) alleles; however, non-HLA allelic variations account for the majority of loci (Fig. 1) [15]. Comparison of non-HLA loci shared between autoimmune diseases showed that SLE shares less genetic overlap with other autoimmune diseases, indicating different genetic requirements for SLE susceptibility [16]. Furthermore, given the breadth of clinical manifestations in SLE, it is unlikely that a single SNP is responsible for the acquisition of disease in every patient; rather, disease likely reflects an accumulation or interaction of multiple variants over time, known as the cumulative hit hypothesis [17••]. Despite the identification of a growing number of SNPs associated with SLE, only a small proportion of these have been validated [11•]. Furthermore, assessing the individual contributions of identified variants and the potential for compounding effects from multiple variants has remained a challenge for their validation. The focus of this review is to outline recent validation efforts of GWAS-identified SLE candidates, with special emphasis on the techniques used or that could be used.
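One simple way to picture the accumulation of multiple variants is an additive weighted risk score, in which each risk allele contributes the log of its odds ratio. The sketch below is only an illustration of that idea and is not a method taken from the studies cited here; the SNP identifiers and odds ratios are hypothetical placeholders, and the additive model is an assumption that ignores interactions between variants.

```python
from math import log
from typing import Mapping


def weighted_risk_score(risk_allele_counts: Mapping[str, int],
                        odds_ratios: Mapping[str, float]) -> float:
    """Sum of (risk-allele count x log odds ratio) over all genotyped SNPs.

    risk_allele_counts: SNP id -> number of risk alleles carried (0, 1, or 2)
    odds_ratios: SNP id -> per-allele odds ratio (hypothetical values here)
    """
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1, or 2")
        score += count * log(odds_ratios[snp])
    return score


# Hypothetical individual and effect sizes, for illustration only.
example_counts = {"snpA": 2, "snpB": 1, "snpC": 0}
example_odds_ratios = {"snpA": 1.5, "snpB": 1.2, "snpC": 2.0}
example_score = weighted_risk_score(example_counts, example_odds_ratios)
```

A score of this kind captures accumulation but not interaction; modeling interaction would require additional terms and is outside this sketch.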

Computational Approaches

GWAS studies capitalize on linkage disequilibrium (LD), the nonrandom association of alleles at more than one locus, to map disease-associated variants, which tend to maintain strong LD within a population for long periods of time [18]. GWAS-identified variants must pass a statistical p value threshold of < 5 × 10⁻⁸ to be considered a true effect with genome-wide significance [10]. To date, there are 24 GWAS and meta-analysis studies documented within the GWAS catalog, predominantly representing European and Asian populations [8, 9]. From these studies, 388 associations have been made with a p value of ≤ 5 × 10⁻⁸ [9]. Interestingly, the most recent GWAS studies account for only 28% of the heritability of SLE, leaving approximately 38% unaccounted for [7, 19]. Identified loci span most of the genome; however, many of the top hits are found in chromosomes 1–7 (Fig. 2).
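The two quantities in this paragraph, LD between a pair of loci and the genome-wide significance threshold, can be sketched in a few lines. The r² formula is the standard one for two biallelic loci. The helper showing 5 × 10⁻⁸ as a Bonferroni correction of 0.05 for roughly one million independent tests reflects the conventional rationale for the threshold rather than anything stated in the text, and all function names are illustrative.

```python
GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in the text


def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # D: deviation from random (independent) association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def genome_wide_significant(p_value: float) -> bool:
    """Strict comparison against the 5e-8 threshold."""
    return p_value < GENOME_WIDE_ALPHA


# Alleles always inherited together give r^2 = 1; independent alleles give 0.
r2_complete = ld_r_squared(p_ab=0.5, p_a=0.5, p_b=0.5)
r2_none = ld_r_squared(p_ab=0.25, p_a=0.5, p_b=0.5)
```

The strict inequality follows the "< 5 × 10⁻⁸" wording; a catalog query using "≤" would differ only for p values exactly at the threshold.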

Fig. 2

Simplified chromosomal distribution of SLE GWAS-identified loci. All GWAS identified variants are plotted relative to their significance and chromosomal location. The top 25 most significant variants have been labeled. All data is extracted from the GWAS catalog [9]

In examining the GWAS data, cohort sample size, sex, ancestry, and disease state must be considered since they all can impact identified candidate alleles [10]. Insufficient power due to low sample size can produce false positive associations known as “Winner’s curse” [10]. The improved accessibility of GWAS data within public databases has allowed for both further validation of previously reported SNPs and increased sample sizes in new analyses [10, 20]. Predictions for SNP function and chromatin modifications surrounding a known gene can be extrapolated from genome projects such as ENCODE, Epigenome Roadmap, Blueprint, rSNPBase, RegulomeDB, and GTEx, providing a means to prioritize genes for experimental validation [19,20,21].

To computationally bridge the gap between the GWAS data and phenotypic disease observations, a variety of expression quantitative trait loci (eQTL) studies have emerged [22]. eQTL studies pair genome-wide expression data with marker-based genotyping to characterize how the identified genetic variants, acting in cis or trans, modulate transcript abundance [19]. A confounding feature of many whole-gene expression RNA-seq-based eQTL studies is the limited resolution of independent exons and alternatively spliced isoforms which may be overlooked when examining gene-level differences alone [23]. To combat this, pairing GWAS with exon, intron, exon-junction, and isoform expression data from RNA-seq can be explored to increase the resolution of low abundance RNA isoforms [23]. Application of this analysis revealed that SLE variants have an increased likelihood of associating with exon- and junction-level cis-eQTLs, and this analysis outperformed standard RNA-seq-based gene-level analyses [23]. However, eQTL studies overall have not explained all GWAS loci, potentially due to cell-restricted gene expression or the three-dimensional folding of the chromatin resulting in physical interactions which can modulate SNP-containing enhancers and/or their related transcripts [12]. This is supported by Hi-C data examining the chromatin folding from B cell lymphoblast lines. GWAS-identified risk loci are found within a common regulatory circuit with “outside variants,” which are those variants that demonstrate weak LD with the risk loci, modulating transcript expression [12]. In T and B cell lines, there is evidence that chromatin folding may promote interactions between GWAS SNPs and distant genes instead of neighboring genes as originally expected [24]. To date, no Hi-C studies have been performed using primary cells from SLE patients.

Allele-specific histone posttranslational modifications can arise from the presence of allelic variants and can modulate enhancer function [25]. These histone QTLs (hQTLs) are specifically enriched within enhancers of SLE patients and influence their respective gene expression compared to non-hQTL variants [25]. While computational methods aid in prioritizing loci for further investigation, their translation to disease is largely limited to predictions.
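The core of a cis-eQTL test discussed in this section, relating genotype at a variant to the expression of a nearby transcript, can be sketched as a least-squares regression of expression on risk-allele dosage. This is a minimal illustration under an additive model and is not the pipeline of any cited study; real analyses add covariates, normalization, and multiple-testing correction. The sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def cis_eqtl_fit(dosages: Sequence[int],
                 expression: Sequence[float]) -> Tuple[float, float, float]:
    """Regress expression on risk-allele dosage (0, 1, or 2).

    Returns (slope, intercept, r_squared). The slope is the estimated change
    in expression per additional copy of the risk allele.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matched samples, at least three")
    g_bar, e_bar = mean(dosages), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    syy = sum((e - e_bar) ** 2 for e in expression)
    slope = sxy / sxx
    intercept = e_bar - slope * g_bar
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical samples: expression falls with each copy of the risk allele.
example_dosages = [0, 0, 1, 1, 2, 2]
example_expression = [10.0, 10.0, 8.0, 8.0, 6.0, 6.0]
example_fit = cis_eqtl_fit(example_dosages, example_expression)
```

The same function could be applied to exon- or junction-level counts instead of gene-level counts, which is the change in resolution the text describes.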

Validation of SLE GWAS Candidate Genes and Risk Variants

Here, we have presented the most recent efforts in SLE GWAS candidate gene and risk variant validation (Table 1). Substantiated loci have been organized by the dominant pathways they are involved in. Of note, this is not an exhaustive list and many researchers are actively pursuing further validation of risk variants.

Table 1 Validated SLE candidate variants

Interferon Signaling Pathway

IRF5

Interferons play a well-documented role in the development of SLE. Therefore, it is not surprising that SNPs have been identified within multiple interferon regulatory factors (IRFs). The family of IRFs is a group of transcription factors which are widely expressed by immune cells and are responsive to TLR signaling, among others [77]. Of the identified IRFs, the most well characterized is IRF5. While prediction modeling for the role of IRF5 polymorphisms views its effects as a downstream consequence of autoantibody specificity which leads to elevated type I IFNs and SLE disease, substantial evidence has suggested a potentially primary role for IRF5 [78]. The dominant phenotype associated with IRF5 mutations is increased expression leading to elevated type I IFN production [35, 79]. Likely causal SNPs occur within the 5′ and 3′ UTR (rs2004640 and rs10954213) [80]. Furthermore, IRF5 variant rs4728142 appears to regulate IRF5 expression by modulating the binding of transcription factors such as ZBTB3 to the promoter [33]. IRF5−/− mice have reduced production of autoantibodies in a pristane-induced model of lupus. Moreover, IRF5 is required for local production of CCL19 and CCL21 promoting pDC recruitment to the site. pDC activity thereby, directly through antigen presentation and indirectly via cytokine production, enhances T and B cell activity [81]. Interestingly, Lyn, a src family kinase, which also has implications in the development of lupus was demonstrated to directly interact with and negatively regulate IRF5 transcriptional activity [82]. Monoallelic disruption of IRF5 relieved Lyn−/− mice of SLE disease [82].

Multiple groups have demonstrated a B cell intrinsic role for IRF5 [83, 84]. Specifically, IRF5 directly binds to and regulates the IgG2a locus, further demonstrating its role in progression of disease [77]. BLIMP1, an important regulator of plasma cell development, can be induced through IRF5 binding to its promoter [83]. While this group did not explore IRF5 in the context of autoimmunity, the implications are apparent [83]. Additionally, exploration of IRF5 target genes in healthy human B cells revealed that IRF5 binds to IRF4, ERK1, and MYC, all of which play critical roles in the differentiation, activation, and proliferation of B cells [84]. Importantly, IRF5 binds to the promoters of these genes following IgM/CpG-B stimulation, further indicating a role for IRF5 in regulating the downstream effects of TLR9, a known factor in SLE disease [84]. In contrast, a recent study explored the role of IRF5 SNP rs2004640 in non-SLE patients and found no difference in IRF5 or FcγRIIB expression following TLR7/9 stimulation of B cells. However, the SNP appears to affect transcription factor (SP1) binding within myeloid cell lines [34]. While there are definitive roles for IRF5 in the progression of SLE in mouse models, human studies examining IRF5 variants remain inconclusive.

IFIH1

IFIH1 (MDA5) is an intracellular sensor which induces a rapid type I IFN response following stimulation by double-stranded RNA and branched, high-molecular weight RNA [37]. Three SNPs have been identified within IFIH1 in African and European SLE populations, rs13023380 (intronic), rs1990760 (A946T; missense), and rs10930046 (R460H; missense) [85].

In a Chinese cohort of 400 SLE patients, a homozygous mutation at rs1990760 was associated with reduced IL-18 and granzyme B in patient serum paired with a slight reduction in anti-dsDNA antibodies [36]. IL-18 and granzyme B levels have previously been shown to be elevated in SLE patients and correlate to disease severity; therefore, how rs1990760 leads to reduced IL-18 and granzyme B production and their role in promoting SLE remains unclear [36].

Mice generated through homologous knock-in containing the polymorphism rs1990760 within the IFIH1 gene (A946T) demonstrate an increased responsiveness to self-ligands [37]. Following encephalomyocarditis virus (EMCV) infection or in the streptozocin-induced type 1 diabetes model, A946T mice produce elevated levels of type I IFN [37]. This variant has not been explored in an SLE mouse model. Despite the clear association between type I interferon production and recognition of self-ligands mediating SLE, overall it remains unclear how variants within regulators of IFN production lead to aberrant interferon production and disease.

BCR Signaling

FcγRIIB

Of the Fcγ receptors, FcγRIIB is the only one that serves to negatively regulate immune responses within B cells and monocytes [39]. A nonsynonymous mutation (one that alters the amino acid sequence) within the transmembrane domain of FcγRIIB (rs1050501) results in the loss of regulatory function within both B cells and monocytes. Specifically, rs1050501 results in delayed diffusion of the receptor through the plasma membrane, limiting its potential to capture immune complexes and regulate BCR signaling in primary SLE patient B cells [39]. The delayed diffusion kinetics are the result of reduced inclusion into sphingolipid rafts resulting from an altered three-dimensional structure [39,40,41]. A SNP within the Fcgr2b promoter (rs3219018) is associated with reduced Fcgr2b transcript levels in SLE patient B cells [42]. The reduction in mRNA was characterized in vitro in human B cell lines. The variant rs3219018 appears to limit binding of the transcription factor AP-1 to the Fcgr2b promoter [42]. Additionally, Fcgr2b−/− mice spontaneously develop SLE disease within a few months of age [86]. Therefore, variants which reduce the expression or alter the function of FcγRIIB are likely to have a role in advancing disease.

BANK1

Possibly causal variants have been identified within multiple regions of the BANK1 gene. BANK1 is mostly expressed within immature and mature B cells with roles in enhancing downstream effectors of BCR signaling [87]. Two variants (rs10516487 and rs3733197) occur within the coding region of the gene, while another is located at a branch point, resulting in differential expression of isoforms [43]. These variants result in variable expression of BANK1 splice variants [87]. Crossing BANK1−/− mice to B6.Sle1.yaa, an aggressive mouse model of lupus in which TLR7 is overexpressed, results in reduced IgG production, potentially due to altered TLR7 signaling [88]. Due to the roles of BANK1 in BCR signaling, particularly within calcium mobilization, it appears feasible that BANK1 variants could impact the development of autoreactive B cells.

PRDM1

PRDM1 encodes the protein BLIMP1, an immune master regulator with functions in T cells, B cells, NK cells, and myeloid cells [89]. The SNP rs548234, located within an intergenic region downstream of PRDM1, is associated with reduced BLIMP1 expression in SLE monocyte-derived DCs from female patients [44]. Mechanistic studies indicated that the reduction in BLIMP1 is the result of an altered binding for the transcription factor Kruppel-like factor 4 (KLF4) which does not normally regulate BLIMP1 expression [44]. A reduction in KLF4 rescues the expression of BLIMP1 in the THP-1 cell line [44]. Dendritic cell-intrinsic BLIMP1-deficient mice, generated by crossing a BLIMP1 floxed mouse to a CD11c Cre mouse, develop autoantibodies, increased IL-6 production, and lupus-like disease in female mice, indicating a tolerogenic role for BLIMP1 in DCs [89]. The reason for sex bias in PRDM1−/− mice is not clear. Furthermore, in DCs, BLIMP1 induces microRNA let7c expression which, in turn feeds back to negatively regulate BLIMP1. Let7c is also suggested to regulate suppressor of cytokine signaling 1 (SOCS1), an important protein in the negative regulation of interferon signaling [90]. Importantly, BLIMP1 is critical in other cell types including plasma cells, and the role of identified variants in regulating the plasma cell response is unexplored.

CSK

C-Src tyrosine kinase (CSK) is expressed in complex with Lyn and serves to phosphorylate the C-terminal tyrosine of Src family tyrosine kinases (SFKs) to inactivate them [46]. SLE patients carrying the intronic risk allele rs34933034 demonstrate increased Lyn phosphorylation (Tyr508) and enhanced BCR-induced activation of mature B cells, and the allele is correlated with elevated IgM levels [46]. This study focused on the effects of the CSK risk locus on B cell function given the dramatically increased expression of CSK in B cells, particularly transitional and naive populations; nevertheless, CSK is expressed in other immune cells such as T cells and may also modulate their signaling.

T:B Cell Interactions

TNFSF4

A single risk haplotype within the upstream regulatory region of TNFSF4 (OX40L) correlates with increased cell surface expression and transcript levels. This promotes increased costimulation of CD4+ T cells and enhanced activation of antigen-presenting cells (APCs) [91]. In the blood of SLE patients, OX40L is predominantly expressed by myeloid APCs, but not B cells [92]. In inflamed skin and kidney biopsies, OX40L is expressed on multiple cell types including myeloid APCs, but again excluding B cells [92]. Stimulation of myeloid APCs with ribonucleoprotein-containing immune complexes promotes OX40L expression in a TLR7-dependent manner likely mediating a feedback loop promoting further autoantibody production [92].

Studies on B cell-specific Tnfsf4 conditional knockout mice demonstrate a B cell intrinsic role of Tnfsf4 expression in the development of autoantibodies and generation of germinal centers when crossed to Sle16 lupus-prone mice [93]. The expression of Tnfsf4 in T cells was not assessed in the Sle16 or chronic graft-versus-host-disease model. However, based on data obtained from global TNFSF4−/− mice, there is likely a role for this protein in other cells [93]. These studies support a role for the OX40:OX40L axis in regulating multiple cell types in SLE, particularly in the context of TLR7 stimulation.

ITGAM

ITGAM encodes integrin-αM (CD11b), which is primarily involved in leukocyte adhesion, although other roles have been described [49]. Located within exon 3 is a nonsynonymous SNP (rs1143679) resulting in an R77H mutation. R77H is associated with decreased phagocytic activity in human monocytes and monocyte-derived macrophages in a European American SLE cohort [49, 50]. The ITGAM SNPs rs1143675, rs1143679, and rs1143683 are associated with reduced serum type I interferon [51, 52]. Treatment of MRL/lpr mice with LA1, a CD11b agonist, promotes a moderate reduction in the induction of type I IFN signature genes and autoimmunity development [52]. Furthermore, three missense mutations in ITGAM are associated with dysregulated FOXO3 activity and increased type I IFN [52]. Thus, ITGAM variants are potentially causative for some manifestations of SLE disease, and targeting CD11b may have therapeutic potential.

Transcription and Translation Regulation

SMG7

Nonsense-mediated mRNA decay factor (SMG7) regulates up-frameshift 1(UPF1) phosphorylation and plays a critical role in the process of nonsense-mediated decay by interacting with both UPF1 and the mRNA decay complex [94]. Due to its role in transcript regulation, a SNP located within the promoter region of SMG7 (rs2275675) was explored within European populations of SLE patients [53]. Patients expressing the risk haplotype demonstrated reduced SMG7 expression in a dose-dependent manner in PBMCs [53]. Furthermore, reduced SMG7 mRNA levels correlate with increased ANA levels which could be recapitulated in vitro by knocking down SMG7 by siRNA in PBMCs [53]. It would be interesting to explore the ramifications of this variant in specific cell populations to interrogate differential impacts on SLE pathogenesis.

ETS1

V-Ets avian erythroblastosis virus E26 oncogene homolog 1 (ETS1) is a transcription factor that plays a role in regulating proliferation and differentiation in B and T cells [95]. Through frequentist and Bayesian fine-mapping, 16 variants within ETS1 have been identified as potentially causal [57]. The strongest association is found with the SNP rs6590330, which occurs within the coding region of the gene and leads to increased STAT1 binding to the ETS1 transcription start site, potentially repressing ETS1 transcription. These findings were explored within an EBV-transformed B cell line [57]. In support of this, SLE patients demonstrate reduced mRNA levels of ETS1 compared to healthy controls [57]. ETS1 knockout mice develop autoantibodies to DNA, histone, cardiolipin, and MBP self-Ags, immune infiltrates into lung and liver tissue, and immune complex depositions within the kidney [96]. Therefore, the effects of ETS1 on SLE pathogenesis may be due to reduced expression leading to altered T and B cell differentiation and expansion.

TLR, Cytokine, and Other Intracellular Signaling

PTPN22

PTPN22 encodes the protein Lyp which, as noted above, is involved in BCR signaling through its interactions with CSK and dephosphorylates SFKs, destabilizing the kinase domain [46]. One of the most highly characterized autoimmune-associated SNPs is R620W (rs2476601) within the PTPN22 gene, which results from a C to T mutation at position 1858 [97]. R620W is one of the few polymorphisms which occurs within the coding region of a gene and is found in patients with type I diabetes, rheumatoid arthritis, and SLE [97]. A knock-in mouse bearing an analogous mutation (R619W), generated through site-directed mutagenesis of 129 embryonic stem cells that were injected into C57BL/6J blastocysts, develops autoantibodies and systemic SLE-like disease due to an increase in lymphocyte antigen receptor (BCR and TCR) responsiveness [60]. This enhanced autoimmunity was recapitulated in aged B cell intrinsic R619W mice [60]. Interestingly, R619W mice developed by another group on a C57BL/6 background do not develop dramatic autoimmune disease [98], indicating a contribution of 129-derived genes to accelerating SLE pathogenesis [60].

Healthy individuals bearing a single copy of the R620W mutation demonstrate increased frequencies of polyreactive recently emigrated and transitional B cells, whereas healthy controls homozygous for R620W had comparable frequencies of autoreactive B cells to those of patients with type I diabetes or rheumatoid arthritis [61]. TLR7 stimulation of PBMCs from SLE patients with the R620W mutation shows reduced STAT1 activation and IFNα production [99]. This contradictory finding of reduced IFNα production in R620W patients may partially explain SLE patients’ susceptibility to infections, but this remains untested [99]. In a cohort of patients with broad autoimmune diseases, homozygous or heterozygous expression of the 1858C/T variant leads to reduced CD4+ T cell calcium flux, IL-10 production, and an increased frequency of memory T cells. Similarly, memory B cells also demonstrate reduced calcium flux [62]. Furthermore, heterozygous patients demonstrate reduced memory B cell proliferation and altered B cell signaling characterized by reduced Syk phosphorylation [100]. The reduction in TCR/BCR signaling may impact selection within the thymus or bone marrow, respectively, allowing self-reactive cells into the periphery [62]. While it is clear that different populations of B cells respond differently to BCR signaling, it is unclear how a reduction in memory B cell proliferation promotes disease [100].

IL-10

IL-10, an anti-inflammatory cytokine, is elevated in patients with active SLE disease and correlates positively with disease severity [63]. SNP rs3122605 found within European populations is found 9.2 kb upstream of IL10, and its presence correlates with increased levels of IL-10 in SLE patients. Furthermore, this SNP generates a novel binding site for the transcription factor Elk-1 which is found to have increased phosphorylation in SLE patients. The increased phosphorylation of Elk-1 and expression of IL-10 correlates with active SLE disease in patient B cells [64]. Elevated IL-10 expression in SLE patients appears to increase the activity of autoreactive B cells and plasma cells [101]. However, the fine balance of IL-10 expression between promoting and regulating disease will be difficult to tease out.

TNFAIP3

SNP rs6927172, located upstream of TNFAIP3, was analyzed through CRISPR/Cas9-mediated insertion in HEK293 cells and was associated with differential expression of multiple genes [66]. The genes most strongly affected were IL20RA and TNFAIP3, through modulation of the long-range DNA looping of the TNFAIP3 locus [66]. The effect of this SNP within a complex model was not verified [66]. In a small Chinese cohort of 15 female SLE patients, TNFAIP3 expression was significantly reduced in CD4+ T cells with implications for regulation through chromatin modifications. Transfection of CD4+ T cells from the SLE patient samples with a TNFAIP3-expressing plasmid resulted in reduced production of pro-inflammatory cytokines such as IFNγ, IL-17A, and IL-17F, indicating that the loss of negative regulation via TNFAIP3 promotes disease in CD4+ T cells [102]. Of note, the patients were not assessed for the presence of TNFAIP3 variants [102].

CR2

The SNP rs3813946 within the 5′ UTR of complement receptor 2 (CR2) is associated with enhanced CR2 transcription, but the effect is only detectable within homozygous patients. Furthermore, validation of the increased transcriptional activity in the CR2 promoter was carried out using a luciferase reporter assay within a Raji B cell line due to high expression of CR2 by B cell populations [67]. Therefore, further exploration of the role of this SNP within patients may be required.

TLR7

There is strong evidence for a role of TLR7 signaling in SLE and other autoimmune diseases [2]. Recently it has been demonstrated that TLR7 can escape X chromosome inactivation within human monocytes and B cells [103]. This provides a rationale for the sex bias in some SLE patients. B cell-intrinsic TLR7 expression is required for spontaneous germinal center formation and autoantibody production in autoimmune-prone B6.Sle1b mice [104]. Furthermore, B6.Sle1b.yaa mice which overexpress TLR7 or mice treated with a TLR7 agonist demonstrate a dramatic increase in GC formation [104]. This is further supported by another TLR7 overexpressing model (TLR7.1Tg) in which there was a dramatic increase in transitional B cell populations and autoantibody production specifically from these cells [105]. A dynamic interaction within B cells has been identified where TLR7 appears to initiate and exacerbate disease while TLR9 acts as a negative regulator of disease through its regulation of TLR7 [106,107,108]. Furthermore, in SLE patients, there is an expansion of autoreactive CD27−IgD−CXCR5−CD11c+ (DN2) B cells which can develop from naïve B cells stimulated with TLR7 and IFNγ [109]. These DN2 B cells demonstrate a predisposition to differentiate into plasma cells and are hypersensitive to TLR7 signaling [109]. Additional studies have implicated the TLR7/TLR9 paradigm in dendritic cells in the development of SLE [110, 111]. rs3853839, a SNP occurring within the 3′UTR of TLR7, results in increased TLR7 mRNA with an enhanced effect on males within a Chinese and Japanese population [68]. Furthermore, rs3853839 is consistently found within European American, African American, and Amerindian/Hispanic populations, similarly resulting in enhanced TLR7 expression potentially mediated through reduced posttranscriptional regulation by miR-3148 [69]. While overexpression of TLR7 promotes disease, it would be interesting to explore if any TLR7-associated variants affect the TLR9-mediated negative regulation.

IRAK1

Two nonsynonymous SNPs (S196F and L532S) have been identified in interleukin receptor associated kinase 1 (IRAK1) within Caucasian and Japanese SLE patients [70]. Expression of mutants containing these variants into HEK293 cells results in increased stimulation of NFκB transcriptional activity and an increased sensitivity to autophosphorylation compared with wild-type IRAK1 [70]. While this study validates a role for IRAK1 variants in promoting inflammation, given the breadth of IRAK1 expression and its involvement in TLR signaling, further mechanistic investigation is warranted. In Sle1 or Sle3 autoimmune-prone mice, a global IRAK1 deficiency reduced serum IgM and IgG autoantibodies and kidney pathology. Additionally, IRAK1-deficient mice demonstrate reduced numbers of splenic CD4+ T cells, B cells, and reduced CD80 expression on macrophages and mDCs [112]. Regulation of TLR signaling could have direct implications in both the development and treatment of SLE; therefore, further study into the role of IRAK1 variants is necessary.

NCF2

A single variant within exon 12 of neutrophil cytosolic factor 2 (NCF2) is strongly associated with both adult- and child-onset SLE [54]. This SNP produces a missense mutation (H389Q) which when expressed into the human myeloid cell line K562 leads to reduced reactive oxygen species (ROS) production [54]. It is postulated that due to ROS’s ability to negatively regulate B and T cell responses, a reduction in ROS could lead to autoimmune disease [113]. A related missense variant in NCF1 results in reduced expression and is also associated with reduced ROS production in SLE patients [114]. Further, increased ROS production was found to be associated with protection against SLE in correlation studies [114]. This opens an interesting paradigm where the failed regulation of autoreactive T and B cells is, at least in part, due to a reduction in a negative feedback generated by myeloid cells.

BLK-C8orf13

SNP rs13277113 maps to an intergenic region between BLK and C8orf13, which are transcribed in opposite directions. Examination of rs13277113 in transformed B cell lines revealed reduced BLK mRNA and increased C8orf13 mRNA expression, indicating a preferential effect of the variant [71]. Trans-population mapping has identified two possibly causal SNPs within the proximal BLK promoter and the upstream alternative BLK promoter (rs922483 and rs1382568, respectively) in a mixed ancestry population [72]. These variants may modulate transcription of BLK within immature B cell lines, but not mature B cell lines or T cell lines, based on a luciferase reporter assay [72]. More detailed analysis of transcription factor binding was not performed [72]. A B cell restricted phenotype is supported by the fact that individuals homozygous for the SNP rs922483 risk allele have reduced BLK protein only in naïve and transitional B cell populations, but not in γδ+/− T cells or CD3−CD19− cells measured from umbilical cord blood [115]. Of note, these protein differences were not identified in adult samples, and all samples came from healthy individuals [115]. In contrast, in a mouse model generated by crossing Blk+/− with B6.MRL-Faslpr, Blk was expressed in B cells, γδ T cells, double-negative αβ T cells, and pDCs [116]. This model also resulted in increased kidney pathology, lung and liver infiltrates, and IgM and IgG levels [116]. These data indicate the need for further examination of the role of Blk in regulating immune cell selection [116].

Transgenic, Knock-Out, and Knock-In Mice

Mouse models of induced or spontaneous SLE-like disease have thus far been the mainstay used in our understanding of SLE and will undoubtedly be vital in the characterization of GWAS candidates. Despite the parallels drawn by many mouse models, no single model perfectly mirrors the spectrum of disease associated with SLE [117]. Various spontaneous or induced models of SLE have been reviewed recently [117,118,119]. Pairing these models with the current genetic approaches to modify a mouse genome using LoxP/Cre or CRISPR/Cas9, has allowed researchers to dissect the cell-intrinsic or extrinsic mechanisms of GWAS candidates in an in vivo system. While a complete gene knockout through the CRISPR/Cas9 or the LoxP/Cre system may allow for the interrogation of the cell-intrinsic role of a gene in developing SLE, the relative effects of a SNP on SLE phenotypes would remain largely undetermined. Therefore, a better approach to explore the role of GWAS variants would be to employ techniques that allow for site-specific mutagenesis such as CRISPR/Cas9, TALENs, or zinc finger endonucleases. CRISPR/Cas9 utilizes a bacterial derived system to selectively introduce DNA double-stranded breaks (DSBs), which are then repaired through nonhomologous end-joining or homologous repair [120]. CRISPR/Cas9 allows for efficient introductions of insertions or deletions, loxP sites, conditionally controlled elements, etc., making it an extremely powerful tool [120]. To introduce SNPs or short strings of mutations, DNA oligonucleotides or double-stranded DNA targeting constructs can be used to promote homologous directed recombination (HDR) (Fig. 3) [121•]. HDR, while a more precise DNA repair mechanism, is dominantly limited to dividing cells but can be promoted through use of Cas9 nickase or other methods [121•]. The recent development of CRISPR/Cas13 exploits the activity of adenosine deaminase acting on RNA type 2 (ADAR2) to induce site-specific mutations in RNA transcripts. 
This technique has been proposed as a future therapeutic for the treatment of SLE and can also aid in exploring the causal role of specific mutations in the development of disease for those limited variants found within the open reading frame [126]. Expression vectors such as retroviruses or lentiviruses may be useful for exploring the function of mutations, but these studies are typically limited to variants occurring within the open reading frame. Given the recent advances in site-directed mutagenesis for generating mice, and the capacity of these mice to reveal direct mechanisms, mouse models will likely continue to be widely used.

Fig. 3

Development of CRISPR/Cas9 knock-in hiPSCs and differentiated cells. (Left) Schematic depiction of using CRISPR/Cas9 to introduce a SNP through homology-directed repair [121•]. (Right) Differentiation of B cells, T cells, dendritic cells, macrophages, and NK cells from hiPSCs and the cytokine cocktails required [122,123,124,125]

Humanized Mouse Models

Humanized mice are a hybrid of an in vivo mouse model with a xenograft human sample [127]. Humanized mouse models have struggled to recapitulate human diseases and have previously required large numbers of cells transferred into host mice, but are being continuously optimized [128]. Humanized mouse models involve the transfer of human tissues or cells into a severely immunodeficient mouse, allowing for reconstitution of the particular cell population of interest derived from the patient sample [127]. Current models for the study of SLE in humanized mice utilize the transfer of CD34+ hematopoietic stem cells (HSCs), with or without sublethal irradiation, into NOD SCID gamma (NSG) mice or Rag2−/−IL2Rgc−/− mice, which lack T, B, and NK cells [128,129,130]. High numbers of SLE donor HSCs or PBMCs are usually injected intraperitoneally into pups or adults, resulting in mixed repopulations. Mice develop autoantibodies of patient cell origin and glomerulonephritis [128,129,130]. Depending on the model used, there is a 50% reduction in surviving mice by 4–5 weeks post-transfer, limiting the number of mice which can be used for downstream applications [128,129,130]. A recent study paired the transfer of low numbers of HSCs (1 × 10⁵) with pristane injection, a commonly used method for inducing SLE-like disease in mice [128]. Interestingly, this model developed SLE-like disease with lymphopenia, a characteristic which often occurs in SLE patients but is rarely captured in current mouse models [128].

Somewhat counter-intuitively, the activity of myeloid-derived suppressor cells (MDSCs) in SLE has proven to be detrimental to patients [131]. Increased frequencies and numbers of MDSCs positively correlate with disease severity [132]. Within a humanized mouse model, the depletion of MDSC populations prior to PBMC transfer reduces SLE disease in mice through a reduction in arginase-1 production, thus limiting the differentiation of Th17 CD4+ T cells [132]. A limitation of current models is inadequate depletion of murine myeloid-derived cells paired with insufficient expansion of these populations from the human donors, which is being addressed by transgenic models overexpressing human IL-2, SCF, and GM-CSF [133]. To date, most studies have focused on the establishment of SLE humanized mouse models, but their uses in other fields suggest that they also have potential for mechanistic studies and the validation of GWAS-identified variants [117, 128, 129, 134].

Primary Human Samples, Human iPSCs, and Cell Lines

Classically, the interrogation of primary SLE patient samples has been limited to confirmatory associations of mechanisms identified within murine and cell-culture-based models. This is due in part to the limited availability of patient samples, variability between patient samples, tissue compartmentalization of immune cell subsets, and the difficulties associated with manipulating primary human samples [135, 136]. As we have described above, many of the studies verifying SLE variants in patient samples utilize expression data from total PBMCs and correlate this information with various metrics of disease (interferon levels, autoantibodies, etc.). Determining the effect of an SLE risk variant by comparing patient samples carrying the risk allele with samples from a healthy individual poses a significant challenge in the interpretation of disease-related phenotypes because of the differences in genetic background between the donors. Another caveat of this approach is that several risk variants often lie in the same LD block, making it difficult to definitively separate the effect of one risk variant from the others. Furthermore, examining the ramifications of a variant within cell lines has obvious drawbacks depending on the methods of immortalization used and any mutations in the cell line’s genome, particularly in those studies in which nonimmune cells are used.

Given the limitations of the techniques above and their potential lack of translation to human disease, new models are emerging to fill this gap. One such model utilizes the differentiation of human induced pluripotent stem cells (hiPSCs) into various immune cell types to directly assess the role of a given protein or variant [137]. This state-of-the-art technique is being implemented in many fields given its plasticity and translation to human disease [137]. These models are particularly useful when a specific variant or protein may play a role in cell differentiation [138]. Currently, hiPSCs are commercially available, which allows for the selection of specific populations (sex, age, ethnicity, etc.) and can aid in the delineation of population-biased variants [139, 140•].

hiPSCs are derived from PBMCs or fibroblasts and re-programmed through the expression of specific transcription factors known as Yamanaka factors (Oct4, Sox2, Klf4, and c-Myc) via lentiviral or, more recently, nonviral/nonintegrating methods [141, 142]. Many studies have emerged outlining the procedures for the differentiation of hiPSCs into a specific immune cell of interest. Specifically, much work has focused on T cells, DCs, macrophages, and NK cells, while B cell protocols have been much more difficult to develop [122,123,124,125]. Differentiation generally relies on a specific cytokine cocktail that promotes the lineage of one immune cell type while limiting others (Fig. 3) [122,123,124,125]. While the focus of many investigations is to generate hiPSC-derived cells for potential therapeutic use, the same cells can be manipulated to characterize mechanisms within a controlled environment [138]. Importantly, these differentiated cells can be subjected to various relevant environmental stimuli, such as TLR ligands, cytokines, and ROS, to dissect the individual effects of a variant in response to each stimulus [138]. Currently, hiPSCs generated from the peripheral B or T lymphocytes of patient samples can be manipulated and then re-differentiated. Importantly, these cells retain their original antigen specificity [143, 144]. Furthermore, phenotypes observed in healthy donor iPSCs can then be further validated and explored in iPSCs derived from SLE patient cells, particularly those from patients bearing the variant in question [142].

While cell culture-based techniques generally are limited to the cells in the dish, thus eliminating the dynamic states of a host’s immune cell interactions, the ease with which mutations can be introduced through CRISPR/Cas9 engineering and the ability to directly assess the function of these mutations within human cells are quite valuable (Fig. 3) [138, 145]. It is therefore likely that, in addition to murine studies, hiPSC models, which allow for the identification of causal SLE risk variants under a controlled isogenic genetic background, will be critical in SLE variant validation and in our understanding of SLE disease mechanisms.

Conclusions

Given the breadth of clinical manifestations noted in SLE patients, it is not likely that mutations in a single gene are responsible for the break in tolerance and the development of disease. This adds to the complexity of both the disease and the methods required to study it. Recently, a nonconsecutive model has been proposed in which one or more driver variants stimulate disease and other SNPs merely exacerbate or enhance the effects of the primary loci, resulting in SLE pathogenesis [11•, 17••]. This is supported by observations of multiple patients with elevated autoantibody titers but no concurrent disease [146]. A remaining challenge in SLE and many other complex disorders is prioritizing and characterizing variants within introns or intergenic regions and their roles in the development of disease. The implementation of techniques such as CRISPR/Cas9 and hiPSCs will be critical for the cell-intrinsic screening and validation of GWAS variants. Importantly, caution must be taken in choosing which parameters to assess when validating SNPs. Given that SLE is a complex trait, mutations in a single gene may not recapitulate disease in its entirety, but rather alter certain parameters.