Introduction

Complex diseases, such as systemic sclerosis (SSc), are defined as those involving the interaction between genetic predisposition and environmental triggers. However, the environmental factors that influence the onset or prognosis of SSc are unknown. Moreover, from a genetic point of view, neither the number of relevant genes nor the extent of their involvement in disease pathogenesis has been established yet.

The heritability of SSc was considered controversial in the largest published SSc twin study, which in general suggested a modest genetic contribution to the phenotype [1]. Nevertheless, this study included only 42 sets of twins, and it should be considered that, in a family study of 703 cases, an affected first-degree relative increased the risk of SSc 13 times compared to the general population [1, 2]. Moreover, having an affected sibling increased SSc risk by 15 times [2], and there was a remarkable concordance of autoantibodies between SSc twins [1]. Additionally, recent analyses have shown that the standardized incidence ratio of SSc seemed to be lower than those observed in autoimmune diseases (ADs) such as rheumatoid arthritis or ankylosing spondylitis, but similar to those observed for Hashimoto thyroiditis or psoriasis [3]. In addition, SSc prevalence, clinical outcomes, and autoantibody profiles have been reported to vary depending on patient ancestry [4]. Therefore, the role of genetic factors in SSc susceptibility can now be considered solidly established.

Case-control studies, in particular those based on single-nucleotide polymorphisms (SNPs), have provided a continuously growing panel of genetic players in the SSc pathogenic process. Genome-wide association studies (GWASs), which include hundreds of thousands of SNPs located throughout the genome, have been very effective in identifying a huge number of genetic loci associated with complex traits, including ADs [5]. GWASs offered for the first time a hypothesis-blind approach to the analysis of complex traits. Moreover, analyses covering a large proportion of the variability of the human genome in large cohorts, especially Caucasian cohorts, became feasible for the first time. Unfortunately, SSc was not included in the first GWAS in ADs in 2007 [6], and until 2010, no SSc-related non-HLA locus had been established at the genome-wide significance level (p value < 5 × 10⁻⁸) [7] (Table 1). Although the initial genetic association reports in SSc were hardly reproducible, the advent of GWASs [7–11] and the recruitment of large patient cohorts have resulted in a growing number of firmly established SSc genetic susceptibility loci.
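The genome-wide significance level cited above can be illustrated with a short calculation. The sketch below assumes the conventional reading of the threshold as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent common-variant tests. That number of tests is an assumption made for illustration and is not stated in this review, and the function and constant names are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
ASSUMED_INDEPENDENT_TESTS = 1_000_000
ALPHA = 0.05

GENOME_WIDE_THRESHOLD = bonferroni_threshold(ALPHA, ASSUMED_INDEPENDENT_TESTS)


def is_genome_wide_significant(p_value: float) -> bool:
    """True when a SNP association p value is below the genome-wide cutoff."""
    return p_value < GENOME_WIDE_THRESHOLD
```

Under these assumptions the cutoff evaluates to approximately 5 × 10⁻⁸, so an association must be far stronger than the nominal 0.05 level before a locus is counted as established.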

Table 1 Locus with or without significance level

In this review, we will provide an updated overview of the known SSc-related genetic factors and we will address their possible functional implication in the pathogenic events that are characteristic of this chronic and disabling condition.

HLA

The human leukocyte antigen (HLA) region was the first genetic association with SSc to be discovered and, as in other ADs, it remains a major autoimmunity genetic marker [12]. HLA class II is expressed exclusively on antigen-presenting cells and presents extracellular antigens. Considering that (1) the genes encoding the three major types of HLA class II molecules (HLA-DP, HLA-DQ, and HLA-DR) have been associated with SSc, (2) the associations with SSc in the HLA region are closely related to the autoantibodies in the patient sera, and (3) remarkable variability between populations has been reported, we provide a comprehensive analysis of this region in the present section.

Classical HLA alleles

HLA associations have been studied in different ethnic populations, as discussed below, and have been shown to vary according to the presence/absence of the three most common SSc-related (and mutually exclusive) autoantibodies. The HLA-DQB1*0301 allele has been associated with the entire disease group of SSc patients in whites, blacks, and Hispanics [4], while HLA-DRB1*01, DRB1*04, and DQB1*0501 alleles have been related to anticentromere positive SSc patients (ACA+) and HLA-DRB1*11 and HLA-DPB1*1301 have been associated with antitopoisomerase positive SSc patients (ATA+), independently of their ethnic origin [4, 13, 14].

In white European ancestry populations, it has been described that the HLA-DRB1*1104/HLA-DQA1*0501/HLA-DQB1*0301 haplotype and the HLA-DQB1 alleles that encode a non-leucine residue at position 26 (DQB1 26 epi) predispose to SSc. On the other hand, the HLA-DRB1*0701-HLA-DQA1*0201-HLA-DQB1*0202 haplotype and the HLA-DRB1*1501 haplotype have a protective effect [14]. These HLA effects are also seen in Hispanic populations [14]. Furthermore, in whites, the HLA-DRB1*04 or HLA-DRB1*08 alleles and the HLA-DQB1*05 and HLA-DQB1*26 epi alleles confer an increased risk for ACA+ SSc [14, 15]. However, ATA+ SSc is associated with an over-representation of the HLA-DRB1*11 allele, the HLA-DRB1*1104/HLA-DQA1*0501/HLA-DQB1*0301 haplotype, and the HLA-DQB1*03 and HLA-DPB1*1301 alleles, or a reduced frequency of the HLA-DQB1*0501 allele [13–15]. Moreover, the HLA-DRB1*01 alleles have been described to predispose to ACA+ SSc and to protect against ATA+ SSc [15]. The anti-RNA polymerase III antibody positive SSc patient subset (ARA+) is defined by DRB1*0404, DRB1*11, and DQB1*03 in both white and Hispanic populations [14].

Regarding black populations, the HLA-DRB1*0804, HLA-DQA1*0501, and HLA-DQB1*0301 alleles are associated with the overall disease and HLA-DRB1*08 with ARA+ patients [14].

SNP-based analyses

As expected from candidate gene studies, GWASs confirmed the great contribution of the HLA region to the SSc genetic component. In the three GWASs carried out to date in SSc, the most prominent genetic association signal corresponded to the HLA class II region on chromosome 6 [7–11]. In the interrogated European descent populations, the highest peaks corresponded to the HLA-DRB1 gene (rs6457617) and the HLA-DQB1 locus (haplotype block defined by rs9275224, rs6457617, and rs9275245), respectively [7, 11].

In addition, the Immunochip, a custom platform designed to provide dense mapping of different autoimmune-related chromosomal regions, has made it possible to link SNPs, classical HLA alleles, and polymorphic amino acid positions in the HLA molecules [16, 17]. This platform, with complete coverage of the known variants in the extended HLA region, offers a suitable basis for novel imputation strategies that allow the deduction of the amino acids that better tag the associations pointed out by SNP-based analyses and that define the classical HLA alleles [16]. In the case of SSc, the differential associations of the classical HLA alleles with the main SSc serological subsets (ACA+ or ATA+) were further confirmed, since it was necessary to define a different HLA model for each serological subset to explain all the HLA associations observed in the whole disease analysis [18]. The ACA+ associations were best explained by certain residues in the 13th polymorphic position in HLA-DRB1 and the 69th position in HLA-DQA1. On the other hand, the ATA+ subgroup associations were dependent on the 67th and 86th positions in HLA-DRB1 and the 76th and 96th positions in HLA-DPB1 [18]. Moreover, both the subtype and the whole disease groups required the addition of seven SNPs to the model to condition all the HLA signals [18].

Mayes et al. underlined the relevance of the identified amino acids either in the peptide-binding groove or in the structure of the HLA proteins [18]. Indeed, as illustrated in Fig. 1, the classical HLA alleles that included the most highly associated risk amino acids for each subtype seemed to cause different amino acid binding preferences in the corresponding peptide-binding grooves [19, 20]. In addition, the finding that the majority of the reported SNPs acted as expression quantitative trait loci (eQTLs), that is, that they modulated the expression of multiple HLA class I and class II genes, suggested novel functional implications for these selected variants [18, 21].

Fig. 1 (a) Previously reported classical four-digit alleles associated with ACA+ or ATA+ SSc patients and their relation with the amino acid model in Mayes et al. 2014. (b) HLA-DRB1 sequence logos for SSc-associated classical alleles

NON-HLA

In the following sections, we focus on the non-HLA genetic regions that have been associated with SSc at the genome-wide significance level and those that did not reach this threshold but that have been associated with the disease in at least two different independent studies.

Type I interferon pathway

Type I interferon (IFN) responses induce antiviral immunity. These responses are initiated by a variety of pattern recognition receptors, such as Toll-like receptors (TLRs). IFNs modulate the amount of antigen presented to T cells, induce antigen-presenting cell (APC) maturation, and increase natural killer (NK) cell activation [22]. Moreover, the balance and timing between IFNs and other stimulation pathways on T cells control the repression or promotion of T cell antiviral responses [22]. IFNβ is produced by any kind of cell infected by a virus, while APCs, especially plasmacytoid dendritic cells (pDCs), are the key producers of IFNα [22]. In the case of SSc, pDCs have also been shown to secrete CXCL4, which leads to endothelial-cell activation and increased responses of TLRs [23]. Furthermore, SSc has been classified as an “IFN signature” AD based on multiple lines of evidence that support a deregulation of the type I IFN pathway [24]. Increased expression of IFN-regulated genes in peripheral whole blood cells, peripheral blood mononuclear cells, and sera of SSc patients has been described [24]. Regarding genetic findings, several IFN-related genes have been found to be associated with SSc.

Interferon regulatory factor 5 (IRF5) is expressed by macrophages, dendritic cells (conventional dendritic cells or cDCs) and pDCs as well as B cells. This protein promotes IFNα expression, and it is essential in defining an inflammatory macrophage lineage [25, 26]. Furthermore, IRF5 is involved in cell cycle and apoptosis, in microbial infection, and in inflammation [25, 26]. IRF5 is a common autoimmunity susceptibility factor and also one of the major non-HLA gene associations with SSc [24]. Interestingly, the risk association of this locus with SSc has been reported in different ethnicities, and it reached the genome-wide significance level [7, 18, 27, 28]. While initial reports pointed towards a subtype-specific association of this locus with patients with diffuse cutaneous SSc (dcSSc) or lung involvement [28, 29], more recent reports show an association with the whole disease [7, 18]. Along these lines, Carmona et al. observed an additive effect of three distinct functional haplotype blocks (the haplotype tagged by rs10488631 is involved in protein stability, rs2004640 tags a splicing-related haplotype, and rs4728142 tags a haplotype in the 3′UTR that affects the expression of the gene) [30]. The effect of this IRF5 haplotype structure was similar to that observed in systemic lupus erythematosus (SLE) patients [30]. This genetic resemblance in the IRF5 locus between SSc, SLE, and other ADs has been confirmed recently using both the frequentist and the Bayesian approaches [31]. There is evidence that this locus affects the survival of SSc patients, which suggests that it may influence not just the susceptibility but also the severity of some SSc clinical complications such as lung involvement [32]. Remarkably, the minor allele of IRF5 rs4728142, which is the most significant risk IRF5 variant associated with SSc identified in the GWAS by Radstake et al., is associated with longer survival in SSc patients [7, 32].

IRF8 is another member of the IFN regulatory factor family that has been associated with SSc. This gene was identified as a genome-wide level SSc susceptibility factor in a phenotype-directed GWAS analysis by Gorlova et al. [33]. In this study, the minor allele of IRF8 rs11642873 was less frequent in the limited cutaneous SSc (lcSSc) subset of patients than in the control group [33]. The encoded protein has a relevant role in the innate immune response carried out by myeloid cells and DCs and is also involved in B cell biology [34].

There is evidence of the association of an additional locus of this family, IRF7, with a protective effect for ACA+ SSc [35]. It is noteworthy that IRF7 is considered the master regulator of type I IFN antiviral immunity [36]. However, this association requires replication by independent groups.

Interleukin-12 pathway

Interleukin-12 (IL-12) is produced by phagocytes, APCs, and B cells after infection [37]. The main consequence of IL-12 production is the secretion of IFNγ, with the consequent proinflammatory effects. IL-12 controls T cell expansion towards a Th1 phenotype to the detriment of the Th2 compartment [37]. Moreover, it has been implicated in the development of auto-reactive Th1 cells in disease [37]. A variety of IL-12 pathway genes have been associated with ADs [38]. It has been suggested that ADs can be divided into two clusters based on their association with the IL-23 receptor (IL23R, which belongs to the same cytokine family but is involved in Th17 amplification) or with IL12A (which encodes the IL-12 specific subunit, IL-12p35) [38]. This second group, characterized by a key role of IL-12 and/or IL-35, would encompass SSc, which has several IL-12-related loci among its major genetic associations.

In fact, the rs77583790 rare variant, located upstream of the IL12A locus, showed a genome-wide level risk association with SSc, and especially with the lcSSc subset, in the recent Immunochip study by Mayes et al. [18]. Moreover, the coding genes for both chains in the IL-12 receptor (IL-12R), IL12RB1 and IL12RB2, have been associated with SSc [39, 40]. Although the functional relevance of the IL12RB2-associated variants (rs3790567 was found to be the lead SNP) is unclear, the study by Bossini-Castillo et al. concluded that the observed signal was independent of the nearby IL23R gene [39]. This evidence supports the above-mentioned hypothesis of the relevance of IL-12 in SSc. In the case of IL12RB1, the results in López-Isac et al. point towards the rs436857 promoter SNP as the most plausible causal variant for the region [40]. The minor allele of rs436857 was a protective variant, and in silico analysis correlated this allele with a lower expression of IL12RB1, concordant with a decrease in SSc risk due to a lower response to IL-12 driven inflammation [40].

The STAT4 locus is another clear example of the IL-12 predominance in SSc. STAT4 is a well-known autoimmunity susceptibility genetic factor, which encodes a transcription factor that plays a central role in IL-12 triggered inflammation [41]. Additionally, type I IFN can directly activate STAT4, which induces the production of IFNγ [42]. Therefore, this molecule can act as a link between the innate and the adaptive immune responses [42]. STAT4 was soon established as an SSc risk factor by Rueda et al. and then confirmed in different independent cohorts of European and Asian origin [43–47]. It should be highlighted that STAT4 reached the genome-wide significance level in the two SSc GWAS comprising white European individuals [7, 11]. Moreover, the Immunochip study helped to narrow down the associated region, but the functional basis for this association is still unknown. STAT4 knock-out mice seem to be protected from inflammation-driven fibrosis in the bleomycin-induced SSc model [48]. In addition, Dieudé et al. reported an effect of the STAT4 risk variants on pulmonary fibrosis in SSc [44]. Therefore, this locus is considered a promising therapeutic target both for SSc and other ADs [41, 49].

Debris clearance, autophagy, and detoxification

One of the most relevant advantages of carrying out well-powered GWASs or immune-focused studies, in which several loci are tested without very stringent pre-selection criteria, is the generation of new hypotheses that propose mechanisms that may have been previously overlooked, but that contribute to the onset or progression of ADs. In this context, the recently published Immunochip study identified two new SSc genetic risk factors, DNASE1L3 and ATG5, which implicated for the first time debris clearance and autophagy, respectively, as pathogenic mechanisms for SSc.

DNASE1L3, a homologue of DNase I, is a single- and double-stranded DNA endonuclease expressed by liver and spleen macrophages [50, 51]. Mutations in the DNASE1L3 gene have been associated with familial forms of SLE, presumably due to an impaired elimination of the detritus of apoptotic macrophages, which leads to the production of auto-antibodies and immune imbalance [51]. This hypothesis is in line with the association of the DNASE1L3 rs35677470 SNP with ACA+ SSc patients [18]. The signal in this variant reached a p value = 4.25 × 10⁻³¹, and it is the most significant non-HLA association with SSc described to date [18]. It should be noted that the reported variant encodes a non-synonymous change from arginine to cysteine in the 206th position of the protein, which abolishes the activity of the protein, probably due to an alteration in its tertiary structure [52]. Moreover, the same association was replicated in an independent Immunochip study in an Australian SSc cohort, confirming the relevance of this locus in SSc [53]. Furthermore, the described association in DNASE1L3 seemed to explain the observed association in the nearby PXK locus, identified by Martín et al. [54].

In the case of ATG5, involved in the elongation of autophagosomes, the observed association corresponds to an intronic variant of unknown function, rs9373839 [18]. Autophagy has emerged as an important piece of the immune response process [55]. Autophagy-related molecules interact with the immune cells at different levels, including T and B cell development and function, phagocytosis, antigen presentation, and cytokine secretion [55]. Consequently, the association of ATG5 with SSc introduces autophagy as a new area of research and drug target exploration for this condition.

In addition, one of the suggestive loci in Allanore et al., the PPARG (peroxisome proliferator-activated receptor gamma) gene, was confirmed in a later SSc meta-GWAS and replication by López-Isac and collaborators [11, 56]. The risk variant identified in this gene, the rs310746 SNP, did not reach the genome-wide significance level in the overall analysis, but it was confirmed in the replication phase [56]. Thus, it is possible that this gene, involved in the peroxisome detoxification system with clear implications in fatty acid metabolism, has a role in SSc. Remarkably, as pointed out by López-Isac et al., this molecule also has an antifibrotic effect and has been found to affect dermal fibrosis in the SSc bleomycin mouse model [56, 57].

T cell-associated loci

According to the evidence, the T cell compartment is a key component of SSc pathogenesis. These lymphocytes appear in fibrotic zones and show altered phenotype and numbers [58]. Therefore, it is not surprising that several genes expressed by T cells belong to the SSc genetic network.

In fact, CD247, the encoding gene for the ζ-chain of the T cell receptor (TCR), was identified as a novel SSc susceptibility factor in the first GWAS in European descent SSc patients [7]. The protective association of rs2056626 observed by Radstake et al. was independently replicated in European cohorts, which confirms the implication of the TCR modulation in the disease [7, 59]. On the other hand, the same signal was not observed in Chinese individuals, underlining the high influence of ancestry and the heterogeneity between patients in SSc [60].

The tyrosine-protein kinase CSK (or C-Src kinase or C-terminal Src kinase) is an AD genetic marker that is involved in the inactivation of the Src-family kinases, which participate in signaling cascades such as the TCR pathway, B cell signaling, and skin fibrosis [61, 62]. Interestingly, the rs1378942 SNP, which maps to a CSK intron, was identified in a GWAS follow-up study by Martín et al. as a variant associated with increased SSc risk [62].

Furthermore, the lymphoid tyrosine phosphatase (LYP) encoded by PTPN22 can exert its negative regulation over the TCR activation only when it is separated from CSK [63]. Interestingly, a non-synonymous variant in PTPN22, known as C1858T, R620W, or rs2476601, has been associated with multiple ADs and prevents this protein-protein interaction [64]. Of note, a meta-analysis of several SSc cohorts showed that this variant, but not another AD-associated functional SNP (rs33996649), has an impact on SSc susceptibility [65].

B cell-associated loci

SSc is characterized by an immune imbalance in which B cells also react in a pathogenic manner [66]. B cells are responsible not only for auto-antibody production, but also for cytokine release that activates the immune response. BANK1 (B cell scaffold protein with ankyrin repeats 1) was the first B cell marker gene to be associated with SSc. The association of this locus, despite being modest, has been proven to be consistent in different studies [18, 67, 68]. What is more, BANK1 has been reported to have additive effects with STAT4, IRF5, and with an additional B cell marker, BLK (B lymphocyte kinase) [67, 69]. The association of BLK is again modest, but consistent in different European and Asian cohorts [18, 69–72].

TNF pathway and family

Abnormal levels of TNFα in SSc patient sera, leucocytes, bronchoalveolar lavage fluid, and skin have long been reported [73]. Moreover, TNFα inhibitors have been suggested as a possible treatment for SSc patients [74]. Thus, it is not surprising that several TNFα pathway genes have been reported to be associated with SSc.

TNFAIP3, also known as A20, inhibits the proinflammatory NF-κB signaling after TNFα activation [75]. Moreover, TNFAIP3 is also involved in apoptosis, IRF activation in response to pathogens, and even in autophagy [75]. Several studies have addressed the association of this locus with SSc. Initially, the TNFAIP3 locus was found to be related to SSc and especially to its severe phenotypes (dcSSc and pulmonary involvement) [76]. Then, a peak of association in this region was replicated in the Immunochip study by Mayes et al. [18]. Furthermore, this gene has been found to be associated with polyautoimmunity in SSc patients, and it has been identified as a shared locus between SLE and SSc in pan-meta-GWAS reports including both conditions [54, 77].

TNIP1 encodes the TNFAIP3-interacting protein 1, which regulates TNFAIP3, the previously described TNF-induced NF-κB pathway inhibitor. Allanore et al. reported for the first time the association of TNIP1 with SSc in a GWAS [11]. The association of three highly linked SNPs in TNIP1 (rs2233287, rs4958881, and rs3792783) as SSc risk factors was independently replicated by Bossini-Castillo et al., and this locus was reinforced as a genome-wide level genetic factor in the meta-analysis [78].

TNFSF4 is a costimulatory molecule of the TNF family, also known as OX40L. The binding of this ligand, expressed on activated APCs and endothelial cells, to its receptor (CD134 or OX40) promotes T cell and B cell proliferation and survival [79]. Polymorphisms in the TNFSF4 gene have been found to be associated with ADs [80], including SSc. Several SNPs located in the TNFSF4 promoter have been reported to be associated with SSc [81, 82]. However, the two studies reported conflicting phenotype-specific associations. A later meta-analysis including a new cohort confirmed an especially strong association of rs2205960 with the ACA+ subset of patients [83].

Conclusion

International collaboration has allowed the analysis of the genetic basis of SSc in powerful and reproducible studies. Furthermore, the implementation of new genotyping platforms and innovative biocomputational and statistical methods has provided the scientific community with increasing numbers of identified loci and new insights into the relationships that connect them. Genetic evidence supports a key role of the immune system in SSc predisposition, particularly of the type I IFN and IL-12 pathways and the deregulation of several immune cell compartments. Nevertheless, future approaches including even larger cohorts, deep clinical characterization and longitudinal measures of the individuals, or integrative analyses of genomic, epigenomic, transcriptomic, and proteomic data (systems biology) will help to establish the pathogenic mechanisms that result in the onset and progression of SSc. These advances would be especially valuable in understanding the most severe clinical outcomes, such as lung involvement, which remain largely unexplained. Eventually, this knowledge would lead to the validation of SSc biomarkers, the selection of drug targets, and the development of precision medicine.