Introduction

Systemic lupus erythematosus (SLE [OMIM 152700]) is a relatively prevalent and complex genetic disease characterized by an autoimmune response to nuclear antigens, immune complex deposition, and subsequent tissue and organ damage. SLE occurs worldwide and predominantly affects women (the prevalence ratio of women to men is 9 to 1), particularly during the childbearing years [1, 2]. The prevalence of the disease and the severity of its manifestations vary among populations. In the Chinese population, it affects approximately 31–70 per 100,000 people [3]. Recent searches for susceptibility genes have identified multiple genetic factors related to SLE in various ethnic groups. Among them, complement component 4 (C4) has shown a strong and consistent association with SLE.

Complement C4 (with isotypes C4A [MIM 120810] and C4B [MIM 120820]) is located in the class III region of the MHC on chromosome 6p21.3. C4 genes are deleted or duplicated together with the adjacent genes RP (serine-threonine kinase), CYP21 (steroid 21-hydroxylase) and TNX (tenascin-X); this unit is referred to as the RCCX module. The phenotypic and genotypic diversity of complement C4 arises from copy number variation (CNV) of RCCX modules, which occur as segmental duplications in mono-, bi-, tri-, or quadri-modular RCCX cassettes [4, 5]. Each C4 gene encodes either C4A or C4B, which differ by only four amino acids at positions 1,101, 1,102, 1,105 and 1,106: PCPVLD for C4A and LSPVIH for C4B [6, 7]. Any C4 gene can be a long gene (C4L) or a short gene (C4S), distinguished by the presence (C4L, 20.6 kb) or absence (C4S, 14.2 kb) of the endogenous retrovirus sequence HERV-K (C4) inserted in intron 9 [7, 8]. In each individual, the copy numbers of C4A, C4B, C4L and C4S satisfy the relation: total C4 = C4A + C4B = C4L + C4S [8, 9].
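
The copy number relation stated above can serve as a simple consistency check on genotyping calls. The following is a minimal sketch, not part of the original study; the function name `total_c4` and the example individual are hypothetical and chosen only for illustration.

```python
def total_c4(c4a: int, c4b: int, c4l: int, c4s: int) -> int:
    """Return the total C4 copy number after checking that
    C4A + C4B equals C4L + C4S for one individual."""
    counts = (c4a, c4b, c4l, c4s)
    if any(n < 0 for n in counts):
        raise ValueError("copy numbers must be non-negative")
    if c4a + c4b != c4l + c4s:
        # The isotype calls and the long/short calls disagree,
        # so at least one assay call is suspect.
        raise ValueError(
            f"inconsistent calls: C4A+C4B={c4a + c4b}, C4L+C4S={c4l + c4s}"
        )
    return c4a + c4b


# Hypothetical individual: two C4A, two C4B, three long and one short gene.
print(total_c4(c4a=2, c4b=2, c4l=3, c4s=1))
```

A sample whose four calls fail this check would be a natural candidate for re-assay or exclusion during quality control.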

Although studies in Asian SLE populations examined C4A gene deletion in the early 1990s [10–12], the contribution of C4 gene copy number (GCN) to SLE susceptibility has not been investigated in depth. In the present study, we performed a case–control study to investigate the association of C4 GCN with SLE using quantitative real-time polymerase chain reaction (qRT/PCR), with the aim of better understanding the genetic basis of the disease. In addition, we performed a case-only analysis to explore the relationship between C4A GCN and the clinical parameters of SLE.

Materials and methods

DNA samples

Most of our SLE patients and healthy controls were recruited from hospitals in the cities of Hefei and Wuhu in Anhui province, central China. Informed consent was obtained from all individuals. This study was approved by the ethics committee of each hospital. All study subjects were self-reported Chinese Han. The cases comprised 1,047 patients with SLE (984 women and 63 men) with a mean age of 35.12 years. SLE was diagnosed according to the criteria of the American College of Rheumatology (ACR) [13]. The clinical diagnosis of all subjects was confirmed by at least two dermatologists. Additional clinical information was collected from the subjects through a full clinical examination. The 1,056 control subjects, with a mean age of 34.46 years, were healthy individuals without SLE or other autoimmune or systemic disorders and with no family history of SLE (including first-, second- and third-degree relatives).

TaqMan-based qRT/PCR methods to determine the C4 copy number

Copy number was determined by TaqMan-based qRT/PCR using the relative standard curve method [14, 15]. Four TaqMan assays (labeled with FAM), Hs07226349-cn, Hs07226350-cn, Hs07226351-cn and Hs07226352-cn, which discern C4A, C4B, C4S and C4L, respectively, and the control assay RNaseP for normalization (labeled with VIC) were obtained from Applied Biosystems (ABI, Foster City, CA; https://products.appliedbiosystems.com). Each target assay, with its probe and primers, was co-amplified with RNaseP in the same tube (assayed on the same 384-well plate). Reactions were performed in a total volume of 10 μl, with 10 ng of template DNA, 2.5 μl of 5× TaqMan Genotyping PCR master mix (P/N 4381657; Applied Biosystems, US), and 0.5 μl each of the target assay and the RNaseP reference assay. Each sample was run in at least four replicates on the ABI 7900HT using standard protocol cycling conditions (95°C for 10 min, 40 cycles of 92°C for 15 s and 60°C for 1 min, 10°C hold).

Data analysis

The real-time amplification data were analyzed with sequence detection system software (SDS 2.3; ABI). Relative gene copy numbers were determined by the comparative CT method using CopyCaller (v1.0; ABI). In an individual, the number of C4A plus C4B genes equals the number of C4L plus C4S gene forms, and both equal the total number of C4 genes. We used the chi-square test or Fisher’s exact test to compare C4A, C4B, C4L, C4S and total C4 GCNs among groups, and to compare C4A GCN between patients with and without each clinical parameter of SLE in order to examine the risk conferred by C4A GCN on different subtypes. Total C4 copy number was assigned to bins of less than four, equal to four and greater than four to allow direct comparison with previous findings [10, 11]. Correlation analysis was used to explore the relationship between C4A/C4B and C4L/C4S copy numbers. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) version 11.5 (SPSS Inc., 2001). P-values below 0.05 were considered statistically significant.
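
The comparative CT calculation and the binning of total C4 described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the CopyCaller implementation: it assumes roughly 100% amplification efficiency (one doubling per cycle) and a calibrator sample of known copy number. The function names, the Ct values and the calibrator are hypothetical.

```python
def copy_number_from_ct(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float,
                        calibrator_copies: int = 2) -> float:
    """Estimate copy number with the comparative CT (delta-delta-Ct) approach.

    ct_target / ct_reference: Ct of the C4 assay and of RNaseP in the sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in a calibrator
    sample whose copy number (calibrator_copies) is assumed to be known.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Each cycle of earlier amplification corresponds to a doubling of template.
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


def total_c4_bin(total_copies: int) -> str:
    """Assign a total C4 copy number to the bins used in the analysis."""
    if total_copies < 4:
        return "<4"
    if total_copies == 4:
        return "=4"
    return ">4"


# Hypothetical Ct values: the sample's target amplifies one cycle earlier,
# relative to RNaseP, than the two-copy calibrator does.
estimate = copy_number_from_ct(25.0, 25.0, 26.0, 25.0, calibrator_copies=2)
print(round(estimate), total_c4_bin(round(estimate)))
```

In practice the Ct values from replicate wells would be averaged before this calculation, and estimates that fall far from an integer would be flagged for review.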

Results

Of the 924 patients with SLE who remained after quality control, 93.18% were women and 6.82% were men, a female-to-male ratio of 13.67:1. The median age of onset was 29.89 (19.02–40.72) years. There were no differences in sex distribution, age of onset or duration of disease among the total C4 copy number bins of less than four, equal to four and greater than four in the SLE cases (P = 0.45, 0.62 and 0.59, respectively; Table 1).

Table 1 Demographic characteristics of the SLE cohort by total C4 copy number bin (less than four, equal to four and greater than four)

C4 GCNs in healthy Chinese Han

We first determined C4 GCN in the 1,007 healthy Chinese Han subjects who remained after quality control of the 1,056 controls (Table 2). The copy number of C4A varied from 0 to 6, that of C4B from 0 to 5, and that of total C4 from 2 to 8 in a diploid genome. The majority had two copies of C4A (69.71%) or of C4B (66.43%). The C4A GCN distribution was skewed toward the high copy number side (<2, 7.25%; >2, 23.04%), while the C4B GCN distribution was skewed toward the low copy number side (<2, 20.16%; >2, 13.41%) (Fig. 1). Subjects with four copies of C4 genes accounted for 63.56% of all controls (640 of 1,007); 523 of these 640 subjects, or 51.94% of all controls, had two copies of C4A and two copies of C4B (2C4A-2C4B). Individuals with a low copy number accounted for 14.10%: two copies (0.70%) and three copies (13.41%). A further 22.34% of individuals had a high copy number: five, six, seven and eight copies of C4 genes accounted for 17.48, 3.28, 1.09 and 0.50%, respectively.

Table 2 Comparison of C4 gene copy number (GCN) and its isotypes in SLE patients and controls

Fig. 1 The distribution of C4A and C4B copy numbers in the 1,007 controls

C4 GCNs in SLE patients

In patients with SLE, the copy number of C4A varied from 0 to 6, while the copy number of C4B varied from 0 to 5. The majority had two copies: C4A (67.10%) and C4B (63.42%). The frequency of patients with fewer than two copies of C4A was significantly higher than that of control subjects (13.10% in cases vs. 7.25% in controls, P = 1.97 × 10−5), whereas patients with more than two copies of C4A were less frequent than control subjects, although not significantly so (19.81% in cases vs. 23.04% in controls, P = 0.08). The C4A GCN distribution was slightly skewed toward the high copy number side (<2, 13.10%; >2, 19.81%), while the C4B GCN distribution was slightly skewed toward the low copy number side (<2, 21.54%; >2, 15.04%), showing the same trend as in the healthy controls (Fig. 2). Lower C4A GCN was a significant risk factor for SLE (P = 1.97 × 10−5; OR: 1.93; 95% CI: 1.42–2.62; Table 2), but the association of higher C4A GCN with SLE did not reach statistical significance (P = 0.08; OR: 0.83; 95% CI: 0.66–1.03). Neither lower GCN of C4B (P = 0.46; OR: 1.09; 95% CI: 0.87–1.35) nor higher GCN of C4B (P = 0.30; OR: 1.14; 95% CI: 0.89–1.48) was significantly associated with SLE.
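
For readers who wish to see how an odds ratio, its confidence limits and a chi-square P-value of the kind reported above relate to a 2 × 2 table, the following minimal sketch may help. It is not the SPSS procedure used in the study. It assumes a Wald interval on the log odds ratio and an uncorrected Pearson chi-square test with one degree of freedom. The counts are reconstructed from the reported percentages (13.10% of 924 cases, about 121; 7.25% of 1,007 controls, about 73) and are an assumption, not values taken from Table 2.

```python
import math


def two_by_two_stats(case_exposed: int, case_unexposed: int,
                     control_exposed: int, control_unexposed: int):
    """Odds ratio, Wald 95% limits, Pearson chi-square and P-value (1 df)."""
    a, b, c, d = case_exposed, case_unexposed, control_exposed, control_unexposed
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Wald approximation).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.959964  # two-sided 95% normal quantile
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, lower, upper, chi2, p_value


# Reconstructed (assumed) counts for fewer than two copies of C4A.
cases_low, cases_total = 121, 924
controls_low, controls_total = 73, 1007
odds_ratio, lower, upper, chi2, p_value = two_by_two_stats(
    cases_low, cases_total - cases_low,
    controls_low, controls_total - controls_low,
)
print(f"OR = {odds_ratio:.2f} ({lower:.2f}-{upper:.2f}), P = {p_value:.2e}")
```

With these reconstructed counts the sketch gives an odds ratio of about 1.93 with limits of about 1.42 to 2.62 and a P-value on the order of 10−5, in line with the values reported for lower C4A GCN.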

Fig. 2 The distribution of C4A and C4B copy numbers in the 924 patients

The copy number of total C4 varied from 2 to 8 in a diploid genome. Four copies of C4 was the most common form in patients with SLE (62.99%). The frequency of individuals with fewer than four copies of C4 was significantly higher in SLE patients than in controls (18.51% in cases vs. 14.10% in controls; P = 8.69 × 10−3; OR: 1.38; 95% CI: 1.09–1.76). Conversely, individuals with more than four copies were less common among SLE patients than among controls (18.51% in cases vs. 22.34% in controls; P = 0.04; OR: 0.79; 95% CI: 0.63–0.99). In addition, the significant increase or decrease in the frequencies of total C4 GCN in SLE cases was attributable to variation in C4A but not C4B.

Correlation analysis between C4A/C4B and C4L/C4S in the Chinese Han population

For the C4L/C4S genes, two additional CNV assays were performed in a subset of our samples consisting of 916 SLE cases and 574 controls in which C4A and C4B had already been analyzed. After stringent quality control, 869 cases and 514 controls were included in the data analysis. The C4L copy number varied from 0 to 6, and the C4S copy number varied from 0 to 5. C4L was correlated with C4A (r = 0.28; P = 1.0 × 10−6), while C4S was correlated with C4B (r = 0.32; P = 1.0 × 10−6).

Association of SLE disease with C4A GCN

Because C4A GCN appeared to be a more important risk factor than C4B or total C4 GCN, we restricted the case-only analyses to C4A GCN, examining the risk conferred by C4A GCN bins (less than two, equal to two and more than two copies) within the SLE cohort. The counts and percentages of patients with and without each clinical parameter of SLE are listed in Table 3. The distribution of C4A GCN differed significantly between patients with and without arthritis (P = 6.05 × 10−3; Table 3).

Table 3 Comparison of C4A copy number frequency in SLE patients between clinical subgroups

Discussion

Previous association studies have suggested that the C4 gene is associated with SLE in Caucasian populations [8, 9]; however, the association is unclear in the Chinese Han population, and its possible functional significance remains to be determined. Our results, obtained in a larger Chinese sample, offer some indications. Firstly, we compared the C4 GCN in control subjects between the Han Chinese and European Americans. We found seven copies in 12 cases and 11 controls and eight copies in 1 case and 5 controls, whereas subjects with seven or eight copies have only rarely been reported in Caucasian populations [9, 15–17]. In the healthy Han Chinese, 63.56% of people had four copies of C4 genes (60.8% in European Americans), 14.10% had fewer than four copies (12% in European Americans) and 22.34% had more than four copies (27.2% in European Americans), a distribution broadly similar to that of European Americans [8]. Our results showed that 51.94% of the healthy Han Chinese had the two-locus 2C4A-2C4B configuration, which is also similar to the report that almost half of the Caucasian population carries this configuration [8].

In contrast to the C4B GCN, there was a significant difference in the distribution of C4A GCN between SLE cases and controls: SLE patients carried low copy numbers (0 or 1) of C4A genes more frequently than healthy controls (13.10% vs. 7.25%), while those who carried a high C4A copy number (>2) were less frequent than among controls (19.81% vs. 23.04%). These frequencies differed from those in western Europeans reported by Wu et al. [low copies of C4A genes (32.9% in SLE cases vs. 18.1% in controls) and high C4A copy number (15.3% in SLE cases vs. 25.9% in controls)] [8]. However, the C4A and C4B GCN distributions were skewed in the same direction in SLE cases and in controls in the Han Chinese, which is also the case in populations with European ancestry [8, 9]. We concluded that the distribution of C4 GCN differed significantly between patients with SLE and controls; that a lower copy number of C4 was a risk factor for SLE, whereas a higher copy number of C4 was a protective factor against SLE susceptibility in the Chinese population; and that the decrease in C4 GCN in SLE patients was due to an increase in homozygous and heterozygous deficiencies of C4A but not C4B. Our findings are consistent with those from European Americans [8, 9, 18, 19] but differ from the report of Kamatani et al. [20], who suggested that C4 GCN might not have a causal effect in the pathogenesis of SLE.

The copy numbers of C4L and C4S varied from 0 to 6 and from 0 to 5, respectively, which differed from those in European populations, with 0–6 copies for C4L and 0–4 copies for C4S in SLE cases and controls [8]. In Chinese Han controls, individuals with two copies of C4L (45.14%) and those with two copies of C4S (39.88%) were the most common groups. In SLE cases, however, the frequency of individuals with three copies of C4L increased sharply from 31.71 to 42.32%, and those with one copy of C4S were the most frequent (43.20%). Although we found that C4L and C4S were correlated with C4A and C4B, respectively, the correlation coefficients were well below 1, suggesting that either a C4L gene or a C4S gene can code for a C4A protein or a C4B protein. Our findings are consistent with the results from European Americans [9, 16, 21, 22].

The high heterogeneity of SLE manifestations may give clues to differences in the pathogenesis of the disease. Recently, there have been several phenotype association studies on SLE and other diseases. For example, DEFB4 genomic copy number has been shown to be associated with inflammatory bowel disease (IBD) [23]. PDCD1 has been identified as being associated with lupus nephritis and anti-phospholipid antibodies [24]. BLK is associated with SLE [25], and TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated with clinical features of SLE in a Chinese Han population [26]. These phenotype association studies indicate that SLE is a complex disease with extremely heterogeneous features, and that a better understanding of the relationships between susceptibility genes and subphenotypes of the disease might help to elucidate disease mechanisms and guide better clinical therapy.

Furthermore, the stratified analysis of phenotype associations of C4A GCN in our study showed a significant difference between cases with arthritis and those without, which suggests that C4A GCN might not only play an important role in the development of SLE but also contribute to its complex clinical features. Previous studies have proposed that C4A is a presumably recent addition or a gain of function and that C4A proteins are more important in eliminating immune complexes [27, 28]. In contrast, C4B proteins have been suggested to be the more ancient proteins, relevant to the defense against microbes and to the propagation of the complement activation cascades [27, 29, 30]. Lower C4 GCN or C4A GCN may confer different strengths of innate or adaptive immunity or different susceptibilities to SLE [8, 9, 31–33]. C4 deficiency has been implicated in juvenile idiopathic arthritis [34], and C4A deficiency has shown a positive association with rheumatoid arthritis [35]. Together, these observations suggest that lower C4A or C4 GCN is involved in the pathogenesis of arthritis. Moreover, a possible explanation for the association between C4A GCN and SLE, as well as between C4A GCN and SLE with arthritis, is that C4A GCN influences the clearance of immune complexes, apoptotic cell debris and infectious agents.

In conclusion, this is the first time that the distribution of C4 copy number variants has been evaluated in a large-scale sample from the Chinese Han population. Our results further support the concept that C4 gene dosage plays an important role in the pathogenesis of SLE in the Chinese population. Exploring how genetic background differs among populations could help improve our overall understanding of the pathogenesis of this disease.