Introduction

Periprosthetic joint infection (PJI) remains one of the most challenging complications of total joint arthroplasty (TJA), with an incidence ranging between 0.5 and 2.0% [1, 2]. Since the symptoms of PJI are often nonspecific and there are no gold standard thresholds or criteria for currently available laboratory tests, PJI is difficult to diagnose precisely and quickly, thus hindering the effective management of PJI [3, 4]. For example, serum biomarkers such as erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) and white blood cell (WBC) count are not sufficiently specific to diagnose PJI [5, 6]. Although synovial fluid biomarkers such as synovial fluid WBC count and differential count may help in the diagnosis of PJI, these biomarkers are normally increased by associated edema in the extremity and/or expected inflammation around the surgical site in the early post-operative period [7]. Furthermore, previous studies evaluating the accuracy of microbiologic cultures for detecting PJI found that sample contamination may result in false-positive results. Moreover, microbiologic cultures may yield false-negative results, especially if patients have been on antibiotics prior to sampling [8, 9]. Therefore, diagnostic markers or panels of markers with the desirable properties of high sensitivity, high specificity, and clinical meaning are needed to realize rapid diagnosis and differentiation in various clinical settings. Recent follow-up studies focusing on the diagnostic accuracy of novel biomarkers suggested that both interleukin-6 (IL-6) and procalcitonin (PCT) were useful biomarkers. These two biomarkers could be assessed rapidly and had high sensitivity and specificity [10, 11], but it is still unknown whether they are clinically useful in ruling out PJI. To date, no published systematic review has compared the diagnostic performances of IL-6 and PCT. 
Therefore, in this study we compared the diagnostic performance of IL-6 versus that of PCT for the diagnosis of PJI.

Materials and methods

Data and literature sources

This study was conducted according to the guidance of the Cochrane Review Methods. Multiple comprehensive databases, including MEDLINE (January 1, 1976 to February 28, 2017), EMBASE (January 1, 1985 to February 28, 2017), Web of Science (January 1, 1980 to February 28, 2017), SCOPUS (January 1, 1980 to February 28, 2017), and the Cochrane Library (January 1, 1987 to February 28, 2017), were searched for studies that evaluated the diagnostic value of IL-6 or PCT for PJI diagnosis. There were no restrictions on language. The following search terms were applied to the title, abstract, MeSH, and keyword fields: (‘prosthesis-related infections’ [Mesh] OR ‘arthroplasty, replacement’ [Mesh] OR ‘joint prosthesis’ [Mesh] OR ‘periprosthetic joint infection’ [tiab] OR ‘prosthetic infection’ [tiab] OR ‘arthroplasty’ [tiab]) AND (‘interleukin-6’ [tiab] OR ‘IL-6’ [tiab] OR ‘interleukin-6’ [Mesh] OR ‘inflammatory markers’ [tiab] OR ‘procalcitonin’ [tiab]). After the initial electronic search, relevant articles and their bibliographies were searched manually.

Study selection

Two reviewers independently selected relevant studies for full review by searching through titles and abstracts. A full text copy of each article was reviewed if the abstract did not provide enough data to make a decision. Studies were included in the meta-analysis if they: (1) assessed IL-6 or PCT as an index test to evaluate the presence of PJI; (2) had the reference standard of PJI defined as a communicating sinus tract with the prosthesis, a positive microbiological culture, or histopathological findings; and (3) fully reported cases in absolute numbers of true-positive, false-positive, false-negative, and true-negative outcomes.

Data extraction and quality assessment

Two reviewers independently recorded data from each study using a predefined data extraction form and resolved any differences by discussion. Data entry by these two reviewers was checked by a third investigator. Variables recorded were: (1) type of inflammatory marker (IL-6 and/or PCT) and sample size; (2) sensitivity, specificity, positive likelihood ratio (LR), negative LR, diagnostic odds ratio (DOR), area under the curve (AUC), location of arthroplasty, and diagnostic test conducted; and (3) cutoff value. Regarding (3), if a study presented several cutoff values for an index test, the data with the best estimates were extracted. If a study presented different cutoff values of an index test for serum versus synovial fluid, data from the different biological fluids were analyzed as separate studies. Two reviewers independently assessed methodological quality using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [12]. The two reviewers resolved all differences by discussion; their consensus was checked by a third investigator.

Data synthesis and analysis

Measurements of diagnostic performance, such as sensitivity, specificity, positive LR, negative LR, DOR, and AUC, are reported as point estimates with 95% confidence intervals (CIs) for the diagnosis of PJI. The positive LR, negative LR, and post-test probability were calculated to evaluate the clinical utility of IL-6 and PCT using the summary estimates of sensitivity and specificity. The positive LR is the ratio of the probability of a positive test result if the subject has PJI to the probability of a positive test result if the subject does not have PJI. Likewise, the negative LR is the ratio of the probability of a negative test result if the subject has PJI to the probability of a negative test result if the subject does not have PJI. Positive LR values greater than 5 and negative LR values less than 0.2 indicate strong diagnostic evidence for ruling in and ruling out a diagnosis, respectively [13]. In the Fagan plot, a straight line drawn from the pre-test probability through the likelihood ratio gives the post-test probability. The DOR was calculated to evaluate the diagnostic effectiveness of IL-6 and PCT, with higher values suggesting better discriminatory test performance [14]. An AUC value greater than 0.8 indicates good diagnostic ability [2]. The bivariate random-effects model was used to incorporate the correlation that might exist between sensitivity and specificity, resulting from the use of different thresholds across studies. In addition, we used the model to create hierarchical summary receiver operating characteristic (HSROC) curves. All statistical analyses were performed with Stata version 14.2 statistical software; the metandi and midas commands were utilized for all analyses. Publication bias was assessed by using the effective sample size funnel plot and associated regression test of asymmetry described by Deeks et al. [15].
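The likelihood ratio and DOR definitions above can be written out as a short calculation. The following is a minimal sketch in Python, not the Stata routines used in this study; the function names and the example sensitivity and specificity (0.80 and 0.90) are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float, float]:
    """Return (positive LR, negative LR, DOR) for one sensitivity/specificity pair."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    # P(test positive | PJI) / P(test positive | no PJI)
    positive_lr = sensitivity / (1.0 - specificity)
    # P(test negative | PJI) / P(test negative | no PJI)
    negative_lr = (1.0 - sensitivity) / specificity
    # The diagnostic odds ratio is the ratio of the two likelihood ratios.
    dor = positive_lr / negative_lr
    return positive_lr, negative_lr, dor


def rules_in(positive_lr: float, threshold: float = 5.0) -> bool:
    """Positive LR above 5 is taken as strong rule-in evidence (threshold from the text)."""
    return positive_lr > threshold


def rules_out(negative_lr: float, threshold: float = 0.2) -> bool:
    """Negative LR below 0.2 is taken as strong rule-out evidence (threshold from the text)."""
    return negative_lr < threshold


# Hypothetical example values, not taken from any included study.
plr, nlr, dor = likelihood_ratios(0.80, 0.90)
print(f"LR+ = {plr:.2f}, LR- = {nlr:.2f}, DOR = {dor:.1f}")
print(f"rule-in: {rules_in(plr)}, rule-out: {rules_out(nlr)}")
```

With these illustrative inputs the test would qualify as a rule-in tool but would narrowly miss the rule-out threshold, which shows that the two properties are assessed independently.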
Heterogeneity was determined by estimating the proportion of between-study inconsistencies due to actual differences between studies, rather than differences due to random error or chance, using the I² statistic. Values of 25%, 50%, and 75% were considered to indicate low, moderate, and high heterogeneity, respectively. When statistical heterogeneity was substantial, we conducted meta-regression to identify potential sources of heterogeneity, such as the study location, diagnostic standard, cutoff value, and location where samples were obtained. The number, age, and sex of the study subjects were also considered. Subgroup analyses were performed for studies evaluating both IL-6 and PCT in an attempt to explore potential sources of heterogeneity.
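For readers unfamiliar with the I² statistic, the sketch below shows the standard Higgins definition based on Cochran's Q. This is an illustration only: the Stata commands used in this study may compute I² differently within the bivariate model, the banding of values between the 25%, 50%, and 75% anchors is an assumption, and the Q value in the example is hypothetical.

```python
def i_squared(q: float, df: int) -> float:
    """Higgins I² in percent from Cochran's Q and its degrees of freedom (k - 1 studies)."""
    if df < 0:
        raise ValueError("degrees of freedom cannot be negative")
    if q <= 0:
        return 0.0
    # Share of total variation beyond what chance alone would produce, floored at zero.
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2_percent: float) -> str:
    """Map I² onto the 25/50/75% anchors; the banding between anchors is an assumption."""
    if i2_percent >= 75.0:
        return "high"
    if i2_percent >= 50.0:
        return "moderate"
    if i2_percent >= 25.0:
        return "low"
    return "negligible"


# Hypothetical example: Q = 68 across 18 studies (df = 17).
i2 = i_squared(68.0, 17)
print(f"I² = {i2:.0f}% ({heterogeneity_label(i2)})")
```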

Results

Identification of studies

The details of study identification, inclusion, and exclusion are summarized in Fig. 1. An electronic search yielded 63 studies of interest in PubMed (MEDLINE), 75 in EMBASE, 80 in Web of Science, 54 in SCOPUS, and eight in the Cochrane Library. Three additional publications were identified through manual searching. After removing 132 duplicates, 150 studies remained; of these, 109 were excluded after reading the abstracts and full-text articles, and an additional 23 studies were excluded because they had unusable information. This process eventually resulted in 18 studies that were included in the final meta-analysis [2, 7, 10, 16–30].

Fig. 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram of literature selection

Study characteristics and patient samples

The 18 studies we examined included 1,327 subjects in whom the diagnostic accuracy of IL-6 for PJI was examined and six studies included 508 subjects in whom the diagnostic accuracy of PCT for PJI was examined. For IL-6, 16 studies included patients who had hip or knee arthroplasty and two included patients who had shoulder or elbow arthroplasty. The mean patient age ranged from 58 to 72 years, and 30 to 53% of the patients were male. For PCT, five studies included patients who had hip or knee arthroplasty and one included patients who had shoulder arthroplasty. The mean patient age ranged from 64 to 73 years, and 34 to 43% of the patients were male. Two studies reported on IL-6 using different cutoff values for serum and synovial fluid. The clinical and methodological characteristics, as well as the main results of each study, are summarized in Tables 1 and 2.

Table 1 Summary of study characteristics
Table 2 Diagnostic performance of screening tests for the detection of IL-6 and PCT in patients with PJI

Quality and publication biases of the included studies

The methodological quality of the included studies was evaluated using the QUADAS-2 tool [12]. This tool includes four domains that assess patient selection, index tests, reference standards, and flow and timing. Overall, the quality of the studies was deemed satisfactory (Table 3). The funnel plots and regression tests indicated no significant publication bias (P = 0.11 and P = 0.75 for IL-6 and PCT, respectively) (Fig. 2a, b).
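The funnel plot asymmetry test of Deeks et al. regresses the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighting each study by its ESS. The sketch below illustrates only the ESS and the weighted slope; it omits the significance test, uses a 0.5 continuity correction in every cell as an assumption, and runs on hypothetical 2×2 counts rather than data from the included studies.

```python
import math
from typing import List, Sequence, Tuple


def effective_sample_size(n_diseased: int, n_non_diseased: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2); equals the total only when groups are balanced."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def log_dor(tp: int, fp: int, fn: int, tn: int, correction: float = 0.5) -> float:
    """Natural log of the DOR, with a continuity correction added to each cell (assumption)."""
    a, b, c, d = (cell + correction for cell in (tp, fp, fn, tn))
    return math.log((a * d) / (b * c))


def weighted_slope(xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]) -> float:
    """Slope of the weighted least-squares line of y on x."""
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    return sxy / sxx


def deeks_slope(studies: Sequence[Tuple[int, int, int, int]]) -> float:
    """Slope of lnDOR on 1/sqrt(ESS), weighted by ESS; studies are (TP, FP, FN, TN)."""
    xs: List[float] = []
    ys: List[float] = []
    ws: List[float] = []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(log_dor(tp, fp, fn, tn))
        ws.append(ess)
    return weighted_slope(xs, ys, ws)


# Hypothetical (TP, FP, FN, TN) counts for four studies.
hypothetical_studies = [(20, 3, 5, 40), (45, 10, 8, 90), (12, 1, 6, 25), (60, 7, 15, 110)]
print(f"Deeks regression slope: {deeks_slope(hypothetical_studies):.2f}")
```

A slope far from zero would suggest that smaller studies report systematically different DORs than larger ones; judging significance requires the standard error of the slope, which this sketch does not compute.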

Table 3 QUADAS-2 evaluation of included studies
Fig. 2 Funnel plot for publication bias assessment of included studies. a Interleukin-6 and b procalcitonin, for the detection of PJI. ESS effective sample size

Diagnostic accuracy of interleukin-6 for PJI

The pooled sensitivity across studies for IL-6 was 0.83 (95% CI, 0.74–0.89), the pooled specificity was 0.91 (95% CI, 0.84–0.95), the pooled positive LR was 9.3 (95% CI, 5.3–16.2), the pooled negative LR was 0.19 (95% CI, 0.12–0.29), and the pooled DOR was 49 (95% CI, 24–103) (Table 4). The positive LR was sufficiently high to qualify IL-6 testing as a rule-in diagnostic tool. Similarly, the negative LR was sufficiently low to qualify IL-6 testing as a rule-out diagnostic tool. The I² statistics for sensitivity and specificity were 77% (95% CI, 66–87%) and 79% (95% CI, 85–93%), indicating that there was substantial heterogeneity (Fig. 3). The HSROC curve for the index test indicated that the AUC was 0.93 (95% CI, 0.91–0.95) for IL-6 (Fig. 4). The Fagan plot showed that a positive result on the IL-6 test increased the probability of PJI from 38 to 85% and a negative result on the IL-6 test decreased the probability of PJI to 10% (Fig. 5).

Table 4 Subgroup analysis for the diagnostic performance of screening tests according to specific study designs
Fig. 3 Paired forest plots of the sensitivity and specificity of interleukin-6 for the detection of PJI

Fig. 4 Hierarchical summary receiver operating characteristic curves of interleukin-6 for the detection of PJI. AUC area under the curve, SENS sensitivity, SPEC specificity, SROC summary receiver operating characteristic

Fig. 5 Pre-test probabilities and likelihood ratios for IL-6. LR likelihood ratio

Diagnostic accuracy of procalcitonin for PJI

The pooled sensitivity across studies for PCT was 0.58 (95% CI, 0.31–0.81), the pooled specificity was 0.95 (95% CI, 0.63–1.00), the pooled positive LR was 12.4 (95% CI, 1.7–89.8), the pooled negative LR was 0.44 (95% CI, 0.25–0.78), and the pooled DOR was 28 (95% CI, 6–143) (Table 4). The positive LR was sufficiently high to qualify PCT testing as a rule-in diagnostic tool, whereas the negative LR (greater than 0.2) was not sufficiently low to qualify PCT testing as a rule-out diagnostic tool. The I² statistics for sensitivity and specificity were 93% (95% CI, 88–97%) and 97% (95% CI, 95–98%), indicating that there was substantial heterogeneity (Fig. 6). The HSROC curve for the index test indicated that the AUC was 0.83 (95% CI, 0.79–0.86) for PCT (Fig. 7). The Fagan plot showed that a positive result on the PCT test increased the probability of PJI from 44 to 91% and a negative result on the PCT test decreased the probability of PJI to 26% (Fig. 8).
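The post-test probabilities read from the Fagan plots follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back. The sketch below applies that arithmetic to the pre-test probabilities and pooled likelihood ratios reported above for IL-6 and PCT; the function and variable names are illustrative and are not part of the Stata analysis.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pre-test probabilities and pooled likelihood ratios as reported in the Results.
reported = {
    "IL-6": {"pre": 0.38, "lr_pos": 9.3, "lr_neg": 0.19},
    "PCT": {"pre": 0.44, "lr_pos": 12.4, "lr_neg": 0.44},
}

for marker, values in reported.items():
    after_positive = post_test_probability(values["pre"], values["lr_pos"])
    after_negative = post_test_probability(values["pre"], values["lr_neg"])
    print(f"{marker}: positive test -> {after_positive:.0%}, "
          f"negative test -> {after_negative:.0%}")
```

The calculation reproduces the reported values (85% and 10% for IL-6; 91% and 26% for PCT). A negative PCT result still leaves roughly a one-in-four probability of PJI, which is why PCT does not qualify as a rule-out test.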

Fig. 6 Paired forest plots of the sensitivity and specificity of procalcitonin for the detection of PJI

Fig. 7 Hierarchical summary receiver operating characteristic curves of procalcitonin for the detection of PJI. AUC area under the curve, SENS sensitivity, SPEC specificity, SROC summary receiver operating characteristic

Fig. 8 Pre-test probabilities and likelihood ratios for procalcitonin. LR likelihood ratio

Meta-regression and subgroup analysis

Between-study heterogeneity was shown for sensitivity and specificity among studies of both index tests. Thus, univariate meta-regression analysis was performed to identify potential sources of heterogeneity (Table 5). For studies evaluating IL-6, the location where the samples were obtained was the most probable source of heterogeneity (P < 0.01). The location of study publication (P = 0.03) and the age of the study subjects (P = 0.01) also explained some of this heterogeneity. For both index tests, neither the diagnostic standard nor the cutoff values were the main source of heterogeneity. In addition, we performed subgroup analysis for the four studies that evaluated both IL-6 and PCT in the same patients (Table 4). When analyzing only these four studies, the negative LRs of IL-6 and PCT increased to 0.22 (95% CI 0.14–0.35) and 0.53 (95% CI 0.28–1.03), respectively, indicating that PJI is unlikely when both IL-6 and PCT are evaluated in a single patient. Pooled results from these studies showed a greatly reduced specificity for IL-6, 0.77 (95% CI 0.62–0.88), and a sensitivity for PCT of 0.48 (95% CI 0.17–0.80).

Table 5 Univariate meta-regression analysis for identifying potential sources of heterogeneity in the diagnostic performance of screening tests

Discussion

The main finding of the present meta-analysis is that IL-6 has a higher diagnostic value than PCT for distinguishing PJI from other causes of failure. Specifically, the AUC value was 0.93 for IL-6 and 0.83 for PCT. Both biomarkers had a positive LR that was sufficiently high to qualify testing as a rule-in diagnostic tool. In contrast, PCT had a suboptimal negative LR, making it insufficient to function as a rule-out biomarker.

IL-6 is released by monocytes in response to a local infection and is the main stimulator of CRP production in liver cells. Thus, the IL-6 response to infection is much more rapid than that of CRP, and IL-6 levels quickly return to normal after surgery [31]. Previous studies have shown that IL-6 levels are not increased in patients with aseptic loosening [31], whereas high concentrations of IL-6 have been detected in interface tissue from patients with prosthesis loosening but no infection [32]. Indeed, our subgroup analysis of studies evaluating both IL-6 and PCT in the same patients suggested that IL-6 has much lower specificity (77%) than in the overall analysis (91%). These results may be attributable to the fact that IL-6 levels in peripheral blood appear to be elevated in patients with aseptic loosening of total hip arthroplasty, because monocytes respond to polyethylene particles by producing IL-6 [20]. Another factor that could explain these results is differences in patient populations, such as the inclusion of patients with chronic inflammatory disease, Paget disease, or immunodeficiency syndromes. IL-6 levels can be markedly elevated in some of these diseases, which may be responsible for the limited accuracy and reduced specificity of IL-6 [26].

Although both biomarkers are readily available and can be serially monitored, synovial fluid biomarkers should theoretically lead to more reliable and accurate diagnosis of PJI compared with serum biomarkers because synovial fluid biomarkers are obtained directly from the affected joint [33]. One study that evaluated the sensitivity, specificity, and accuracy of 24 synovial fluid biomarkers in patients with PJI versus aseptic disease reported that synovial fluid IL-6 had excellent diagnostic performance, with accuracy above 0.9 for the diagnosis of hip and knee PJI [7]. This finding corresponds well with the results of a recent study reporting that synovial fluid IL-6 had an accuracy of 0.89 with very high sensitivity, indicating strong diagnostic performance [24]. In contrast, another study investigating the serum and synovial PCT levels in 42 patients with arthritis found that serum PCT was the best biomarker to discriminate patients with septic arthritis from patients with non-septic arthritis, whereas synovial PCT was not helpful for distinguishing between infectious and non-infectious arthritis [34]. Considering the possible influence of the location where the samples were obtained on diagnostic performance, we further evaluated this issue by meta-regression analyses. For IL-6, whether samples were collected from synovial fluid or not appeared to be the most probable source of heterogeneity, whereas this was not the case for PCT. This discrepancy was likely due to the lower number of synovial fluid studies in the PCT group compared to the IL-6 group. Intriguingly, the results of this meta-analysis did not support previous findings, in that suboptimal sensitivity (58%) and a suboptimal negative LR (0.44) were found for the six PCT studies, including five that used serum and one that used synovial fluid for diagnosing PJI.
The poor outcomes for PCT in detecting PJI may be explained by the fact that PCT is a more accurate marker for systemic bacterial infection and an inaccurately low PCT level is seen during localized infections. These findings suggest that PCT should not be utilized as a rule-out diagnostic tool in patients with localized infections [20, 23].

Investigating the sources of heterogeneity is key for determining whether our conclusions can be applied across different studies. In diagnostic accuracy studies, the threshold effect is regarded as a major cause of heterogeneity. The I² results for both IL-6 and PCT suggested that the current meta-analysis had substantial heterogeneity. The calculated Spearman correlation coefficient for IL-6 was 0.191 (P = 0.447), which does not support a threshold effect as the source of heterogeneity, whereas the coefficient for PCT was 0.943 (P = 0.005). In addition, some possible causes of heterogeneity, including study location, number of patients, age, sex, diagnostic standard, cutoff value, and location where samples were obtained, were explored by meta-regression. Unfortunately, we did not identify the source of PCT heterogeneity in this meta-analysis. However, we did identify study location (P = 0.03), age (P = 0.01), and location where samples were obtained (P < 0.01) as sources of heterogeneity for IL-6. Thus, the heterogeneity of IL-6 in the included studies was likely caused by these three factors.
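One common convention for checking a threshold effect is to compute the Spearman rank correlation between the logit of sensitivity and the logit of (1 − specificity) across studies, with a strong positive correlation suggesting that studies trade sensitivity against specificity by using different cutoffs. The sketch below assumes that convention, which may differ from the exact procedure used here, and runs on hypothetical per-study values.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def threshold_effect_rho(sensitivities: Sequence[float], specificities: Sequence[float]) -> float:
    """Spearman correlation between logit(sensitivity) and logit(1 - specificity)."""
    logit_tpr = [logit(s) for s in sensitivities]
    logit_fpr = [logit(1.0 - s) for s in specificities]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(logit_tpr), ranks(logit_fpr))


# Hypothetical per-study estimates in which sensitivity rises as specificity falls.
hypothetical_sens = [0.60, 0.70, 0.80, 0.90]
hypothetical_spec = [0.95, 0.90, 0.85, 0.70]
print(f"rho = {threshold_effect_rho(hypothetical_sens, hypothetical_spec):.2f}")
```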

This study has several limitations. First, half of the studies had a small sample size (< 70 subjects), which could have led to overestimation of the diagnostic accuracy of both IL-6 and PCT for detecting PJI. However, our meta-regression analysis showed that sample size did not have much impact on the study outcomes. Second, positive and negative LRs were calculated from binary data. The results of both biomarkers were either positive or negative, meaning that useful information could have been lost because their concentrations increase as disease severity increases [35]. To obtain more precise information regarding test reliability, it will be necessary to calculate LRs based on multiple cutoffs. Finally, the IL-6 and PCT detection assays were different among the studies, which could have negatively affected the assessment of diagnostic accuracy.

Conclusions

Based on the results of the present meta-analysis, IL-6 has a higher diagnostic value than PCT for the diagnosis of PJI, and the IL-6 test has higher specificity than sensitivity. Conversely, PCT is not recommended for use as a rule-out diagnostic tool.