Introduction

Tuberculosis (TB) remains a major global health challenge and is considered a leading cause of death worldwide. According to World Health Organization (WHO) reports, in 2012, nearly 8.6 million people developed TB and 1.3 million died from the disease [1].

It has been estimated that one-third of the world’s population is latently infected with Mycobacterium tuberculosis. Studies show that approximately 5–10 % of infected individuals will develop active TB disease in their lifetime [2]. Because several questions related to the detection and management of latent tuberculosis infection (LTBI) remain unanswered, particular emphasis and increased research investment are needed for the development of a reliable diagnostic test for LTBI. One of the main problems is the correct diagnosis of the different stages of tuberculosis. The tuberculin skin test (TST) is the most widely used screening method for the diagnosis of M. tuberculosis infection, but it can give both false-positive results in the Bacillus Calmette–Guérin (BCG)-vaccinated population and false-negative results due to anergy or immune deficiency [3, 4]. Interferon-gamma release assays (IGRAs) have a higher sensitivity and specificity than the traditional TST and can discriminate TB infection from BCG vaccination or exposure to non-tuberculous mycobacteria (NTM) by measuring the IFN-γ response to M. tuberculosis-specific antigens [6-kDa early secretory antigenic target (ESAT)-6, 10-kDa culture filtrate antigen (CFP)-10, and Rv2654] [1–4]. The two currently commercially available IGRAs are the QuantiFERON TB test (Cellestis, a QIAGEN company) and the T-SPOT.TB test (Oxford Immunotec, Abingdon, UK). Although the TST and IGRAs are designed to detect a cellular immune response to M. tuberculosis, they do not distinguish between latent infection and active disease [5–9].

It has been reported that the M. tuberculosis-specific T-cell response in patients with a non-replicating M. tuberculosis infection may be characterized by the secretion of several cytokines, including interleukin-2 (IL-2) [10].

Since the TST and IGRAs cannot identify individuals with active tuberculosis or those at the highest risk of developing the disease, the IL-2 response might be helpful in differentiating the active and latent stages of TB. There has not yet been a systematic review and meta-analysis of the data on the diagnostic potential of IL-2 as a biomarker for the diagnosis of latent tuberculosis.

We conducted a systematic review and meta-analysis on the diagnostic potential of IL-2 as a biomarker to determine whether the available data support the use of IL-2 to improve the diagnosis of LTBI.

Materials and methods

Search strategy and study selection

The search strategy was devised for PubMed, Embase, and Web of Science, and then adapted for the other databases. No start date limit was applied. The databases were searched up to 30 December 2013. The bibliographies of selected articles were screened for potentially suitable references, which were then retrieved. All references were imported into an EndNote library and tagged with the name of the database. Conference abstracts and proceedings, as well as additional references, were added by searching the references cited by the identified studies.

Our search included the terms: “latent tuberculosis AND cytokine” OR “latent tuberculosis AND interleukin 2” OR “Mycobacterium tuberculosis AND cytokine” OR “Mycobacterium tuberculosis AND interleukin 2”.

All titles and abstracts were assessed for inclusion according to the following agreed criteria.

Inclusion criteria

Case definitions

Active TB

Defined as microbiologically confirmed if a sputum smear was positive for acid-fast bacilli on smear microscopy by the Ziehl–Neelsen method and/or M. tuberculosis was identified in sputum culture.

LTBI

Patients with TST ≥10 mm and with exposure to TB disease or originating from a highly TB endemic country but without radiological evidence of ongoing or previous active TB in the clinical work-up. The latent TB group included subjects with either a positive TST or positive IGRAs.

No TB

The control group included subjects who tested negative with the TST and/or QuantiFERON-TB® Gold In-Tube/ELISpot assays and/or had no clinical or radiographic evidence of active TB disease, no exposure to TB, and no history of earlier active TB.

Control group

Those with no clinical or radiographic evidence of active TB and no known history of exposure to TB, and/or with negative TST/IGRA results.

Study design

The studies of diagnostic tests must have sufficient data to calculate the sensitivity, specificity, and negative and positive predictive values. If necessary, we contacted the authors for pertinent information that was not included in the original publications.
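As a minimal sketch of what "sufficient data" means here, the four measures named above can all be computed from the four cells of a 2 × 2 table. The class name `TwoByTwo`, the function name, and the counts below are hypothetical illustrations and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (index test vs. reference)."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=40, fp=5, fn=10, tn=45)
measures = accuracy_measures(example)
```

Note that the predictive values, unlike sensitivity and specificity, depend on the proportion of diseased subjects in the study sample.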

Study selection and data extraction

Two reviewers independently screened all titles and abstracts identified in the database searches. The full text of every article included by either reviewer on the basis of the abstract was obtained. To determine eligibility for inclusion, one author reviewed all full-text articles. A second author repeated this assessment independently for a random selection of 10 % of the full-text articles, and there was complete agreement regarding the excluded articles. Two reviewers then independently extracted data from all of the included studies, and disagreements were resolved by consensus. Studies that did not fulfill all of the inclusion criteria were excluded.

Quality assessment

The quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS, maximum score of 14) tool [11]. Each study was independently assessed by two reviewers. Any discrepancies were resolved by discussion or, if agreement could not be reached, by arbitration by a third reviewer.

Exclusion criteria for meta-analysis

The following exclusion criteria were used for further selection of articles for meta-analysis: (1) if the numbers of true-positives, false-negatives, true-negatives, and false-positives with a cut-off point of IL-2 were not available or could not be derived from the study; (2) studies that assessed the ELISpot assay; (3) studies which reported only the median (LQ–UQ) of the IL-2 level.

Data analysis and synthesis

We used standard methods recommended for meta-analyses of diagnostic test evaluations. The following measures of test accuracy were computed for each study: sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). The relative DOR (RDOR) was calculated according to standard methods to analyze the change in diagnostic precision in the study per unit increase in the covariate [12].
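The per-study ratio measures listed above follow directly from sensitivity and specificity. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical, and this is not the code used by STATA or Meta-DiSc.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int):
    """Return (PLR, NLR, DOR) for one 2 x 2 table with no zero cells."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                           # equals (tp * tn) / (fp * fn)
    return plr, nlr, dor


# Hypothetical counts, for illustration only.
plr, nlr, dor = likelihood_ratios(tp=40, fp=5, fn=10, tn=45)
```

The DOR combines both likelihood ratios into a single number, which is why it is a convenient outcome for meta-regression.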

If the 2 × 2 data were not available, attempts were made to derive them from reported summary statistics, such as the sensitivity, specificity, and/or likelihood ratios.
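One way such a derivation can work, assuming the report gives sensitivity, specificity, and the number of subjects in each reference group, is sketched below. The function name and inputs are hypothetical; rounding to whole subjects is an assumption, and reconstructed counts should be checked against any other figures the original report provides.

```python
def reconstruct_counts(sensitivity: float, specificity: float,
                       n_diseased: int, n_non_diseased: int):
    """Rebuild (tp, fp, fn, tn) from summary statistics and group sizes."""
    tp = round(sensitivity * n_diseased)        # diseased, test positive
    fn = n_diseased - tp                        # diseased, test negative
    tn = round(specificity * n_non_diseased)    # non-diseased, test negative
    fp = n_non_diseased - tn                    # non-diseased, test positive
    return tp, fp, fn, tn


# Hypothetical summary statistics, for illustration only.
counts = reconstruct_counts(0.80, 0.90, n_diseased=50, n_non_diseased=50)
```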

Preliminary exploratory analyses were conducted for each test by plotting estimates of the sensitivity and specificity from each study on forest plots and in the receiver operating characteristic (ROC) space. The area under the curve (AUC), the corresponding 95 % confidence interval (CI), and the p-value were calculated.

The sensitivity and specificity at the single test threshold of each study were used to plot a summary receiver operating characteristic (SROC) curve [13].

We adopted the following overall approach for the evaluation of the heterogeneity expected between studies of diagnostic tests. We assessed heterogeneity visually using forest plots and then tested it statistically using the Chi-squared test and the I-squared statistic (I² > 50 % indicates significant heterogeneity) [14]. Consequently, we calculated pooled estimates using random effects modeling, which provides more conservative estimates than fixed effects modeling when heterogeneity is present [15].
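The heterogeneity statistics and random effects pooling described above can be illustrated with a univariate DerSimonian-Laird sketch. This is an assumption made for illustration: the text does not state which random effects estimator the software applied, and the function name and input values are hypothetical. The effects could be, for example, per-study ln DOR values with their variances.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooling with Cochran's Q and I-squared."""
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: percentage of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {"q": q, "i_squared": i_squared, "tau2": tau2,
            "pooled": pooled, "se": se}


# Hypothetical effects and variances, for illustration only.
result = random_effects_pool([0.0, 2.0], [1.0, 1.0])
```

Adding tau² to every study variance widens the pooled confidence interval relative to a fixed effects analysis, which is the sense in which the random effects estimate is more conservative.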

Publication bias is a concern for meta-analyses of diagnostic studies [16]. Deeks et al. indicated that tests which use standard errors of odds ratios are likely to be seriously misleading in meta-analyses of test accuracy. Therefore, in our study, the effective sample size funnel plot and the associated regression test of asymmetry were used to detect publication bias [16]. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist was used to guide the reporting of the systematic review. We performed all analyses using STATA 11 (StataCorp, College Station, TX, USA) and Meta-DiSc software (version 1.4).
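The effective sample size regression can be sketched as a weighted least-squares fit of ln DOR against 1/√ESS, with ESS as the weight, where ESS = 4·n1·n2/(n1 + n2) for n1 diseased and n2 non-diseased subjects. The sketch below returns only the intercept and slope (the bias coefficient), not the p-value. The function names and study counts are hypothetical, and adding 0.5 to every cell of every study is an assumption made here to keep ln DOR finite.

```python
import math


def effective_sample_size(n_diseased: int, n_non_diseased: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x."""
    sum_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sum_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx  # fails if every study has the same ESS
    return y_bar - slope * x_bar, slope


def deeks_regression(studies):
    """studies: iterable of (tp, fp, fn, tn). Returns (intercept, slope)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        # Assumed 0.5 continuity correction applied to every cell.
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)
    return weighted_linear_fit(xs, ys, ws)


# Hypothetical study counts, for illustration only.
intercept, slope = deeks_regression(
    [(40, 5, 10, 45), (18, 2, 7, 23), (60, 4, 20, 96), (9, 1, 6, 14)]
)
```

A slope close to zero corresponds to a symmetric funnel plot; judging significance additionally requires the standard error of the slope, which is omitted here.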

Results

After full-text review of the papers evaluating IL-2 for TB diagnosis, 16 manuscripts were determined to meet the eligibility criteria for inclusion in the systematic review (Fig. 1). Eleven out of the 16 publications [9, 17–26] dealing with IL-2 as a biomarker for the diagnosis of active and latent tuberculosis were considered to be eligible for inclusion in the qualitative analysis. For the meta-analysis, six studies [4, 8, 10, 20, 27, 28] were excluded because IFN-γ was detected only in patients with active tuberculosis [8], the study was performed in patients during and after their anti-tuberculosis treatment [10], the cytometric detection of IFN-γ-/IL-2-secreting cells was used [4, 28], or no detailed data were reported [27].

Fig. 1 Flowchart of study selection

In addition, for the meta-analysis, we excluded six studies [17, 19, 21–23, 26] because they did not use a definite cut-off for IL-2 or only reported the median [interquartile range (IQR)] of IL-2 values.

When any cell of a 2 × 2 table contains zero, which would prevent the calculation of ratio measures, the software automatically adds 0.5 to each cell of the table for that study. However, in the study of Biselli et al. [20], two cells (FP and TN) were zero, so the DOR and its standard error could not be calculated, and this study had to be excluded from the meta-analysis.
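The zero-cell handling described above can be sketched as follows. This is an illustration, not the software's own code: the function name and counts are hypothetical, and rejecting a table with an empty reference group (as when both FP and TN are zero) is the assumption made here for how such a study is treated.

```python
import math


def log_dor_with_correction(tp, fp, fn, tn, correction=0.5):
    """Return (ln DOR, standard error) with a zero-cell continuity correction."""
    if tp + fn == 0 or fp + tn == 0:
        # No subjects in one reference group: sensitivity or specificity
        # is undefined, so the study cannot contribute a DOR.
        raise ValueError("empty reference group; DOR cannot be estimated")
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        cells = [c + correction for c in cells]  # add 0.5 to every cell
    a, b, c, d = cells
    log_dor = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return log_dor, se


# Hypothetical counts with a single zero cell, for illustration only.
log_dor, se = log_dor_with_correction(tp=10, fp=0, fn=5, tn=20)
```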

We only performed meta-analysis for the evaluation of IL-2 for LTBI diagnosis because the number of studies with sufficient data for the analysis of this marker for active tuberculosis was very low. Table 1 indicates the diagnostic accuracy of IL-2 as a biomarker for the diagnosis of latent tuberculosis. The average sample size in studies dealing with IL-2 as a biomarker for the diagnosis of latent tuberculosis was 54 (range, 6–167).

Table 1 Results of the interleukin-2 (IL-2) assay for the diagnosis of latent tuberculosis infection (LTBI)

The clinical characteristics of these studies, along with QUADAS scores, are outlined in Table 2.

Table 2 Summary of the included studies

The median (IQR) IL-2 values of the studies reported in this article are shown in Table 3. In all the studies [9, 17–19, 25], the median concentration of IL-2 in pg/ml (IQR) in the LTBI group was higher than in the control group, except for one study [26].

Table 3 Median [interquartile range (IQR)] IL-2 values of the studies

Among studies which compared active and LTBI groups with controls, in five studies, the IGRA results among controls were 100 % negative [17, 19, 21–23], while in the studies of Wang et al. [9], Ruhwald et al. [25], and Goletti et al. [26], 17, 13, and 42 % of controls were IGRA-positive (Table 4).

Table 4 Result of the interferon-gamma release assays (IGRA) test in the included studies

Diagnostic accuracy

For the forest plot analysis, the report by Biselli et al. [20] was excluded automatically by the STATA software due to the existence of two zero cells. The pooled estimates of IL-2 for LTBI diagnosis were as follows: sensitivity, 0.81 (95 % CI, 0.60 to 0.92) and specificity, 0.95 (95 % CI, 0.90 to 0.97). The PLR was 15.2 (95 % CI, 8.1 to 28.4) and the NLR was 0.20 (95 % CI, 0.09 to 0.47) (Fig. 2). The SROC curve was positioned near the upper left corner of the plot and the AUC was 0.96 (95 % CI, 0.94 to 0.98) (Fig. 3). Quality scoring was also carried out using the QUADAS tool, in which a score of 1 indicated a fulfilled criterion, 0 an unclear criterion, and −1 a criterion that was not fulfilled. These scores were used in the meta-regression analysis.

Fig. 2 Forest plot showing the sensitivity and specificity of interleukin-2 (IL-2) in the diagnosis of latent tuberculosis infection (LTBI)

Fig. 3 Summary receiver operating characteristic (SROC) curve for assessment of the diagnostic accuracy of IL-2 for the diagnosis of LTBI

In our meta-analysis, both QUADAS scores and the IL-2 cut-offs were used in the meta-regression analysis to assess the effect of study quality. Studies with higher quality (QUADAS score ≥11) produced relative DOR values that were not significantly higher than those studies with lower quality (score <10) (p = 0.86), while the differences in the IL-2 cut-off (more than 13.1 pg/ml vs. less than 13.1 pg/ml) affected the diagnostic accuracy (p < 0.001).

The Deeks’ funnel plot for publication bias [16] showed no asymmetry (Fig. 4). The bias coefficient was −5.56 and was not significant (p = 0.74). These results provide no evidence of publication bias.

Fig. 4 Deeks’ funnel plots for publication bias

Discussion

This meta-analysis was conducted to assess the validity of IL-2 for the detection of LTBI.

The sensitivity of the IL-2 assay in the included studies ranged from 43 % [24] to 100 % [18], and the specificity of this marker was in the range from 89 % [25] to 100 % [18]. The pooled values of sensitivity (0.81) and specificity (0.95) showed the favorable accuracy of IL-2 for the detection of LTBI.

We did not find differences in the sensitivity, specificity, and DOR (the DOR of the included articles ranged from 2.1 to 6.3, p-value = 0.3). Our data showed that the SROC curve was positioned near the desirable upper left corner. The AUC was 0.96, which indicated a high level of overall accuracy [29]. We presented both the PLR and NLR as our measures of diagnostic accuracy. Likelihood ratios of >10 or <0.1 indicate high accuracy. A PLR of 15.2 suggests that a positive IL-2 assay result is approximately 15.2 times more likely in patients with LTBI than in individuals without LTBI. In addition, the NLR was 0.2, meaning that a negative IL-2 result is approximately one-fifth as likely in patients with LTBI as in individuals without LTBI; the probability of LTBI after a negative result therefore also depends on the pre-test probability.
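Likelihood ratios translate into post-test probabilities only once a pre-test probability is fixed, via the odds form of Bayes' theorem. The sketch below applies the pooled PLR (15.2) and NLR (0.2) to a pre-test probability of 30 %, which is a hypothetical value chosen for illustration and not an estimate from the included studies.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical pre-test probability of LTBI, for illustration only.
pre_test = 0.30
after_positive = post_test_probability(pre_test, 15.2)  # pooled PLR
after_negative = post_test_probability(pre_test, 0.2)   # pooled NLR
```

Under this assumed pre-test probability, a positive result raises the probability of LTBI to roughly 87 % and a negative result lowers it to roughly 8 %; other pre-test probabilities give different values.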

It has been suggested that combined analyses of different cytokines which might be expressed by multifunctional T-cells can improve the diagnosis of subjects with LTBI [9, 10, 30]. Evaluation of the immunological response to selected M. tuberculosis antigens could help to discriminate the different TB stages, since the immune response differs between the active and latent stages of infection. It is possible to hypothesize that, in the presence of active tuberculosis, the frequency of effector T-cells (TEM) is higher than that of central memory T-cells (CEM) in peripheral blood mononuclear cells (PBMC). TEM have a limited lifespan and secrete IFN-γ alone, while CEM are predominant in latent infections and secrete IL-2 alone or IFN-γ and IL-2 simultaneously. Therefore, memory cells can be distinguished according to their differential cytokine secretion [4, 31].

Several studies have demonstrated that IL-2 release, after stimulation by TB-specific antigens, was significantly higher in patients infected with TB than in healthy controls [8, 9, 17–21, 25], and suggested that IL-2 could be a potential biomarker for diagnosing LTBI [20]. Nevertheless, Ruhwald et al. reported that IL-2 alone might not be a useful diagnostic biomarker for TB infection due to its low quantity of release [25].

A series of studies has recently revealed that dominant IL-2 functional T-cell signatures were associated with antigen clearance. There were higher proportions of IL-2/IFN-γ CD4+ T-cells in LTBI or successfully treated TB subjects, which have lower bacterial loads, while in active TB, the cytokine profile was shifted towards cells secreting only IFN-γ [9, 10, 27, 28, 32].

In the study by Wang et al., significantly higher levels of TB antigen-stimulated IFN-γ and IL-2 were observed in the active TB and LTBI groups than in the control group [9]. In addition, they reported that combining IL-2 with QFT might increase the detection accuracy and reduce the number of indeterminate results [9].

In the study by Lighter-Fisher et al., the IL-2 levels differentiated children with active TB disease from those with LTBI [21].

In the study by Gourgouillon et al., the intensity levels of IL-2 in stimulated cultures were similar in the LTBI and active TB groups (2,097 and 1,869 pg/ml, respectively; p = 0.852) [23], which might be due to the young age of the patients, as well as the low sample size of this study. However, LTBI in young individuals was associated with relatively higher IL-2 responses.

In the study by Goletti et al. in India, where TB is highly endemic, “community controls” were defined as individuals recruited from families without cases of TB in the same household or neighborhood, and the TST was performed in only 54 % of individuals. Among those tested by TST, a significant difference in the proportion of positive responses was found between the community controls and household contacts. In addition, 42 % of the community controls were IGRA-positive due to this limitation. The median level of IL-2 in response to the mitogen was higher in the patients with active TB than in the household contacts and community controls, although the difference was not statistically significant. Moreover, a significant difference was found between the response in the community controls and that in the household contacts (p < 0.004) [26].

In the study by Suter-Riniker et al. [27], the simultaneous analysis of IFN-γ and IL-2 in plasma supernatants revealed a significant shift toward an IL-2-dominant functional T-cell response compared to those of the intermediate (IFN-γ, 1.0 to 5.0 IU/ml) and high (IFN-γ, >5.0 IU/ml) responders. This type of response has been shown to be associated with the elimination of M. tuberculosis infection and suggests that such patients may not require prophylactic therapy [10, 27, 32].

A greater IL-2 response in LTBI is consistent with the current hypothesis of immunity to intracellular pathogens. Immune responses associated with effective virus control or clearance were predominantly composed of multifunctional/polyfunctional CD4+ T-cells secreting IFN-γ, TNF-α, and IL-2, as well as CD8+ T-cells [32]. An increasing proportion of both IL-2 and IFN-γ is found in individuals who control the virus with no sign of clinical disease [23, 33]. Decreased IL-2 production and deficient IL-2-induced cell proliferation have also frequently been observed during acute TB [30].

In conclusion, according to the meta-analysis, IL-2 is a valid marker for the diagnosis of LTBI. When there is no definite gold standard for the diagnosis of LTBI, the IL-2 release assay in addition to IGRA can improve the ability of IGRA to identify individuals with recently acquired LTBI. In addition, the IFN-γ and IL-2 functional signature is recommended to be used for the monitoring of individuals infected with M. tuberculosis.