Introduction

In the last decade, interferon-gamma release assays (IGRAs) have become established in the medical laboratory as a blood test alternative to the tuberculin skin test (TST) for detecting Mycobacterium tuberculosis (M. tb) infection. Two commercial IGRAs based on different methods are available to detect M. tb specific T cells: the T-SPOT.TB, which uses ELISPOT technology, and the QuantiFERON®-TB Gold (QFT, replaced since 2015 by the QuantiFERON®-TB Gold Plus), which is based on an ELISA procedure. In practice, IGRAs are mainly used to replace the TST for detection or exclusion of latent tuberculosis infection (LTBI) [1, 2], but they can also complement the diagnosis of active tuberculosis (TB) in certain clinical situations [3, 4].

With the increased use of IGRA testing, more results from serially tested subjects have become available. Studies of serially tested health care workers showed a remarkable variability of IGRA results [5]. Most studies addressing the reproducibility of IGRAs used the QFT; data for the T-SPOT.TB are still limited [6]. Sources of variable results in serial IGRA testing can be found in the preanalytical phase, test processing, assessment of raw data and test manufacturing [7]. Immunological changes can also contribute to IGRA variability, as the human immune response against M. tb is dynamic [8, 9].

The aim of our study was to determine the reproducibility of the T-SPOT.TB IGRA over a 14-year period and to analyse the discordant results of serial testing.

Materials and methods

T-SPOT.TB test

The T-SPOT.TB test was performed according to the manufacturer’s instructions (T-SPOT.TB, Oxford Immunotec Limited, Abingdon, UK). Briefly, peripheral blood mononuclear cells (PBMC) were isolated from heparinized blood samples by standard ficoll-hypaque density-gradient centrifugation. The PBMC were washed twice with RPMI-1640 medium (Biochrom, Berlin, Germany), counted with an automated haematology analyser (KX21N, Sysmex, Norderstedt, Germany) and adjusted in AIM V medium (Invitrogen, Karlsruhe, Germany) such that each of the four wells of the assay plate contained 250,000 PBMC. The cells were incubated at 37 °C and 5% CO2 for 16–20 h with medium (negative control), phytohemagglutinin (positive control) and peptides from early secretory antigen-6 (ESAT-6) and culture filtrate protein-10 (CFP-10) of M. tb. Spots were counted with an automated enzyme-linked immunospot (ELISPOT) assay plate reader (AID ELISPOT Reader System ELR02 with the AID Software, AID Diagnostika GmbH, Straßberg, Germany; since 2016, Bioreader 6000 Pro-E alpha with the Bioreader Software, BIO-SYS GmbH, Karben, Germany). Each subject had to display a satisfactory response to the positive control (> 20 spots and/or saturation of the well) and a low spot number in the negative control (≤ 10 spots); otherwise, the test was classified as indeterminate. T-SPOT.TB test responses were considered negative if both antigen wells contained no more than four spots above the negative control. The result was positive if at least one of the antigen stimulation wells contained at least eight spots more than the negative control well. Spot count differences of 5, 6 or 7 spots were defined as borderline, as recommended by the US Food and Drug Administration (FDA) [10]. Quantitative ESAT-6 and CFP-10 specific immune responses were expressed as spot forming cells per 10⁶ PBMC (SFC/M).
A T-SPOT.TB test ‘conversion’ was defined as a negative test result followed by a positive result upon repeated testing and a ‘reversion’ was defined as a positive result with a subsequent negative result.
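The interpretation rules above can be written down compactly. The following is a minimal sketch, not the manufacturer's or the laboratory's software: the function names `classify_tspot`, `sfc_per_million` and `serial_outcome` and the `pha_saturated` flag are illustrative assumptions, and only the thresholds are taken from the text.

```python
def classify_tspot(nil, pha, esat6, cfp10, pha_saturated=False):
    """Classify one test from raw spot counts per well (250,000 PBMC each)."""
    # Controls first: negative control <= 10 spots and positive control
    # > 20 spots (or a saturated well); otherwise the test is indeterminate.
    if nil > 10 or not (pha > 20 or pha_saturated):
        return "indeterminate"
    # Largest antigen response after subtracting the negative control.
    top = max(esat6 - nil, cfp10 - nil)
    if top >= 8:
        return "positive"
    if top >= 5:          # differences of 5, 6 or 7 spots
        return "borderline"
    return "negative"


def sfc_per_million(spots, cells_per_well=250_000):
    """Convert a spot count per well to spot forming cells per 10^6 PBMC."""
    return spots * 1_000_000 / cells_per_well


def serial_outcome(first, second):
    """Label a pair of consecutive negative/positive results."""
    if first == "negative" and second == "positive":
        return "conversion"
    if first == "positive" and second == "negative":
        return "reversion"
    return "concordant" if first == second else "other"


print(classify_tspot(nil=1, pha=150, esat6=9, cfp10=2))
print(sfc_per_million(8))
print(serial_outcome("positive", "negative"))
```

Under this sketch, the minimum positive response of eight spots per well corresponds to 32 SFC/M, which matches the lower bound of the lowest tercile reported in the Results.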

Study population

The analysis included all subjects tested with the T-SPOT.TB test in our laboratory between October 2004 and December 2018. Subject data and test results were retrieved from our laboratory information system (Medat, Munich, Germany). Before further analysis, a randomly selected code number was assigned to every subject to enable the search for serial tests after data anonymization. Subsequently, the data were stripped of any information allowing identification of the individuals. Thus, the data set contained the following elements: anonymous code number, sex, age in years, sample collection date, initiator of testing, T-SPOT.TB test result (negative, positive, borderline, indeterminate) and individual spot counts for the antigen panels ESAT-6 and CFP-10. An overview of the data selection is shown in Fig. 1. Of 69,514 available results, 2140 (3.1%) tests with borderline results and 2217 (3.2%) tests with indeterminate results were excluded from further analysis. Of the remaining 65,157 samples, a further 52,915 samples from subjects tested only once were removed. In subjects with more than two T-SPOT.TB tests, only the first two test results were used; a further 1362 results from third or later tests were excluded. Finally, a total of 10,880 samples from 5440 subjects with serial T-SPOT.TB tests at different times were available for this study. The baseline characteristics of the study population are listed in Table 1.
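The selection steps can be checked with simple arithmetic. The short sketch below only restates the counts given in the text; the variable names are illustrative.

```python
# Counts as reported in the text
total_results = 69_514
borderline = 2_140
indeterminate = 2_217
single_tests = 52_915      # samples from subjects tested only once
third_or_later = 1_362     # results beyond each subject's first two tests

evaluable = total_results - borderline - indeterminate
serial_samples = evaluable - single_tests - third_or_later
subjects = serial_samples // 2   # two results are kept per subject

print(f"borderline:     {borderline / total_results:.1%}")
print(f"indeterminate:  {indeterminate / total_results:.1%}")
print(f"evaluable:      {evaluable}")
print(f"serial samples: {serial_samples} from {subjects} subjects")
```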

Fig. 1 Flow chart of the serially tested subjects enrolled in the study

Table 1 Baseline characteristics of 5440 serially tested subjects

Statistics

The Chi-squared test was used to compare the frequencies of conversions and reversions at different times. P values of < 0.05 were considered significant. Agreement between the first and second test results was assessed with Cohen’s kappa (kappa of > 0.75, excellent agreement) [11]. Statistical analyses were performed using MedCalc for Windows, version 11.4.1.0 (MedCalc Software, Ostend, Belgium).

Results

Analysis of serial T-SPOT.TB tests

The T-SPOT.TB test results of 5440 subjects with two consecutive tests are shown in Table 2. The frequency of positive T-SPOT.TB test results was 15.1% for the first and 14.4% for the second test. The median time interval between the tests was 258 days (interquartile range [IQR] 62–665 days). Concordant negative or positive test results were observed in 5202 subjects, resulting in an agreement of 95.6% and a Cohen’s kappa value of 0.83 (95% confidence interval [95% CI] = 0.80 to 0.85), demonstrating a high degree of concordance. Discordant results were observed in 238 (4.4%) subjects, of whom 101 (1.9%) had a test conversion from an initially negative to a positive result and 137 (2.5%) had a test reversion.

Table 2 T-SPOT.TB test agreement of 5440 serially tested subjects
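The agreement statistics can be reproduced from the counts given in the text. The sketch below is a plain implementation of Cohen's kappa, not the MedCalc routine used in the study; the 2 × 2 table is reconstructed from the reported numbers (4621 initially negative subjects with 101 conversions, 819 initially positive subjects with 137 reversions), and the function name `cohen_kappa` is an assumption.

```python
def cohen_kappa(table):
    """Observed agreement and Cohen's kappa for a square table.

    Rows are the first test result, columns the second test result.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2
    return observed, (observed - expected) / (1 - expected)


# Category order: [negative, positive]
table = [
    [4621 - 101, 101],   # initially negative: stayed negative, converted
    [137, 819 - 137],    # initially positive: reverted, stayed positive
]
agreement, kappa = cohen_kappa(table)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.2f}")
```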

The detection of T cells specific for the antigens ESAT-6 and/or CFP-10 defines a positive result in the T-SPOT.TB test. Both antigens are measured separately by the assay and give an antigen reactivity pattern for each test. Four hundred ninety-five of 682 (72.6%) subjects with repeatedly positive results had identical antigen reactivity in the first and second test as shown in Table 3, giving a moderate Cohen’s kappa value of 0.56 (95% CI = 0.50 to 0.61).

Table 3 Agreement of T-SPOT.TB antigen reactivity in 682 repeatedly positive tested subjects

In tests with positive results, an immune response against CFP-10 was observed more often than a response against ESAT-6 in both the first and the second test (556 vs 468 and 544 vs 475 tests, respectively). ESAT-6 and CFP-10 specific T cells were detected together in 342 (50.1%) subjects in the first test and in 337 (49.4%) subjects in the second test. Additional reactivity of the second antigen occurred in 87 (12.8%) of the serially tested subjects in whom only one antigen reacted initially. Loss of reactivity to one antigen in tests that initially reacted to both ESAT-6 and CFP-10 occurred in 92 (13.5%) of the subjects. Changes of the antigen reactivity pattern from a single antigen reaction with ESAT-6 specific T cells to CFP-10 specific T cells, or vice versa, in the second test were rare (8 subjects, 1.2%), indicating a low variability of the antigen pattern in serial testing.
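Because kappa depends only on the observed agreement and the marginal totals, the value for the antigen reactivity pattern can be checked from the counts in the text without the full Table 3. The sketch below assumes that the per-antigen counts (468, 556, 342 and 475, 544, 337) all refer to the 682 repeatedly positive subjects, which is consistent with their totals; the variable and function names are illustrative.

```python
n = 682                      # repeatedly positive subjects
identical = 495              # same reactivity pattern in both tests

# Tests reactive to each antigen, and to both, as reported in the text
esat6 = {"first": 468, "second": 475}
cfp10 = {"first": 556, "second": 544}
both = {"first": 342, "second": 337}


def pattern_counts(test):
    """Return counts for [ESAT-6 only, CFP-10 only, both antigens]."""
    return [esat6[test] - both[test], cfp10[test] - both[test], both[test]]


first, second = pattern_counts("first"), pattern_counts("second")

observed = identical / n
# Chance agreement from the marginal pattern frequencies of each test
expected = sum(a * b for a, b in zip(first, second)) / n**2
kappa = (observed - expected) / (1 - expected)

print(first, second)
print(f"agreement = {observed:.1%}, kappa = {kappa:.2f}")
```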

Impact of spot number on frequency of test reversions

The antigen response of initially positive tests, calculated as the sum of ESAT-6 and CFP-10 specific T cells, was divided into terciles (Table 4). In the lowest tercile, with an antigen response between 32 and 88 SFC/M, 94 of 273 subjects (34.4%) had a negative T-SPOT.TB result in the second test. In the middle tercile (92–288 SFC/M), 34 of 273 subjects (12.5%) reverted. In the highest tercile, with initial spot numbers above 288 SFC/M, the frequency of reversion was only 3.3% (9 of 273). The chance of a later test reversion is thus high for results with low spot numbers (odds ratio = 6.14), whereas reversions are rarely observed at high spot numbers above 288 SFC/M (odds ratio = 0.11).

Table 4 Spot numbers of 819 initially positive tested subjects divided in terciles and rate of T-SPOT.TB test reversions
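The text does not state the reference group for the odds ratios. The sketch below assumes that each tercile is compared with the other two terciles combined; under that assumption the reported values of 6.14 and 0.11 are reproduced from the counts given above. The dictionary keys and the function name are illustrative.

```python
# (reversions, subjects) per tercile of the initial antigen response in SFC/M
terciles = {
    "32-88": (94, 273),
    "92-288": (34, 273),
    ">288": (9, 273),
}


def odds_ratio_vs_rest(name):
    """Odds of reversion in one tercile relative to the other two combined."""
    rev, n = terciles[name]
    rest_rev = sum(r for key, (r, _) in terciles.items() if key != name)
    rest_n = sum(m for key, (_, m) in terciles.items() if key != name)
    odds = rev / (n - rev)
    rest_odds = rest_rev / (rest_n - rest_rev)
    return odds / rest_odds


for name, (rev, n) in terciles.items():
    print(f"{name:>7} SFC/M: {rev / n:.1%} reverted, "
          f"OR vs rest = {odds_ratio_vs_rest(name):.2f}")
```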

Time interval between first and second test and frequencies of T-SPOT.TB test conversions and reversions

We further categorised all test conversions and reversions according to the time elapsed between the first and second test (Figs. 2 and 3). The rate of test conversions in subjects retested after 0.5–1 year, 1–2 years and more than 2 years did not differ significantly, at 9/733 (1.2%), 13/890 (1.5%) and 13/1068 (1.2%), respectively. Subjects retested within 6 months had significantly more conversions (66/1930, 3.4%) than subjects retested later (35/2691, 1.3%).
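The comparison of early and late retesting can be illustrated with a Pearson Chi-squared test on the 2 × 2 table of conversions. This is a minimal sketch using only the standard library; it applies no continuity correction, and whether the MedCalc analysis in the study did so is not stated, so the exact statistic may differ. The function name `chi2_2x2` is an assumption.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a Chi-squared variable with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Conversions vs no conversion: retested within 6 months vs later
chi2, p = chi2_2x2(66, 1930 - 66, 35, 2691 - 35)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

The same function can be applied to any pair of retest intervals, for example to the reversion counts of initially positive subjects.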

Fig. 2 Time between serial tests of 4621 initially negative tested subjects and rate of T-SPOT.TB test conversions. The number of converting tests/total number of tests is displayed above the bars. *: p < 0.05; n.s.: not significant

Fig. 3 Time between serial tests of 819 initially positive tested subjects and rate of T-SPOT.TB test reversions. The number of reverting tests/total number of tests is displayed above the bars. *: p < 0.05; n.s.: not significant

Test reversions occurred in 47 of 418 (11.2%) and 13 of 111 (11.7%) subjects when retested within 6 months and after 0.5–1 year, respectively. The rate of test reversions increased significantly when the second test was performed after 1–2 years (25/116; 21.6%) or after more than 2 years (52/174; 29.9%).

T-SPOT.TB test reversions and age of the subjects

The increase in T-SPOT.TB test reversions depends on the time of second testing but is not observed in younger subjects. In subjects up to the age of 40 years, the rate of test reversions did not differ significantly between serial tests performed within 1 year and those performed after 1 year, as shown in Fig. 4. In subjects aged between 40 and 60 years, the test reversion rate of 24.6% (28/114) was significantly higher when serial tests were performed after 1 year than when they were performed within 1 year (16/159; 10.1%). The rate of test reversion was even higher in subjects above 60 years, in whom 31.9% (43/135) of tests reverted after more than 1 year compared with 12.1% (29/239) of tests within 1 year. Elderly men had more test reversions than women, but this difference did not reach statistical significance (data not shown).

Fig. 4 Age of 819 initially positive tested subjects and rate of T-SPOT.TB test reversions serially tested within or after 1 year. Numbers above bars indicate the number of reverting tests/total number of serial tests. *: p < 0.05; n.s.: not significant

Discussion

To our knowledge, this retrospective study contains data from the longest period of T-SPOT.TB testing in a medical laboratory. In our 14-year observation period in a country with a low TB burden, 5202 (95.6%) of 5440 serially tested subjects had concordant results in the T-SPOT.TB test. These data demonstrate that the T-SPOT.TB is a reproducible test system in the medical laboratory. An even higher reproducibility of the T-SPOT.TB was found in a large US study of health care workers published by King et al. [12]. In that study, 19,405 of 19,630 tested health care workers had the same result in the second test, giving a test concordance of 98.9%. The mean conversion rate in the US study was 0.8%; tests performed within 150 days were excluded to minimize the potential for including results from contact investigations. In our study, contact tracing seems to influence the conversion rates, as subjects retested within 6 months had a significantly higher conversion rate (3.4%) than subjects retested later, as shown in Fig. 2. For subjects tested after an interval of more than 6 months, the rate of test conversions was 1.3%, which is comparable with the 0.8% rate observed by King et al. A low rate of T-SPOT.TB conversions was also found in a study from two hospitals in Japan: among 373 serial tests within 2 years, the conversion rate was 1.1% [13].

In a further US study by Dorman et al. using the T-SPOT.TB for TB screening in health care workers, a high conversion rate of 8.3% (177 of 2137) was observed [14]. As the authors state, most IGRA conversions observed in their study appeared to be false positives. Assuming that the test conversions of subjects in our study retested after more than 6 months are also mainly caused by false positive results, our conversion rate of 1.3% (35 of 2691) is considerably lower than that of Dorman et al. Besides differences in study conception and design, there were also differences in the technical processing of the T-SPOT.TB assay. In our study, the wells were read with automated ELISPOT reader systems, which have a lower variability than visual spot counting with a magnifying glass or microscope [15]. Our single-centre setting might also be important, since differences in laboratory practices can affect the outcome of ELISPOT assays [16]. Furthermore, Dorman et al. assessed borderline results as negative, whereas our study excluded borderline results from the data analysis. Results near the test cut-off lie in a zone of uncertainty and are difficult to reproduce because of the inter- and intra-assay variability of the test [17]. Therefore, a borderline zone is meaningful and was integrated into the test instructions by the manufacturer of the T-SPOT.TB to increase test accuracy, as recommended by the FDA [18, 19]. In practice, the use of a borderline zone increased the test resolution and identified test results that should be repeated [20].

The overall test reversion rate in our study was 16.7% (137 of 819); a similar reversion rate of 17.6% (78 of 443) was observed in the US study of King et al. In the US study of Dorman et al., the T-SPOT.TB reversion rate was 63.9% (92 of 144), which is substantially higher than the rates in our study and in the data of King et al. This divergent reversion rate is difficult to explain, but the possible reasons may be the same as those discussed above for the divergent conversion rates.

The rates of test reversion in our study depend on the time of second testing. The lowest reversion rate, 11.3% (60 of 529), occurred when subjects were retested within 1 year. A similar reversion rate of 12.5% (3 of 24) for subjects retested within 2 years was observed in the study of Teranishi et al. [13]. A reversion rate of 9.4% (3 of 32) was found in household contacts of tuberculosis patients during a 3-year follow-up period [21]. One reason for test reversions may be the clearing of M. tb infection [22]; another may be changes in the life cycle of M. tb, which can enter a dormant state without secreting the antigens ESAT-6 and CFP-10 [23]. Reversions of the test have also been observed after antituberculous treatment of TB and LTBI [24].

Generally, in assays with suboptimal sensitivity and specificity, test conversions and reversions have to be expected [25]. When no true infections occur in a population, test conversions can be caused by a false negative first test (suboptimal sensitivity) or a false positive second test (suboptimal specificity). Similar considerations apply to test reversions when there is no clearing of infection in the subjects. In a model analysis, the expected false conversion and reversion rates of the QFT were calculated based on the sensitivity and specificity of the test and the prevalence of M. tb infection [26].

Assuming that the observed rates of conversion and reversion in our study result from test inaccuracy, values for test sensitivity and specificity can be calculated with this model. However, as independence of test results is a prerequisite, the model may not fully depict the real diagnostic situation, even though Germany is a country with low TB incidence and new M. tb infections are unlikely to occur in our subjects [27].

Our model calculation uses the 1.3% rate of test conversions after 6 months, to exclude ‘real’ infections arising in the context of contact tracing, and the 11.3% rate of test reversions within 1 year, to avoid patient age-related effects. For both rates of conversion and reversion, the relationship between test sensitivity and specificity is shown in Fig. 5 (for calculation details see Appendix). Both curves indicate a high specificity of the test (> 97%) in our setting. This high test specificity is in accordance with values from populations with low risk for M. tb infection, where specificities of 98.9% and 100% were observed for the T-SPOT.TB [28, 29]. Both graphs also suggest a high test sensitivity in our study. For example, when the specificity does not exceed 99%, the sensitivity of the test is at least 94%. This is consistent with earlier data from our laboratory with a directly measured T-SPOT.TB sensitivity of 97.2% (95% confidence interval = 90.3 to 99.7) in patients with TB [30]. Thus, as our rates of discrepant test results match the values calculated from the model, we suggest that the test conversions after 6 months (1.3%) and the test reversions within 1 year (11.3%) of serial testing mainly reflect the suboptimal sensitivity and specificity of the test system.
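The calculation details are given in the Appendix, which is not reproduced here. As an illustration only, the sketch below shows one way such a model can be set up, assuming that the true infection status does not change between tests and that the two test results are independent given that status. The formulas and the function names are assumptions of this sketch and are not taken from the Appendix or from [26]; the prevalence of 15.1% is the value assumed in the study.

```python
def conversion_rate(se, sp, prev):
    """P(second test positive | first test negative), no new infections."""
    infected = prev * (1 - se) * se            # false negative, then true positive
    uninfected = (1 - prev) * sp * (1 - sp)    # true negative, then false positive
    first_negative = prev * (1 - se) + (1 - prev) * sp
    return (infected + uninfected) / first_negative


def reversion_rate(se, sp, prev):
    """P(second test negative | first test positive), no cleared infections."""
    infected = prev * se * (1 - se)            # true positive, then false negative
    uninfected = (1 - prev) * (1 - sp) * sp    # false positive, then true negative
    first_positive = prev * se + (1 - prev) * (1 - sp)
    return (infected + uninfected) / first_positive


def sensitivity_for_reversion(target, sp, prev, lo=0.5, hi=1.0):
    """Bisect for the sensitivity that yields the target reversion rate.

    The reversion rate falls as sensitivity rises on [0.5, 1].
    """
    for _ in range(60):
        mid = (lo + hi) / 2
        if reversion_rate(mid, sp, prev) > target:
            lo = mid    # rate still too high: sensitivity must be larger
        else:
            hi = mid
    return (lo + hi) / 2


prev = 0.151
print(f"reversion rate at Se 94%, Sp 99%: {reversion_rate(0.94, 0.99, prev):.3f}")
print(f"Se matching 11.3% reversions at Sp 99%: "
      f"{sensitivity_for_reversion(0.113, 0.99, prev):.3f}")
```

Under these assumptions, a specificity of 99% combined with a sensitivity of about 94% yields a reversion rate close to the observed 11.3%, which agrees with the statement in the text.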

Fig. 5 Analysis of sensitivity and specificity based on the observed rates of test conversion and reversion. Values calculated from model as described [26], with an assumed 15.1% prevalence of TB infection in our study population

Interestingly, this model calculation based on the reproducibility of the test needs no patient data and has no requirement to diagnose LTBI and TB. This is an alternative approach to estimate T-SPOT.TB specificity and sensitivity, as both test properties are always difficult to determine in the absence of a diagnostic gold standard for LTBI.

The reversion rate in our study significantly increases from 11.3 to 21.6% (25 of 116) and up to 29.9% (52 of 174) when subjects were retested after 1–2 years and more than 2 years, respectively.

The reasons for repeating the positive T-SPOT.TB tests are unknown as second tests are normally not necessary and also not recommended by our laboratory.

An increasing rate of T-SPOT.TB test reversions in longitudinal testing was also observed in household contacts of tuberculosis patients. After 6 years, 13 of 34 contacts (38.2%) had a test reversion, considerably more than the 9.4% (3/32) reversion rate in the 3-year follow-up [21]. The increasing rates of test reversions after 1 year of serial testing in our study suggest that an additional, test-independent factor contributes to the test reversions. Our data further demonstrate that the increased rate of test reversions on retesting after more than 1 year is age-dependent and only seen in subjects above the age of 40 years. This points to age-related changes of the cellular immune system, which may cause a reduced number or function of M. tb specific effector T cells in the peripheral blood. Generally, lymphocyte development and function change in elderly persons and immune responsiveness is reduced [31]. As a consequence, a limited M. tb specific cellular immune response may lower IGRA sensitivity in elderly subjects. This consideration is supported by a study from the Republic of Korea comparing the IGRAs QFT and T-SPOT.TB in TB patients. For both IGRAs, the trend of age-related changes in sensitivity was significant [32]. Data from the TST, the predecessor of IGRAs, also support the concept of an age-dependent M. tb specific immune response, as a declining test reactivity in elderly subjects has long been known [33, 34].

Potential limitations of our study are the missing subject level clinical and epidemiological data due to the retrospective design and the ethics committee requirements.

Without clinical data, we cannot rule out that other factors besides initial spot numbers, time of second test and patient age may contribute to the observed test reversions in our study.

As we receive samples from all clinical areas where blood testing for M. tb infection is required, our study population contains subjects with severe immunosuppressive conditions (e.g. AIDS, haematological malignancies) where IGRAs are known to be relatively insensitive and may give false negative results [35]. As a consequence, in studies excluding subjects with severe immunosuppression, the serial test concordance may even be higher. The strength of our study is the high number of subjects serially tested with the commercial T-SPOT.TB test and the long period of observation in a single centre.

In conclusion, during the observation period of 14 years in our laboratory, the T-SPOT.TB gave reproducible results in 95.6% of the 5440 serially tested subjects. Test conversions occur at a low frequency of 1.3% when the test is repeated after more than 6 months. Serial tests within 1 year have a reversion rate of 11.3%. Both rates of conversion and reversion may be a consequence of imperfect test sensitivity and specificity. Higher reversion rates in subjects above the age of 40 years suggest that additional age-related changes in the immune system affect the reproducibility of the test. The T-SPOT.TB demonstrates a high reproducibility in our medical laboratory.