Introduction

Infection of central vascular grafts is a dreaded complication. Although uncommon, with an incidence of 0.6–5%, vascular graft infection (VGI) is severe, with mortality ranging between 25 and 88% [1, 2]. Early and correct diagnosis of VGI is therefore important, but clinically challenging due to a lack of specific symptoms. Imaging plays a central role, but the diagnostic strategy remains controversial, and the choice matters: false negative diagnoses may be fatal, while false positives may expose the patient to potentially unnecessary high-risk surgery or long-term antibiotic treatment [1,2,3].

Several radiology-based imaging techniques are used or being explored in VGI, e.g., ultrasonography (US), magnetic resonance imaging (MRI), and computed tomography (CT). CT is the modality of choice, but all three may be hampered by a relatively high number of false negative findings in low-grade infections [4]. For CT, Reinders et al. reported a pooled sensitivity and specificity [with 95% confidence interval (CI)] of 67% (57–75%) and 63% (48–76%), respectively [2]. Comparable pooled estimates are not available for MRI and US, which have been less thoroughly investigated because CT is widely available [5].

In recent years, molecular imaging with fluorine-18-labeled fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT) has increasingly been explored in infectious disease, as the glucose analog FDG also accumulates at sites of high glucose turnover caused by infection and inflammation [6, 7]. Unlike earlier stand-alone PET, hybrid PET/CT enables more precise differentiation between soft tissue and graft infection, which makes FDG-PET/CT a potentially more useful diagnostic tool [1, 4, 6,7,8]. However, physiologic reactive FDG uptake along the graft material may reduce the diagnostic accuracy of FDG-PET/CT in VGI [9]. Several studies have addressed FDG-PET/CT in VGI, and new data are added regularly, but the diagnostic strategy remains controversial, including which interpretation scheme to use. Four different parameters are currently employed: focal uptake, visual grading scale (VGS), maximum standardized uptake value (SUVmax), and tissue-to-background ratio (TBR).

The present study provides a systematic review and meta-analysis of the available literature to summarize the current status of these interpretation parameters and to give possible directions for future research.

Materials and methods

This systematic review and meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Supplemental Fig. 1) [10].

Data

The search strategy was developed in cooperation with a specialist research librarian, and we conducted a systematic search in Medline, Embase, and the Cochrane Library on 8 October 2018. Both Medical Subject Headings (MeSH) terms and free-text searches were included (Supplemental Table 1). Duplicates were removed using EndNote, and the remaining studies were screened by title and/or abstract independently by two authors (SKS, TB). In discordant cases, a third reader (SH) was consulted and consensus was reached. Full-text assessments were subsequently carried out in the same way.

As thoracic and abdominal grafts (central grafts) are associated with the most severe complications, only original studies on central grafts were originally to be included, but during the screening process it became apparent that few studies included central grafts exclusively. The inclusion criteria were, therefore, altered to include studies on central and possibly also peripheral grafts, while studies on peripheral grafts only were excluded. The studies had to include FDG-PET/CT, and VGI had to be verified by a reference standard including microbiology and/or histopathology. Only articles written in English, Swedish, Danish, or Norwegian were included. We excluded all studies on stand-alone PET or CT only. All editorials, commentaries, letters to the editor, case reports, case series (n < 6), and narrative reviews were also excluded.

We assessed possible sources of bias using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool and the Standards for Reporting of Diagnostic Accuracy Studies (STARD). The QUADAS-2 tool is divided into four domains: patient selection, index test, reference standard, and flow and timing.

In domain 1 (patient selection), we explored how patients were enrolled, whether the included patients matched the review question, and whether any inappropriate exclusion criteria were used. The focus in domain 2 (index test) was whether FDG-PET/CT interpreters were blinded. Domain 3 (reference standard) addressed the reference standard and its ability to correctly diagnose VGI. Domain 4 (flow and timing) assessed whether all patients received the same index test and reference standard, as well as the time interval between the two.

We included all interpretation criteria encountered, i.e., focal uptake, five-point visual grading scale (VGS), maximum standardized uptake value (SUVmax), and tissue-to-background ratio (TBR).

SUVmax is defined as the highest FDG activity concentration within a region of interest (ROI), divided by the injected dose per unit body weight. All studies that explored SUVmax placed the ROI in the periprosthetic tissue to calculate SUVmax for this area and used the area under the curve (AUC) method, i.e., receiver operating characteristic (ROC) analysis, to define the threshold.
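To make the definition concrete, a minimal Python sketch of body-weight SUV and SUVmax follows. It assumes that voxel activity concentrations (Bq/mL) are already decay-corrected to injection time and that tissue density is about 1 g/mL, so Bq/mL can be treated as Bq/g. The function names and example numbers are hypothetical and do not come from the included studies.

```python
def suv_bw(activity_bq_per_ml: float, injected_dose_bq: float, body_weight_kg: float) -> float:
    """Body-weight-normalized SUV for a single voxel.

    Assumes decay-corrected activity and a tissue density of ~1 g/mL,
    so the activity concentration in Bq/mL is treated as Bq/g.
    """
    if injected_dose_bq <= 0 or body_weight_kg <= 0:
        raise ValueError("injected dose and body weight must be positive")
    dose_per_gram = injected_dose_bq / (body_weight_kg * 1000.0)
    return activity_bq_per_ml / dose_per_gram


def suv_max(roi_activities_bq_per_ml, injected_dose_bq: float, body_weight_kg: float) -> float:
    """SUVmax: the highest voxel SUV inside a (hypothetical) periprosthetic ROI."""
    return max(suv_bw(a, injected_dose_bq, body_weight_kg) for a in roi_activities_bq_per_ml)


# Hypothetical example: 300 MBq injected into a 75 kg patient (4000 Bq per gram body weight)
roi = [12_000.0, 20_000.0, 16_000.0]
print(suv_max(roi, 300e6, 75.0))  # 5.0
```

Because SUVmax is a single voxel value, it is sensitive to image noise and reconstruction settings, which is one reason standardized acquisition matters when cutoffs are compared across centers.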

TBR was calculated by dividing the SUVmax in the ROI by the SUVmax of the blood pool. Elsewhere, TBR has been referred to as target-to-background ratio; here, the periprosthetic tissue was the target of interest, so tissue is equivalent to target.
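The TBR definition can be sketched in the same way. In this hypothetical Python example, the blood-pool SUVmax is simply an input, because the location of the blood-pool ROI is not standardized. Treating a ratio strictly above the threshold as positive is an assumption, and the default threshold of 3 is only illustrative (it mirrors the best cutoff reported by Berger et al. in the Discussion).

```python
def tbr(suvmax_periprosthetic: float, suvmax_blood_pool: float) -> float:
    """Tissue-to-background ratio: periprosthetic SUVmax over blood-pool SUVmax."""
    if suvmax_blood_pool <= 0:
        raise ValueError("blood-pool SUVmax must be positive")
    return suvmax_periprosthetic / suvmax_blood_pool


def tbr_positive(suvmax_periprosthetic: float, suvmax_blood_pool: float, threshold: float = 3.0) -> bool:
    """Hypothetical rule: call the scan positive when TBR is strictly above the threshold."""
    return tbr(suvmax_periprosthetic, suvmax_blood_pool) > threshold


print(tbr(6.0, 2.0))           # 3.0
print(tbr_positive(6.0, 2.0))  # False (3.0 is not strictly above 3.0)
print(tbr_positive(7.5, 2.0))  # True (3.75 > 3.0)
```

Since the denominator depends on where the blood pool is sampled, the same periprosthetic SUVmax can yield different TBRs at different centers.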

Statistics

Five parameters were derived and evaluated: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. Analyses were performed for each outcome: focal uptake, VGS, and SUVmax.
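For reference, the five parameters follow directly from a 2 × 2 table of index test results against the reference standard. The Python sketch below uses hypothetical counts chosen only to make the arithmetic easy to check; they are not data from the included studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # true positives: FDG-PET/CT positive, VGI confirmed by the reference standard
    fp: int  # false positives: FDG-PET/CT positive, no VGI
    fn: int  # false negatives: FDG-PET/CT negative, VGI confirmed
    tn: int  # true negatives: FDG-PET/CT negative, no VGI

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


table = TwoByTwo(tp=18, fp=4, fn=2, tn=16)  # hypothetical counts
print(f"sensitivity {table.sensitivity():.1%}")  # sensitivity 90.0%
print(f"specificity {table.specificity():.1%}")  # specificity 80.0%
print(f"PPV {table.ppv():.1%}, NPV {table.npv():.1%}, accuracy {table.accuracy():.1%}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of VGI in the study sample, so they transfer less directly between populations.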

Meta-analyses were performed by applying random effects models using the method of DerSimonian and Laird, with the estimate of heterogeneity taken from the Mantel–Haenszel model [11]. Studies were weighted according to the number of observations underlying each parameter (e.g., the sum of true positive and false negative findings for sensitivity). Forest plots were derived to (1) graphically display the point estimate and its 95% CI for every parameter on a per-study basis, (2) graphically display the summary estimate for every parameter (including its 95% CI) across studies for each outcome, and (3) derive a measure of the heterogeneity of the studies within each outcome (i.e., an I-squared value [12]). Moreover, funnel plots were inspected to visually assess publication bias for each parameter within each outcome (Supplemental Fig. 2A–E) [13,14,15,16], and statistical tests for funnel plot asymmetry due to small-study effects were conducted (Supplemental Table 2) [16]. A summary receiver operating characteristic plot based on a generalized linear mixed model approach to bivariate meta-analysis of sensitivity and specificity was added for focal uptake only, as such plots could not be derived for the other outcomes owing to an insufficient number of included studies (visual uptake; n = 3) and model instability (SUVmax) (Supplemental Fig. 3) [17, 18]. The level of significance was 5%. All analyses were done using STATA/MP 15.0 (StataCorp, College Station, TX).
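A minimal sketch of DerSimonian–Laird random-effects pooling of a proportion (e.g., per-study sensitivities) with an I-squared estimate is shown below. It is a textbook simplification, not the authors' STATA procedure: it uses inverse-variance weights on raw proportions, takes Cochran's Q from the inverse-variance fixed-effect estimate rather than from a Mantel–Haenszel model, applies an assumed 0.5 continuity correction to the variance of studies at 0% or 100%, and truncates the confidence interval to [0, 1]. The counts in the example are hypothetical.

```python
import math


def dersimonian_laird(events, totals, z=1.959964):
    """Random-effects pooled proportion, 95% CI, and I-squared.

    events[i] / totals[i] is study i's proportion, e.g. true positives over
    (true positives + false negatives) for sensitivity.
    """
    props = [x / n for x, n in zip(events, totals)]
    variances = []
    for x, n in zip(events, totals):
        # Assumed continuity correction so 0% or 100% studies get a non-zero variance
        p = (x + 0.5) / (n + 1) if x in (0, n) else x / n
        variances.append(p * (1 - p) / n)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1 / v for v in variances]
    p_fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, props))
    dof = len(props) - 1

    # DerSimonian-Laird moment estimate of the between-study variance tau^2
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - dof) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (max(0.0, pooled - z * se), min(1.0, pooled + z * se))

    # I-squared: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - dof) / q) if q > 0 else 0.0
    return pooled, ci, i2


# Hypothetical per-study true positives and numbers of infected patients
pooled, ci, i2 = dersimonian_laird([18, 9, 27, 10], [20, 12, 28, 10])
print(f"pooled {pooled:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}, I2 {i2:.0%}")
```

When tau^2 is estimated as zero, the random-effects weights reduce to the fixed-effect weights; as heterogeneity grows, the weights become more equal across studies and the confidence interval widens.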

Results

Study selection

The master search yielded 307 articles, from which 55 duplicates were removed (Fig. 1). After screening the remaining 252 articles, 234 were excluded (e.g., non-original papers and case reports, cf. Fig. 1), and 18 studies were assessed in full text. Of these, two turned out to be reviews, five were abstracts, and one employed stand-alone PET only; these eight studies were excluded. Thus, ten studies met the inclusion criteria. However, one of these was not sufficiently transparent: from the data provided in the article, it was not possible to determine the numbers of true positive, true negative, false positive, and false negative findings, or to trace how the reported results were obtained. This article was therefore excluded because data extraction failed [19].

Fig. 1

Adapted from: Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(6): e1000097. https://doi.org/10.1371/journal.pmed.1000097

PRISMA flow chart for literature selection

Study characteristics

The study characteristics are shown in Table 1. Of the nine included studies, five were prospective [3, 20,21,22,23] and four were retrospective [4, 24,25,26]. Sample sizes ranged from 9 to 76 patients with a broad age range, and the median proportion of male participants was 86% (range 66–92%). Follow-up ranged from 6 to 58 months. Only studies with central or central and peripheral vascular grafts were included. Few studies linked graft location to outcome, so no further analysis of this was performed. Six studies employed a low-dose, unenhanced CT, whereas three used full diagnostic contrast-enhanced CT. A total of 325 patients underwent FDG-PET/CT. In most studies, clinical follow-up and further imaging were used when the suspicion of graft infection was considered low [3, 4, 20,21,22,23,24,25,26,27]. The FDG-PET/CT scans were assessed using qualitative or semi-quantitative methods, i.e., FDG uptake patterns, SUVmax, or TBR. Only SUVmax showed no statistically significant funnel plot asymmetry due to small-study effects across several parameters.

Table 1 Study characteristics for all included studies

QUADAS: quality assessment

There was an overall low concern regarding applicability (Table 2). With regard to patient selection, three out of nine (33%) studies were assessed to have a high risk of bias. Regarding index test, four (44%) studies had a high or unclear risk of bias, whereas the reference standard displayed a high risk of bias in 2/9 (22%). In domain 4 (flow and timing), all studies had a high or unclear risk of bias.

Table 2 Assessment of the included studies according to the QUADAS-2 tool

Qualitative assessment of uptake pattern: focal/diffuse or VGS

Five studies interpreted images qualitatively based on uptake patterns, i.e., focal or diffuse FDG uptake, with a focal pattern implying a higher risk of VGI than a diffuse pattern [20,21,22,23, 26]. The pooled sensitivity and specificity (95% CI) were 90.6% (81.7–99.4%) and 82.8% (71.3–94.3%), respectively. Pooled PPV, NPV, and accuracy were 83.9%, 90.8%, and 87.2%, respectively (Fig. 2).

Fig. 2

(a–e) Forest plots showing pooled and individual values of sensitivity (a), specificity (b), positive predictive value (c), negative predictive value (d), and accuracy (e) for focal uptake, visual grading scale (VGS), and maximum standardized uptake value (SUVmax)

Two of these studies also used a five-point VGS, albeit with different grade definitions [22, 26]. The pooled sensitivity and specificity (95% CI) were 86.8% (59.3–100.0%) and 69.4% (39.9–98.9%), respectively. Pooled PPV, NPV, and accuracy were 81.7%, 81.1%, and 85.5%, respectively (Fig. 2).

SUVmax and TBR

Six studies interpreted images according to SUVmax [3, 4, 22, 24,25,26]. The pooled sensitivity and specificity (95% CI) for SUVmax were 92.8% (83.2–100%) and 69.7% (52.4–86.9%), respectively, and the pooled estimates for PPV, NPV, and accuracy were 84.1%, 80.3%, and 85.5%, respectively (Fig. 2).

Only one study assessed TBR, and Berger et al. did not specify the location of the blood pool [26]. The sensitivity and specificity (95% CI) were 71.8% (54.6–84.4%) and 70.4% (51.5–84.2%), respectively, with a PPV, NPV, and accuracy of 74.2%, 67.9%, and 71.2%, respectively.

Discussion

Based on the available literature, FDG-PET/CT performs well in central VGI, especially when focal versus diffuse FDG uptake is used as the diagnostic criterion, with a pooled sensitivity and specificity of 90.6% and 82.8%, respectively. However, the literature is relatively limited; only nine studies were eligible. Patient populations and methodologies, including the interpretation schemes employed, were heterogeneous, and the included papers were not without risk of bias. Nonetheless, our results were comparable to those of similar studies.

Overview of the problem and other diagnostic methods

A precise diagnosis of VGI is challenging but extremely important. Because of the heterogeneous clinical presentation and the difficulty of obtaining verification through biopsies, a non-invasive diagnostic tool is highly desirable. CT has been the modality of choice because it can visualize characteristic features of VGI: persistent or expanding perigraft soft tissue, fluid, aortoenteric fistulas, and/or ectopic gas. CT is thus well suited to anatomically localizing certain morphologic features of infection, but, as mentioned, it has relatively low sensitivity and specificity compared with other modalities. Furthermore, contrast media may be contraindicated in some patients [2, 28].

The advantages of ultrasound include that it is a simple bedside modality without ionizing radiation. However, several issues may hamper its use; for example, overlying bowel and adipose tissue can affect interpretation and limit its predictive value, and additional investigations are almost always indicated. Older studies proposed stand-alone FDG-PET, which is now considered obsolete, as it is inferior to CT for exact anatomical localization. The abovementioned imaging techniques all have variable and mediocre sensitivities and specificities [28].

Different interpretation methods

None of the included studies used exactly the same combination of parameters to assess FDG-PET/CT, which may explain the high heterogeneity. Sensitivity varied among the parameters used.

Qualitative assessment using a focal versus diffuse uptake pattern showed the best overall result. Although the sensitivity of focal versus diffuse uptake (90.6%) was comparable to that of VGS (86.8%) and SUVmax (92.8%), its specificity (82.8%) was superior to that of VGS (69.4%) and SUVmax (69.7%). The sensitivity of TBR was the lowest of all (71.8%), and its specificity (70.4%) was comparable to that of VGS and SUVmax. These findings agree with those of Rojoa et al., who also found focal uptake to be superior to the semi-quantitative measures, with sensitivities ranging from 69 to 100% [1].

When using focal uptake and VGS, one must keep in mind that these are subjective parameters. In VGS, the FDG uptake is compared with that of surrounding structures (inactive muscle, fat, and the bladder), while focal uptake distinguishes between focal and diffuse uptake patterns. Although focal uptake and VGS have been found to be the parameters with the highest overall diagnostic value, they are sensitive to interpreter experience, as they are purely qualitative measures. This is evident in the results of Spacek et al., who assessed the scans for focal or diffuse uptake and performed two analyses: one including inhomogeneous uptake and one excluding this group. Sensitivity and specificity were 78% and 93%, respectively, when the inhomogeneous group was excluded, and conversely 98% and 77%, respectively, when it was included [21]. Thus, scans that do not fit dichotomously into either a focal or a diffuse pattern represent a diagnostic dilemma, and as these categories are inherently subjective, variation between interpreters will invariably be present. This could be counteracted by interpretation guidelines specifying what is to be interpreted as focal and diffuse, and how equivocal findings should be handled.

With regard to VGS, it is worth noting that although the two groups using this approach both employed a five-point scale (0–4 in Berger et al. and 1–5 in Sah et al.), the underlying definitions of the grades were quite different. Berger et al. used the following VGS: grade 0 (FDG uptake similar to that of the background), grade 1 (low FDG uptake, comparable to that of inactive muscle and fat), grade 2 (moderate FDG uptake, clearly visible and distinctly higher than that of inactive muscle and fat), grade 3 (strong FDG uptake, but distinctly less than the physiologic uptake in the bladder), and grade 4 (very strong FDG uptake, comparable to the physiologic urinary uptake in the bladder). Grades 3–4 were considered positive for infection. Sah et al. used: grade 1 (normal background activity), grade 2 (mildly increased but diffuse FDG uptake along the graft, where mild uptake was defined as less than twice, and strong uptake as more than twice, the blood pool activity in the ascending aorta), grade 3 (focal but only mild FDG uptake, or strong diffuse FDG uptake along the graft), grade 4 (focal and intense FDG uptake ± diffuse FDG uptake along the graft), and grade 5 (focal and intense FDG uptake plus fluid collections/abscess formation). Grades 3–5 were considered positive for infection.

As mentioned above, SUVmax reflects the amount of FDG uptake in a specific region after FDG injection, adjusted for body weight, but consensus on cutoff values is needed if semi-quantitative FDG-PET/CT is to be considered a new gold standard for diagnosing VGI [24, 26]. Variable cutoff values may reflect that FDG uptake and SUV measurements are affected by numerous technical and physiological factors, e.g., obesity: adipose tissue has a lower metabolic rate and, therefore, a lower FDG concentration on FDG-PET/CT. Houshmand et al. suggest that in obese patients, SUV may appear increased in non-adipose tissue compared with adipose tissue, and that normalizing to body surface area or lean body mass could be more precise [29, 30]. The European Association of Nuclear Medicine (EANM) has tried to address some of the technical challenges of FDG-PET/CT, such as FDG administration and dosage and image acquisition. By standardizing the scan procedure and developing guidelines aimed at improving image quality and accuracy, the procedure should become more reliable and reproducible [30,31,32,33]. No such measures were taken in the included papers, so even though our study found that SUVmax had a relatively high sensitivity and specificity, the lack of established cutoff values renders the interpretation less reliable. According to Tokuda et al., the cutoff should be set at SUVmax > 8, as this provides the highest sensitivity (100%) and specificity (80%). Berger et al. found the best cutoff to be > 5.5 (sensitivity 80% and specificity 81%). These cutoffs differ considerably, which may indicate poor reproducibility; the difference may also reflect study size, as Tokuda et al. included 9 patients, whereas Berger et al. included 59. In a recently published study by Husmann et al., a cutoff of 4.9 was suggested as giving the best diagnostic performance, with a sensitivity, specificity, PPV, NPV, and accuracy of 100%, 50%, 72.2%, 100%, and 78.3%, respectively [34].
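Because the reported cutoffs differ, it may help to see how a cutoff is typically derived from ROC data. The Python sketch below computes the AUC as the probability that an infected graft has a higher SUVmax than a non-infected one (the Mann–Whitney formulation) and picks the cutoff that maximizes Youden's index (sensitivity + specificity − 1). Whether the included studies used Youden's index is not stated, the "strictly above the cutoff" rule is an assumption, and the SUVmax values are hypothetical.

```python
def auc(infected, non_infected):
    """ROC AUC via Mann-Whitney: P(infected SUVmax > non-infected SUVmax), ties count 0.5."""
    pairs = [(a > b) + 0.5 * (a == b) for a in infected for b in non_infected]
    return sum(pairs) / len(pairs)


def youden_cutoff(infected, non_infected):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    Assumed rule: a scan is positive when SUVmax is strictly above the cutoff.
    Candidate cutoffs are the observed SUVmax values.
    """
    best = None
    for t in sorted(set(infected) | set(non_infected)):
        sens = sum(v > t for v in infected) / len(infected)
        spec = sum(v <= t for v in non_infected) / len(non_infected)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical SUVmax values
infected = [9.1, 7.4, 10.2, 8.0]
non_infected = [3.2, 4.5, 5.1, 8.5]
print(auc(infected, non_infected))            # 0.875
print(youden_cutoff(infected, non_infected))  # (5.1, 1.0, 0.75)
```

With only a handful of patients, a cutoff chosen this way is fitted to the sample and is likely to overstate performance in new patients, which is consistent with the concern about study size raised above.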

TBR was used in only one study, and Berger et al. did not specify the location of the blood pool. TBR had the lowest sensitivity (71.8%), PPV (74.2%), NPV (67.9%), and accuracy (71.2%) of all parameters, and a specificity (70.4%) as low as that of the other lowest-ranking parameters. Thus, TBR appears inferior and less reliable. The best cutoff was 3, but further studies aiming at methodological consensus on the cutoff and the blood pool location are needed if TBR is to become a useful parameter in this setting [26].

Study strengths and limitations

This study’s strengths are that two researchers independently assessed all articles and that PICO, PRISMA, and QUADAS-2 were applied. The study might have been strengthened by a broader search, screening more articles and reducing the risk of missing relevant publications. The inclusion criteria had to be altered because of an insufficient number of studies on central grafts only. The main focus of this study was to shed light on FDG-PET/CT as a diagnostic tool for detecting VGI, and papers including peripheral grafts were originally to be excluded, as such infections primarily carry a high risk of limb amputation but rarely a fatal outcome. Central graft infections have higher mortality and morbidity, and as eliminating these infections may be impossible without high-risk surgery, it is crucial to improve the available diagnostic methods [35].

Patient-related challenges

The physiological response of the immune system to synthetic materials is usually seen on FDG-PET/CT as mild-to-moderate diffuse FDG uptake along the prosthesis, which may be misread as low-grade infection and vice versa [9]. Some propose postponing the FDG-PET/CT scan until > 3–4 months post-operatively to avoid false positives. Although this may be sound in theory, in practice scans cannot be postponed when patients present with symptoms of VGI shortly after their operation, despite the risk of a false positive result. Tokuda et al. found true positives between 2.2 and 236.1 months post-operatively and true negatives between 1.6 and 31.3 months, i.e., a short interval between surgery and FDG-PET/CT does not necessarily correlate with a high number of false positives [24]. Thus, interpreters should be aware of the risk of false positives in relevant settings, but should not refrain from scanning these patients if clinically indicated.

The prosthesis material itself and the surgical procedure used should also be considered. Some of the studies reported the proportion of open and endovascular procedures but did not link this to outcome [2]. Keidar et al. found significantly higher FDG uptake in Dacron grafts than in Gore-Tex grafts or native veins, which must also be taken into account when interpreting scans [36, 37].

Another factor that may influence the results is the blood glucose level, but diabetic status and blood glucose levels were not consistently reported. However, the relationship between diabetes mellitus (DM), blood glucose levels, and the sensitivity of FDG studies is still incompletely understood, especially in infection: Rabkin et al. reported that hyperglycemia may decrease the intensity of FDG uptake in malignant cells but does not significantly affect the detection rate of infectious or inflammatory processes [38]. Finally, patients often received antibiotic treatment on mere suspicion of VGI, before FDG-PET/CT and microbiological verification. This introduces a risk of bias from false negative results. In most studies, the number of patients treated with antibiotics was reported, but it was not specified which patients were treated, and treatment was not linked to outcome.

Challenges related to reference standard

In the QUADAS-2 assessment, domain 4 (flow and timing) was considered to present a high or unclear risk of bias, either because the time interval between the index test and the reference standard was not reported or because not all participants received the same reference standard. Most studies had no predefined acceptable time interval between FDG-PET/CT and the reference standard, which may introduce bias from changes during the clinical course of patients suspected of VGI. The reference standard also varied between verification by biopsy/puncture (i.e., tissue cultures for bacterial growth), blood cultures, and clinical follow-up. Low-grade infections may therefore be overlooked, a risk that would be smaller if all patients had verification by biopsy and microbiological culture. However, these challenges reflect clinical reality. Although a pre-specified time interval and the same reference standard for all patients, preferably microbiological testing, would enhance the credibility of the test, this is not always possible in these critically ill patients, in whom treatment must be instituted immediately on suspicion of VGI. Microbiological specimens are ideally obtained before antibiotic treatment, but sampling is an invasive procedure with risks. Consequently, only a few studies subjected all patients to such reference standards. Patients with low suspicion of VGI were often followed up clinically, usually with blood samples, blood cultures, and further imaging, often CT, which may miss low-grade infections, as these can be misinterpreted as chronic inflammation.

Comparison with previously published reviews

This study differs in some respects from other recently published systematic reviews and meta-analyses on the same subject. Rojoa et al. also reported focal uptake and SUVmax to be the two most reliable parameters for diagnosing VGI by FDG-PET/CT, although we generally found better specificity for SUVmax [1]. This discrepancy may relate to the studies included; e.g., Rojoa et al. also included stand-alone FDG-PET, which would intuitively provide poorer specificity. Consistent with our results, Rojoa et al. also found TBR to be the inferior method, with a sensitivity and specificity of 57% and 78%, respectively.

Reinders et al. found that white blood cell (WBC) scintigraphy, including single-photon emission computed tomography/CT (SPECT/CT), had the highest accuracy in diagnosing VGI, but this modality is outside the scope of this paper and is not addressed further. They did, however, suggest FDG-PET/CT as the initial imaging technique because of its more widespread accessibility [2].

Conclusion

Given the high risk of a fatal outcome of VGI, the importance of an available and accurate non-invasive diagnostic modality cannot be overstated. One should at least aim for a diagnostic procedure with a high NPV to avoid unnecessary high-risk surgery. Microbiology has been viewed as the gold standard in many studies, but it requires an invasive procedure with a risk of complications and a risk of false negatives due to early or prior antibiotic treatment. FDG-PET/CT seems promising, especially when interpretation is based on focal versus diffuse FDG uptake patterns.

As risks of bias were encountered in several of the included studies, larger prospective studies with strictly defined interpretation criteria and predefined use of the index test and reference standard (including the time interval between them and antibiotic use) are still needed to establish a non-invasive gold standard for diagnosing VGI more firmly. Such studies should also include the current method of choice, CT, which other recent studies have found to be inferior to FDG-PET/CT.