Introduction

Head and neck cancer is the sixth most common malignancy worldwide [1]. About 60% of patients are diagnosed with locally advanced disease. Treatment of these patients usually consists of a combination of radiotherapy with chemotherapy or targeted agents [2, 3]. For many years, planned neck dissection combined with radiotherapy was the preferred treatment for patients with nodal involvement [4]. Nowadays, neck dissection is no longer routinely performed in patients with a complete response after chemoradiation of initially N1 disease, and is increasingly questioned in patients with N2-N3 disease at diagnosis [5]. The recurrence rate for node-positive disease is still about 35–45%, and most nodal relapses occur within 2 years after treatment [6, 7]. Therefore, reliable imaging methods that identify residual nodal disease and allow for timely salvage surgery may support the waning use of neck dissections in patients with a complete response. However, this remains a matter of debate, as others argue that neck dissection improves disease control regardless of the response in the neck [8].

Functional imaging with 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) is less hampered by altered anatomy than CT and MRI, and has proved to be valuable in distinguishing residual nodal disease from therapy-induced changes. Previous meta-analyses have established the favorable diagnostic test characteristics of FDG-PET and FDG-PET/CT for evaluating residual nodal disease, even though considerable heterogeneity was present between studies [9, 10]. However, the technique has improved considerably since its introduction due to technical advances enabling integration of PET and CT devices, improvements in detector capabilities yielding higher image resolution, optimization of head and neck image acquisition parameters, and the combination of PET with diagnostic resolution contrast enhanced CT scans [11,12,13,14,15,16,17,18,19]. Likewise, the identification of the human papilloma virus (HPV) as an important cause and prognostic factor of oropharyngeal cancers has greatly increased the understanding of the underlying tumor biology. Taken together, these recent advances warrant a reappraisal of the current literature focusing on integrated FDG-PET/CT in head and neck squamous cell carcinoma (HNSCC) within 6 months after treatment, with special emphasis on factors that may impact the technique (scanning parameters, patient characteristics, HPV status and follow-up).

Methods

Studies were identified through a systematic electronic search of PubMed and Web of Science databases on 14/11/2016. The following search criteria were used: “(pet/ct or pet-ct or (“Positron-Emission Tomography” and “Tomography, X-Ray Computed”)) and (“head and neck” or “Head and Neck Neoplasms”) and (response or evaluation or therapy or surveillance)”. We also examined the references from all studies that were retrieved in full text. Abstracts or posters were excluded.

The eligibility criteria were: 1) the use of FDG-PET/CT for the detection of recurrent/residual nodal disease within 6 months after therapy (based on the reported central tendency measures), 2) patients with squamous cell carcinoma of the head and neck treated with radiotherapy with or without chemotherapy or targeted agents (surgery prior to radiotherapy was allowed as used in oral cavity tumors), 3) extractable 2 × 2 tables for the neck nodes. Simulation was used to estimate the proportion of patients who had PET/CT imaging beyond 6 months after the end of treatment. Briefly, with the reported or estimated mean and standard deviation from the individual studies, the normal distribution was used to generate datasets with the same size as the original studies and the proportion of simulated interval times exceeding 6 months was calculated [20]. After 50,000 iterations, the 99% confidence interval of this proportion was calculated and the upper boundary was used to estimate the potential impact of these patients on the analysis results. Studies with fewer than 15 patients, studies including patients younger than 18 years, and studies in languages other than English or French were excluded. Study selection was performed by the first author, but in case of doubt, a consensus was sought among all authors.
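The simulation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the use of an empirical percentile interval across iterations, and the conversion of 6 months to 26 weeks are all assumptions made here for the example.

```python
import random


def simulate_upper_bound(mean, sd, n, cutoff, iterations=50_000, seed=0):
    """Upper limit of the central 99% interval of the simulated proportion
    of scan intervals exceeding `cutoff` (same unit as `mean` and `sd`).

    Assumption: the interval is taken as empirical percentiles across
    iterations; the source does not state how its interval was formed.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(iterations):
        # One simulated dataset of the same size as the original study,
        # drawn from a normal distribution with the reported mean and SD.
        beyond = sum(1 for _ in range(n) if rng.gauss(mean, sd) > cutoff)
        proportions.append(beyond / n)
    proportions.sort()
    # Empirical 99.5th percentile = upper limit of the central 99% interval.
    index = min(iterations - 1, int(0.995 * iterations))
    return proportions[index]


if __name__ == "__main__":
    # Hypothetical study: 60 patients scanned at a mean of 12 weeks (SD 4).
    upper = simulate_upper_bound(mean=12, sd=4, n=60, cutoff=26)
    print(f"Upper bound of proportion scanned beyond 26 weeks: {upper:.3f}")
```

The input values in the usage example are invented for illustration. A fixed seed keeps the result reproducible between runs.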

For every study, the following variables were extracted: patient demographics (age, sex), disease characteristics (nodal status, HPV status, initial vs. recurrent disease), study characteristics (retro- vs. prospective), number of patients, treatment, follow-up, reference standard, prevalence, and imaging protocol (scanning parameters, number of weeks after end of treatment, use of contrast enhanced CT scans and acquisition of dedicated head and neck images). In case of missing data an attempt was made to contact the investigators for further information. If HPV prevalence was not reported, the country-specific HPV prevalence was estimated using data from Stein et al. [21]. Using the reported or estimated prevalence of HPV and the number of patients with an oropharyngeal malignancy, the percentage of patients with an HPV associated malignancy relative to the entire study population was estimated for every study.
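The HPV estimate described above amounts to simple arithmetic. The sketch below shows one plausible reading, assuming that the HPV prevalence applies to the oropharyngeal subgroup and that non-oropharyngeal tumors are counted as HPV negative. The function and parameter names are hypothetical.

```python
def hpv_share_of_study(hpv_prevalence, n_oropharyngeal, n_total):
    """Estimated percentage of the whole study population with an
    HPV-associated malignancy.

    hpv_prevalence  -- reported or country-specific HPV prevalence among
                       oropharyngeal tumors, as a fraction between 0 and 1
                       (assumption: it applies to that subgroup only)
    n_oropharyngeal -- number of patients with an oropharyngeal malignancy
    n_total         -- total number of patients in the study
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0.0 <= hpv_prevalence <= 1.0:
        raise ValueError("hpv_prevalence must be a fraction between 0 and 1")
    return 100.0 * hpv_prevalence * n_oropharyngeal / n_total


if __name__ == "__main__":
    # Hypothetical study: 70 of 100 patients have an oropharyngeal tumor
    # and the assumed HPV prevalence in that subgroup is 50%.
    print(hpv_share_of_study(0.5, 70, 100))
```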

The QUADAS 2 (quality assessment tool for diagnostic accuracy studies) system was used to assess the methodological quality of selected studies, but was not used to exclude studies [22,23,24].

From the individual studies, the reported 2 × 2 tables of FDG-PET/CT outcomes versus the reference standard were extracted to allow the bivariate estimation of the pooled sensitivity and specificity. This method uses a random effects model for both sensitivity and specificity to compensate for observed heterogeneity beyond chance caused by varying clinical and methodological aspects of the selected studies. Also, this approach adjusts for any differences in study size and the possible negative correlation between sensitivity and specificity of FDG-PET/CT that may be caused by varying thresholds in image interpretation. Model diagnostics were performed as required [25, 26]. From the pooled estimates of sensitivity and specificity, the mean positive and negative likelihood ratios were calculated. A hierarchical summary receiver operating characteristic (HSROC) curve was constructed to estimate the pooled area under the curve (AUC) of FDG-PET/CT.
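For readers less familiar with these quantities, the standard textbook relations between sensitivity, specificity, the likelihood ratios and the diagnostic odds ratio can be sketched as follows. This is a plug-in calculation with a hypothetical function name. It does not reproduce the bivariate model, so ratios computed this way from rounded pooled estimates need not match model-based values exactly.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic odds ratio) from sensitivity and
    specificity, both given as fractions strictly between 0 and 1."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie in (0, 1)")
    # LR+ : how much more likely a positive scan is in diseased patients.
    lr_pos = sensitivity / (1.0 - specificity)
    # LR- : how much more likely a negative scan is in diseased patients.
    lr_neg = (1.0 - sensitivity) / specificity
    # The diagnostic odds ratio is the ratio of the two likelihood ratios.
    return lr_pos, lr_neg, lr_pos / lr_neg


if __name__ == "__main__":
    lr_pos, lr_neg, dor = likelihood_ratios(0.85, 0.93)
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, DOR = {dor:.0f}")
```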

Small study effects (e.g. publication bias) were assessed by constructing a scatter plot of the inverse of the square root of the effective sample size versus the log of the diagnostic odds ratio. Meta-regression was performed if more than 10 studies were identified and if the overall I² exceeded 50%.
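The coordinates of such a scatter plot can be computed from each study's 2 × 2 table, as in the sketch below. Several details are assumptions rather than the authors' stated method: the effective sample size formula 4·n1·n2/(n1 + n2), the 0.5 continuity correction for zero cells, and the weighted regression slope (a construction commonly attributed to Deeks and colleagues). All names are hypothetical.

```python
import math


def funnel_point(tp, fp, fn, tn):
    """Return (x, y, weight) for one study's 2 x 2 table.

    x      = 1 / sqrt(effective sample size)
    y      = natural log of the diagnostic odds ratio
    weight = effective sample size (assumed regression weight)
    Assumes at least one diseased and one non-diseased patient.
    """
    diseased = tp + fn
    non_diseased = fp + tn
    ess = 4.0 * diseased * non_diseased / (diseased + non_diseased)
    if 0 in (tp, fp, fn, tn):
        # Continuity correction so that the odds ratio stays finite.
        tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    return 1.0 / math.sqrt(ess), log_dor, ess


def weighted_slope(points):
    """Weighted least-squares slope of y on x for (x, y, weight) triples.
    A slope far from zero would hint at small study effects."""
    total = sum(w for _, _, w in points)
    mean_x = sum(w * x for x, _, w in points) / total
    mean_y = sum(w * y for _, y, w in points) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in points)
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in points)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical 2 x 2 tables (tp, fp, fn, tn), not taken from the review.
    tables = [(10, 5, 5, 80), (8, 2, 3, 40), (20, 6, 4, 150)]
    points = [funnel_point(*t) for t in tables]
    print(f"slope = {weighted_slope(points):.2f}")
```

A formal test would also need the standard error of the slope, which is omitted here to keep the example short.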

Results

Systematic review

A total of 1483 references were identified, of which 1417 were discarded based on the title or abstract, because they were not related to the evaluation of FDG-PET/CT in head and neck cancer (Fig. 1). Another 45 studies were discarded based on the abstract, as they did not meet the eligibility criteria. An additional 3 were excluded because they were not in English or French. In all, 66 references were retrieved in full text and analyzed for potential inclusion. One additional study, performed at our hospital, was included in this analysis [27].

Fig. 1

Flow chart of study selection for the systematic review

In all, 22 studies reporting on a total of 1423 patients were included in this analysis (Table 1) [11,12,13, 15,16,17, 19, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. The mean prevalence of oropharyngeal tumors was 70.3% and 6 out of 22 studies (27.3%) reported the prevalence of HPV. The median percentage of HPV positive patients in the included studies was estimated to be 38.5%. The primary and secondary endpoints of the included studies were diverse and included the evaluation of the diagnostic performance of FDG-PET/CT, the comparison of FDG-PET/CT with conventional imaging techniques, the ability to avoid neck dissections, and the identification of the patient population that would derive most benefit from a post-treatment FDG-PET/CT scan. While the reported medians and means of the time between the end of treatment and FDG-PET/CT imaging were well below the cut-off of 6 months for all included studies, the upper ranges for this variable crossed the 6 month threshold in 3 studies [29, 30, 36]. However, it was estimated that fewer than 0.5% of patients in the pooled dataset were scanned beyond 6 months, and it was therefore considered justified to keep these studies in the analysis.

Table 1 Characteristics of the studies

A detailed overview of the QUADAS 2 scores is reported in the supplementary appendix (Table S1). Briefly, there was a moderate risk of bias due to the exclusion of patients (“flow and timing” domain). Also, confirmation of positive nodal disease at the time of diagnosis and prior to therapy remains an important weakness in most included studies (“patient selection” domain), which may hamper answering the research question of this review (“applicability”).

Meta-analysis

A funnel plot was created to assess the presence of publication bias (Fig. S2), revealing no evidence of small study effects (p = 0.56). Crude pooled estimates of sensitivity, specificity, positive and negative likelihood ratio, diagnostic odds ratio, and AUC were 82% (95% CI 72–89%), 92% (95% CI 87–95%), 10.3 (95% CI 5.9–18.1), 0.19 (95% CI 0.12–0.31), 54 (95% CI 22–131), and 0.93 (95% CI 0.90–0.95). However, model diagnostics identified the studies of Gourin et al. and Vainstein et al. as outliers (Cook’s distance of 1.98 and 2.22, respectively) [11, 40]. The meta-analysis was subsequently performed without these studies, yielding a total of 20 studies and 1293 patients [12, 13, 15,16,17, 19, 27,28,29,30,31,32,33,34,35,36,37,38,39, 41].

After exclusion of the outliers, the pooled estimates of sensitivity, specificity, positive and negative likelihood ratio, diagnostic odds ratio, and AUC were 85% (95% CI 76–91%), 93% (95% CI 89–96%), 12.4 (95% CI 7.4–20.8), 0.16 (95% CI 0.10–0.27), 76 (95% CI 35–165), and 0.94 (95% CI 0.91–0.95), respectively (Fig. 2). The negative and positive predictive values (NPV and PPV) are considered clinically most useful, yet depend on the probability of residual nodal disease, i.e. disease prevalence. Figure 3 represents a nomogram-style plot of the pooled estimate of NPV and PPV as a function of disease prevalence. Given a pre-test probability of disease of 10%, the NPV and PPV are estimated at 98% and 58%, respectively. Moreover, Fig. 3 demonstrates the important increase in positive predictive value (58, 76 and 84%) with increasing pre-test probabilities (10, 20 and 30%). In contrast, the negative predictive value remained rather stable (98, 96 and 93%) across the same range of pre-test probabilities.
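The dependence of the predictive values on prevalence follows from Bayes' rule in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The sketch below applies this to the rounded pooled likelihood ratios quoted above (12.4 and 0.16). The function names are hypothetical, and because the inputs are rounded, results may differ by about one percentage point from the reported figures.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def predictive_values(prevalence, lr_pos=12.4, lr_neg=0.16):
    """Return (PPV, NPV) as fractions for a given disease prevalence.

    The default likelihood ratios are the rounded pooled estimates
    quoted in the text, not the unrounded model output.
    """
    # PPV: probability of disease after a positive scan.
    ppv = post_test_probability(prevalence, lr_pos)
    # NPV: probability of no disease after a negative scan.
    npv = 1.0 - post_test_probability(prevalence, lr_neg)
    return ppv, npv


if __name__ == "__main__":
    for prevalence in (0.10, 0.20, 0.30):
        ppv, npv = predictive_values(prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these rounded inputs the PPV at 10, 20 and 30% prevalence rounds to 58, 76 and 84%, matching the text. The NPV at 30% comes out at about 93.6% rather than the reported 93%, presumably because the published figures are based on unrounded estimates.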

Fig. 2

Hierarchical summary receiver operating characteristic (HSROC) curve along with the summary operating point and the 95% confidence and prediction regions

Fig. 3

The post-test probability of residual/recurrent nodal disease after a positive/negative FDG-PET/CT result with the prevalence of nodal disease set at 10, 20 and 30%. The red line shows that, at a pre-test probability of 10%, a patient has a 58% chance of residual nodal disease in the neck if the FDG-PET/CT scan is positive and a 2% chance if the scan is negative

The heterogeneity in reported diagnostic test characteristics between the different trials was larger than can be explained by chance alone, both overall (I² = 80%; 95% CI 56–100%; p = 0.004) and for specificity (I² = 77%; 95% CI 66–87%; p < 0.001). A subsequent meta-regression was performed using all available studies (n = 22) in an attempt to identify contributing factors. Of all study quality associated factors, those related to the index test (e.g. timing, acquisition, and reporting of the FDG-PET/CT study) were found to be associated with reported specificity, with lower values in studies with a higher risk of bias (77% vs 94%; p < 0.005) or a lower degree of applicability (82% vs 93%; p = 0.01). With respect to patient related items, both a higher percentage of oropharyngeal tumors and a higher percentage of HPV positive malignancies were associated with lower sensitivity (76% vs 87% and 75% vs 89%; both p = 0.01) and specificity (85% vs 96% and 87% vs 95%; both p < 0.005). Interestingly, the differential effect of HPV status on sensitivity and specificity remained when analyzed separately in studies with a prevalence of oropharyngeal tumors below the overall median (difference between HPV groups of 9% in sensitivity and 9% in specificity) and above it (difference between HPV groups of 17% in sensitivity and 7% in specificity). In contrast, within the HPV group, differences across anatomic sites did not exceed 6% for either sensitivity or specificity. This suggests that HPV status, rather than anatomic site, drives the observed differences in diagnostic test characteristics. No other parameters were found to impact the reported diagnostic test characteristics (including patient characteristics, scanning parameters, study design and follow-up).

Discussion

This meta-analysis demonstrates that FDG-PET/CT is a reliable technique for detecting residual nodal neck disease within the first 6 months after treatment, with a pooled sensitivity and specificity of 85% (95% CI, 76–91%) and 93% (95% CI, 89–96%), respectively. In contrast to the meta-analyses of Isles and Gupta, this analysis excluded studies that imaged patients more than 6 months after therapy to increase the clinical applicability of the results [9, 10]. The most important clinically relevant finding was the high negative predictive value. A negative PET/CT scan is therefore highly indicative of the absence of disease, obviating further therapeutic interventions. Recently, the PET-NECK trial, a randomized phase III trial, compared FDG-PET/CT guided active surveillance with standard neck dissections in 564 patients with stage N2 and N3 disease [42]. The trial was able to demonstrate non-inferior survival outcomes compared to routine neck dissections, with only 20% of the patients in the surveillance arm receiving a neck dissection. The use of FDG-PET/CT surveillance also resulted in fewer serious adverse events and was cost-effective compared to routine neck dissection. While the PET-NECK study has provided remarkable proof-of-principle for FDG-PET/CT surveillance, it did not specifically address optimal image acquisition techniques and interpretation criteria, did not identify subgroups who would benefit most from FDG-PET/CT surveillance, and was limited to node-positive disease.

Impact of integrated multimodality imaging

Compared to the meta-analysis of Isles et al. based on standalone FDG-PET, the present review included only studies using integrated multimodality PET/CT scanners and found higher point estimates for sensitivity (85% vs 74%) and specificity (93% vs 88%), even though the 95% confidence intervals overlap (Table 2) [9, 10]. Fakhry et al. reported a higher accuracy for FDG-PET/CT than standalone FDG-PET, attributable to a higher specificity because of improved anatomical localization, which decreased the number of false positive and equivocal findings [43]. In the study of Goshen et al., FDG-PET/CT decreased the number of equivocal PET findings by 60% in the initial staging and evaluation of suspected recurrent HNSCC [44].

Table 2 Comparison of the current meta-analyses

In contrast, the meta-analysis of Gupta et al., which included studies on FDG-PET with and without CT, did not find significant differences between devices for the evaluation of the nodes in the neck (p = 0.8) [10]. Therefore, other advances may have contributed to the increased diagnostic performance over time, including optimized scanning protocols, technical advances in scanning equipment (e.g. TOF-PET, dedicated head and neck protocols), and the increasing experience of nuclear medicine physicians in this field. It remains an open question how much each factor has contributed to this improvement.

Effect of timing after therapy

The timing of FDG-PET/CT imaging after completion of therapy is important, as FDG-PET/CT may be less reliable when performed in the first weeks after treatment [9, 10]. The magnitude and dynamics of this effect have recently been elaborated in the study of Helsen et al. [27] which confirmed that the diagnostic performance increases until 11 weeks after treatment and reaches a plateau thereafter. However, the current analysis did not demonstrate an association between the number of weeks after therapy and the sensitivity or specificity of FDG-PET/CT. This may be due to ambiguous reporting of this parameter between the studies and to the low variability of this parameter in the current dataset (mean 12 weeks; 95% CI 10–13).

Impact of HPV status

Our results suggest a lower diagnostic performance of FDG-PET/CT in HPV positive tumors. At the extreme end, the study by Vainstein et al. reported a sensitivity and specificity of only 25% (3–65%) and 82% (73–90%), respectively [40]. Data reported by Moeller et al. show a similar trend with a lower accuracy of FDG-PET/CT in low-risk patients, a category whose definition included HPV status [12]. Although there are distinct morphologic and glycolytic differences between HPV positive and negative tumors, FDG avidity of both cancers is comparable and cannot explain the difference in diagnostic performance [45,46,47]. HPV-positive patients have a better outcome compared to HPV-negative ones irrespective of the treatment choice, and are also more radiosensitive [48]. Therefore, repopulation of resistant cells may take longer before they can be detected by PET imaging, resulting in lower sensitivities early after the end of radiotherapy. The lower specificity may be explained by the increased cytotoxic T-cell based immune response reported in HPV-positive tumors, resulting in the presence of inflamed nodes that take longer to involute [49, 50]. The recent PET-NECK trial, however, did not find a difference in overall survival between surveillance and neck dissection when analyzed separately based on HPV status [42]. While it is reassuring that the potentially lower sensitivity and specificity of FDG-PET/CT in HPV positive patients did not result in inferior survival, an alternate timing of surveillance may prove more appropriate in these patients. The latter would reduce post-treatment inflammation at the time of FDG-PET/CT surveillance and could impact the number of neck dissections. Taken together, these results suggest that the identification of HPV status as an important prognostic factor in oropharyngeal squamous cell carcinoma may have implications not only for optimal treatment selection but also for subsequent imaging surveillance strategies.

Standardization of FDG-PET/CT

Issues relating to technical details of the FDG-PET/CT study were the other important source of heterogeneity in this analysis. Scan interpretation varied greatly, with some centers using the standardized uptake value (SUV), while others only interpreted the images qualitatively with or without comparison to a predefined background region (Table 1). Moreover, there is no consensus on the cut-off value used when performing a quantitative analysis. In an effort to standardize image interpretation, the Hopkins criteria have been proposed, using a 5-point response interpretation comparing the SUV of the lesion with the SUV of the liver and the internal jugular vein [51]. Implementing such a system may contribute to reducing the variability in reporting between centers and increase the inter-reader agreement. Similar efforts to standardize tracer uptake time, scan acquisition parameters, and reconstruction settings can be expected to optimize the PET/CT technique, as exemplified by the EANM EARL FDG-PET/CT accreditation programme [52].

Effect of pre-test probability

Even though the negative predictive value of FDG-PET/CT was very high across a broad range of prevalences, the positive predictive value was found to vary more widely with changing pre-test probability. This underscores the need to confirm positive findings on FDG-PET/CT by cytology or biopsy, or to proceed to neck dissection as was done in the PET-NECK trial [42].

Conclusion

FDG-PET/CT within the first 6 months post-treatment is reliable for detecting residual nodal disease in patients with HNSCC, and a negative scan obviates the need for further therapeutic intervention. However, in HPV-positive tumors FDG-PET/CT may be less reliable, and the optimal surveillance strategy in this patient population remains to be determined. Also, further standardization of the PET technique is required.