Introduction

Curatively intended treatment in locally advanced head and neck squamous cell carcinoma often involves a combination of treatment modalities such as surgery and/or radiotherapy (RT) with or without chemotherapy. With an organ preservation approach, accurate evaluation of the treatment response is crucial, as salvage surgery can be offered when a resectable, residual tumour is detected. Different imaging modalities can be used for response assessment following RT [1, 2]. During recent years, a number of studies have focused on the evaluation of neck node response. FDG PET/CT has been shown to provide more accurate assessment of treatment response than contrast-enhanced CT alone [3, 4]. Studies using PET/CT have demonstrated high negative and positive predictive values (NPV, PPV) with respect to persistent tumour after RT [5, 6].

FDG uptake over time corresponds to glucose uptake in the tissues and FDG PET scans can be evaluated either quantitatively or qualitatively. The most accurate way of analysing the metabolism of a tumour is to calculate the metabolic rate of glucose (MRglu) using either non-linear regression [7] or Patlak analysis [8]. Both methods are able to quantify the metabolic activity of the tumour by measuring the FDG activity in tissue over time, normalized to the integrated activity of the substance in plasma. However, multiple blood samples are required and the methods are laborious and not clinically feasible. Semiquantitative ways of assessing tumour metabolism, though less accurate, are in clinical use. The standardized uptake value (SUV) is a semiquantitative approach to assess metabolism in which FDG activity in tissue is normalized for the injected dose of FDG and the patient’s body mass. Several ways of determining SUV are in use; maximum SUV (SUVmax) is the most commonly used parameter [9]. SUV shows limited value for assessing treatment response and predicting outcome in the individual patient [10]. The more recently introduced parameter metabolic tumour volume or total lesion glycolysis may be more promising [11].
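The SUV normalization described above can be sketched in a few lines. This is a minimal illustration, not the scanner vendor's implementation: the function name and units are hypothetical, tissue density is assumed to be 1 g/mL, and decay correction of the injected dose is assumed to have been applied already.

```python
def suv_body_weight(tissue_kbq_per_ml: float,
                    injected_mbq: float,
                    body_mass_kg: float) -> float:
    """Body-weight SUV = tissue activity concentration / (injected dose / body mass).

    Assumptions (not from the source): tissue density of 1 g/mL and an
    injected dose that is already decay-corrected to scan time.
    """
    if injected_mbq <= 0 or body_mass_kg <= 0:
        raise ValueError("injected dose and body mass must be positive")
    # MBq/kg is numerically equal to kBq/g, so the units cancel against kBq/mL
    dose_per_mass_kbq_per_g = injected_mbq / body_mass_kg
    return tissue_kbq_per_ml / dose_per_mass_kbq_per_g


# Hypothetical example: 8 kBq/mL in a node, 280 MBq injected, 70 kg patient
print(suv_body_weight(8.0, 280.0, 70.0))
```

SUVmax would then be the largest such value among the voxels of the lesion.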

Visual inspection of the relative difference between the tumour metabolism and the metabolism of the surrounding, normal tissue is usually adequate to identify tumours and determine treatment response [12]. Although this is clinically valid there is a lack of standardized criteria for visual inspection to enable comparisons between patients and between studies. In order to overcome this problem, different Likert scales are used in evaluation of PET scans. The Deauville criteria Likert scale is used for qualitative evaluations of FDG metabolism in Hodgkin lymphoma, and has a reproducible and excellent predictive ability in patients undergoing chemotherapy treatment [13]. In other tumours, such as head and neck cancer, only a few studies have produced data on clinically convenient and reproducible qualitative interpretation criteria for FDG metabolism [14]. The Deauville criteria are based on the relative metabolic activity in the tumour compared with the mediastinal blood pool and liver. The prognostic value of a PET scan is challenged when FDG uptake is reported as equivocal, i.e. when the metabolism of a lesion is close to that of the surrounding normal tissue. Whether this represents responding tumour tissue and a good prognosis or a patient at risk for residual disease is of clinical importance and also a relevant clinical PET problem. In longitudinal PET studies of Hodgkin lymphoma, equivocal scans have proved to represent a good, rather than bad, prognosis [15].

The objective of the present study was to determine whether early post-RT PET scans, visually assessed as equivocal regarding neck node metabolism, can be used to accurately categorize patients as responders or nonresponders using a Likert scale and/or SUVmax. Other aims were to determine the performance of different methods for assessing post-RT PET scans (visual inspection, a Likert scale and SUVmax) and to establish whether any method is superior in predicting regional control (RC) and overall survival (OS).

Materials and methods

The present study was a retrospective re-evaluation of PET scans from patients who were included in a previously reported prospective study performed at a tertiary referral hospital [16]. The report of that study includes a thorough description of the materials and methods. In the previous study a PET-directed neck policy was evaluated. Patients were enrolled from 2009 to 2012 and provided informed consent, and the study was approved by the regional ethics committee. Patients with node-positive squamous cell carcinoma of the larynx, hypopharynx and oropharynx, scheduled for organ preservation therapy with curative intent, and with no initial neck surgery, were included in the study. Patients were only eligible if the neck nodes demonstrated hypermetabolism on a PET scan performed before treatment. All 105 eligible patients underwent a baseline PET/CT scan (PET1) during the 2 weeks before the start of RT. Another PET/CT scan (PET2) was performed 6 weeks after completion of RT. The median time from completion of RT to PET2 was 43 days (range 34–87 days); 90 % of the patients had their PET2 scan between 37 and 52 days after therapy.

The PET result determined the management of the neck, i.e. the decision regarding neck dissection versus observation. If PET2 was negative, another scan was scheduled 12 weeks later (PET3). The multidisciplinary team ultimately decided whether patients with an equivocal PET2 scan would undergo neck dissection or be scheduled for PET3. The decision was based on a combination of prognostic factors including initial N classification, human papillomavirus (HPV)/p16-status (p16 was sometimes used as a surrogate marker for HPV infection), performance status and also the clinical response to RT. If PET3 was considered positive or equivocal the patient underwent neck dissection and if negative, the patient was scheduled for follow-up according to routine clinical practice. Clinical follow-up examinations were scheduled every 3 months during the first 2 years after either neck dissection or negative PET3, every 4 months in year 3 and every 6 months during the last 2 years of follow-up.

Patients

A total of 105 patients underwent PET1 and PET2, and 86 patients were also scheduled for PET3. For the Likert scale assessment, 104 patients were eligible for analysis of PET2, because adequate scanning including determination of the mediastinal blood pool was missing in one patient. Patient demographics and tumour characteristics are presented in Table 1. All included patients had complete clinical data and follow-up until December 2014. The median follow-up time was 42 months (range 9–67 months).

Table 1 Characteristics of the 105 patients and 107 tumours. The majority of patients had p16-positive, oropharyngeal cancer, T2N2. Synchronous head and neck primary tumours were found in two patients

PET imaging

In accordance with our standard clinical PET protocol, the patients were injected intravenously with FDG 4 MBq/kg body weight to a maximum dose of 400 MBq after a 4-h fasting period. All patients were imaged with an integrated PET/CT system (Philips Gemini TF; Philips Medical Systems, Cleveland, OH). After a 1-h uptake period at rest, images were acquired for 2 min per bed position. At baseline, the treatment planning CT scan was acquired together with the PET scan. For follow-up studies, low-dose CT scans (50 mAs) were used. The CT scan was used for attenuation correction and anatomical localization. The PET1 scan was acquired from the vertex to the upper thighs. The subsequent PET scans were acquired from the vertex to the upper abdomen.
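The weight-based dosing rule stated above (4 MBq/kg, capped at 400 MBq) amounts to the following calculation. The function name and default arguments are illustrative only.

```python
def fdg_dose_mbq(body_weight_kg: float,
                 mbq_per_kg: float = 4.0,
                 max_dose_mbq: float = 400.0) -> float:
    """Injected FDG activity: weight-based, limited by a fixed maximum dose."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return min(mbq_per_kg * body_weight_kg, max_dose_mbq)


print(fdg_dose_mbq(70.0))   # weight-based dose applies
print(fdg_dose_mbq(120.0))  # cap applies above 100 kg
```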

Image interpretation

As described in the previous prospective study, all PET2 scans were visually evaluated as part of the clinical routine by nuclear medicine physicians regarding metabolic response [16]. For the purpose of the present study, the clinical reports were categorized as “low metabolism”, “remaining hypermetabolism” or “equivocal”. In the present study two experienced readers, without knowledge of the clinical outcome, re-evaluated all patients regarding metabolic neck node response. Images were read and SUVmax levels obtained using an Extended Brilliance Workspace workstation (V4.5.3.40140; Philips Healthcare, The Netherlands). Metabolic responses were scored according to the Deauville criteria [17]. The Deauville criteria are given in Table 2 together with the categories used in the present study. Examples of corresponding PET images are shown in Fig. 1. If FDG uptake was found to have remained in the neck nodes, the highest uptake was scored.

Table 2 The Deauville criteria and interpretation into categories regarding the neck node response to treatment

Fig. 1 PET images and corresponding Deauville scores. Score 1 was not used in this study and is not shown

Definitions and statistical considerations

A complete response to RT in the neck and RC were defined as no residual or recurrent tumour involving the neck after completion of RT until the last date of follow-up. A residual tumour in the neck was characterized as persistent tumour, according to the pathology report, after neck dissection scheduled as a result of the PET2 or PET3 scan. Neck relapses that occurred after PET3 in patients who achieved a complete remission prior to this were categorized as recurrences. The Deauville criteria Likert scale was dichotomized into responders (scores 2 and 3) and nonresponders (scores 4 and 5). The SUVmax was dichotomized correspondingly, with a cut-off value of 2.0.

A PET scan demonstrating “low metabolism”, SUVmax ≤2.0 or a Deauville score of 2 or 3 was considered true-negative if the patient had neither any tumour, according to the pathology report after neck dissection, nor any neck relapse during follow-up. In false-negative PET scans, either residual tumour was found or a relapse occurred during the follow-up period. Scans reported as showing “remaining hypermetabolism”, SUVmax >2.0 or a Deauville score of 4 or 5 were considered true-positive if neck dissection revealed residual tumour, according to the pathology report, or if progressive neck disease was seen in patients with non-operable disease during the clinical follow-up. In false-positive PET scans, no residual tumour, according to the pathology report, was found in the neck dissection specimen.
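The dichotomization and the true/false positive/negative definitions above can be expressed as a small decision rule. This is a sketch under stated assumptions: the function names are hypothetical, and a Deauville score of 1 (not used in this study) is treated here as a negative scan.

```python
def deauville_positive(score: int) -> bool:
    """Scores 4 and 5 are nonresponders (PET-positive); 2 and 3 are responders.

    Assumption: score 1, which did not occur in the study, counts as negative.
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("Deauville score must be between 1 and 5")
    return score >= 4


def suvmax_positive(suvmax: float, cutoff: float = 2.0) -> bool:
    """SUVmax above the cut-off is PET-positive; SUVmax <= cut-off is negative."""
    return suvmax > cutoff


def classify(pet_positive: bool, neck_tumour: bool) -> str:
    """neck_tumour: residual tumour on pathology, or neck relapse/progression."""
    if pet_positive:
        return "TP" if neck_tumour else "FP"
    return "FN" if neck_tumour else "TN"


# A scan scored 5 with a tumour-free neck dissection specimen is a false positive
print(classify(deauville_positive(5), neck_tumour=False))
```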

Differences between groups were evaluated by Fisher’s exact test or the chi-squared test for categorical variables and the Mann-Whitney U test for continuous variables. RC and OS were estimated using the Kaplan-Meier method and groups were compared using the log-rank test. Time was calculated from the date of diagnosis. Receiver-operating characteristic (ROC) curve analysis was performed for the Likert scale and SUVmax, and the ROC data were used to determine cut-off values. All tests were two-sided and a p value of <0.05 was considered statistically significant. Where applicable, 95 % confidence intervals (CIs) were used. The software package IBM SPSS statistics 20 (SPSS Inc., Chicago, IL, USA) was used for the statistical analysis.
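For readers unfamiliar with the area under the ROC curve, it can be computed directly as the probability that a randomly chosen patient with neck disease has a higher score than a randomly chosen patient without, with ties counted as one half. The sketch below illustrates this definition only; it is not the SPSS procedure used in the study, and the scores are invented toy values, not study data.

```python
def roc_auc(scores_with_tumour, scores_without_tumour):
    """AUC as the fraction of (tumour, no-tumour) pairs ranked correctly.

    Ties contribute 0.5, which equals the Mann-Whitney U statistic divided
    by the number of pairs.
    """
    if not scores_with_tumour or not scores_without_tumour:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for pos in scores_with_tumour:
        for neg in scores_without_tumour:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_tumour) * len(scores_without_tumour))


# Toy Deauville scores (hypothetical, for illustration only)
print(roc_auc([4, 5, 3], [2, 3, 2, 5]))
```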

Results

Outcome

The median follow-up time from the date of diagnosis was 42 months (range 9–67 months). The OS at this time-point was 83.7 %. As expected, there was a significant difference in OS between patients with p16-positive and patients with p16-negative tumours (94.7 % and 52.3 %, respectively; p < 0.00001). During follow-up, 28 patients had recurrent tumour at the primary site (T site), in the neck (N), at a distant site (M) or at a combination of sites (Table 3). Residual neck tumour after treatment completion or recurrence in the neck was seen in 23 patients.

Table 3 Recurrences during the follow-up period, n = 28

Metabolic response and regional tumour control

A significant difference in RC was seen for every single category of visual evaluation, i.e. “low metabolism”, “remaining hypermetabolism” and “equivocal” (p < 0.0005). A significant difference was also found for each of the Deauville criteria (p < 0.00001), and the Deauville criteria dichotomized into responders (score 2 or 3) and nonresponders (score 4 or 5; p < 0.0005). Likewise, a significant difference in RC was found (p = 0.0005) between patients with SUVmax ≤2.0 and those with SUVmax >2.0 (Fig. 2).

Fig. 2 Regional control in relation to a visual inspection, b dichotomized Deauville criteria, and c maximum standardized uptake value (SUVmax)

The ROC analysis established that the Likert scale and SUVmax were similar in their ability to correctly identify the level of metabolism that correlated with regional tumour control, with areas under the curve (AUC) of 0.81 (CI 0.70–0.91) and 0.78 (CI 0.67–0.88), respectively. None of the evaluation methods was able to significantly predict OS or disease-specific survival. These results seem to be valid independently of p16 status. However, the small number of patients did not allow statistical analysis.

Sensitivity, specificity, PPV, NPV and accuracy for the dichotomized Likert scale were 47.8 % (CI 27.4–68.9 %), 93.8 % (CI 85.6–97.7 %), 68.7 % (CI 41.5–87.9 %), 86.4 % (CI 77.0–92.5 %) and 83.6 % in predicting neck node control after RT. Corresponding values for SUVmax were 82.6 % (CI 60.5–94.3 %), 61.0 % (CI 49.5–71.4 %), 37.3 % (CI 24.5–51.9 %), 92.6 % (CI 81.2–97.6 %) and 65.7 %. If the visual inspection was dichotomized by considering the equivocal PET scans together with the visually positive PET scans (sensitive reading), the sensitivity, specificity, PPV, NPV and accuracy for visual inspection were 60.9 % (CI 38.8–79.5 %), 80.5 % (CI 70.0–88.1 %), 46.7 % (CI 28.8–65.4 %), 88.0 % (CI 78.0–94.0 %) and 76.2 %. Corresponding values considering the equivocal PET scans together with the visually negative PET scans (specific reading) were 34.8 % (CI 17.2–57.2 %), 96.4 % (CI 89.0–99.1 %), 72.7 % (CI 39.3–92.7 %), 84.0 % (CI 74.7–90.5 %) and 82.9 %.
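The relationship between these five measures and the underlying 2×2 table can be made explicit as follows. The cell counts used here are not reported in the text; they are an assumption, back-calculated from the published percentages for the dichotomized Likert scale (104 evaluable patients, 23 with residual or recurrent neck tumour), and are shown only to illustrate the arithmetic. Confidence intervals are omitted because the interval method is not stated.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# ASSUMED counts, inferred from the reported percentages (not from the source)
likert = diagnostic_metrics(tp=11, fn=12, fp=5, tn=76)
for name, value in likert.items():
    print(f"{name}: {100 * value:.2f} %")
```

With these assumed counts the results agree with the reported Likert scale values to within rounding.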

Metabolic response and remission in the neck after radiotherapy

All methods were able to significantly predict metabolic response and neck node remission: p = 0.00002 for visual inspection, p = 0.000007 for the Deauville criteria with dichotomization and p < 0.000002 without dichotomization, and p < 0.0005 for SUVmax. The median value of SUVmax was 1.9 (CI 1.7–2.0) in patients with a complete response and 2.6 (CI 2.0–3.9) in patients with remaining or recurrent tumour.

PET scans judged as equivocal

Of the PET2 scans, 19 were reported as equivocal regarding neck nodes and seven patients underwent neck dissection 3–4 weeks after the scan. According to the pathology report only one patient had remaining tumour cells in the neck specimen and was soon diagnosed with distant metastases as well. The remaining 12 patients were scheduled for PET3 18 weeks after RT and were all assessed as complete responders, although in five of these patients a neck node recurrence developed during the follow-up period. In total, 13 of the patients with equivocal PET scans maintained RC. The tumour was HPV/p16-positive in 14 and HPV/p16-negative in 5 of these 19 patients, a distribution that was representative of the whole study group.

Scans judged as equivocal, and the Likert scale

When the Likert scale was applied to scans judged as equivocal, the separation of responders from nonresponders was enhanced, but no significant difference regarding persistent or recurrent tumour was found between responders with a score of 2 or 3 and nonresponders with a score of 4 or 5 (p = 0.07; Fig. 3).

Fig. 3 PET scans judged as equivocal by visual inspection in 19 patients categorized as responders with complete remission in the neck (Deauville score 2 or 3), or as nonresponders with residual or recurrent tumour in the neck (Deauville score 4 or 5)

In 12 of the 13 patients with a lasting complete response, a score of 2 or 3 was recorded. In the remaining patient a score of 5 was recorded but no malignancy was found in the neck node specimen. The SUVmax in this patient was also high (7.1). Among the six patients who failed to achieve regional remission, a score of 4 or 5 was recorded in three and a score of 3 in the other three. The latter three patients included two with a tonsillar tumour in whom locoregional recurrences were diagnosed 9 and 19 months, respectively, after completion of therapy and one with a supraglottic tumour with a neck recurrence who underwent successful salvage surgery 12 months after therapy.
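The counts in this paragraph define a 2×2 table for the 19 equivocal scans: 15 responders (12 with lasting regional control, 3 without) and 4 nonresponders (1 with, 3 without). The text does not state which test produced the reported p = 0.07, so the following is only an illustration: a two-sided Fisher's exact test, written from the hypergeometric definition with hypothetical function names, applied to that table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: Deauville 2-3, Deauville 4-5; columns: regional control, failure
p_value = fisher_exact_two_sided(12, 3, 1, 3)
print(round(p_value, 2))
```

Under these assumptions the result is close to the reported value, although agreement does not establish that this was the test used.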

In the group of PET scans judged as equivocal there was no correlation between SUVmax and the presence or recurrence of tumour (p = 0.28). The ROC analysis established that the Likert scale was superior to SUVmax in correctly identifying the level of metabolism that correlated to regional tumour control, with an AUC of 0.82 (CI 0.65–0.99) for the Likert scale, compared with 0.67 (CI 0.41–0.93) for SUVmax (Fig. 4).

Fig. 4 ROC analysis of the 19 PET scans judged as equivocal by visual inspection. The Deauville criteria Likert scale is superior to SUVmax in identifying the level of metabolism that correlates with regional tumour control

Discussion

In the present study we found a good concordance between visual inspection, the Deauville criteria Likert scale and SUVmax. All methods of assessment predicted RC with high significance and almost equally well. The Likert scale also showed encouraging results in discriminating responders from nonresponders on PET scans judged as equivocal. SUVmax is an estimate of tumour metabolic activity and the most commonly used SUV parameter in PET/CT [9]. It is a single-voxel value adversely affected by noise [10] and does not reflect the metabolic activity of the tumour as a whole. We were able to demonstrate a significant difference in SUVmax between patients with and patients without remaining/recurrent tumour after completion of RT with or without chemotherapy. However, measurement of SUV is affected by technical, biological and physical factors [18], and despite an attempt to establish common criteria there are still many different ways of calculating and presenting SUV. Published cut-off values are specific to the method and to the institution [9]. In this study, as also shown previously, SUV did not add any extra value compared with visual inspection in the clinical setting [12, 19].

A recently published study focused on a triphasic delayed measurement of SUVmax with the main aim of reducing the number of false-positive and equivocal scans after therapy [20]. The authors presented different FDG kinetics in inflammatory lesions compared with residual tumour 12 weeks after therapy completion and were able to improve the PPV. One benefit of the dynamic PET study would be that each patient served as his or her own control, which would reduce interinstitutional variations.

There is so far a lack of consensus on how to qualitatively assess and report PET scan results. In Hodgkin lymphoma the Deauville criteria Likert scale has been validated and adopted in multicentre trials [13, 17, 21]. In 48 patients with head and neck cancer, Krabbe et al. used a five-point scale in a serial PET evaluation, 3, 6, 9 and 12 months after treatment and demonstrated an overall PPV of 51 % and a NPV of 100 % using routine clinical follow-up as the reference [22]. Marcus et al. recently introduced and validated the Hopkins criteria, a five-point scale very similar to the Deauville criteria [14]. They achieved substantial interreader reliability and the criteria corresponded to OS and showed a PPV of 71.1 % and a high NPV of 91.1 %. The corresponding figures in our study were 68.7 % and 86.4 %. In a prospective PET study, Porceddu et al. based qualitative interpretation on focal uptake assessed in relation to uptake in adjacent tissue and the liver [3]. The PET scans were grouped into three different categories: “positive” (for remaining tumour), “negative” and “equivocal”. In the long-term follow-up, NPV was as high as 97.1 % even though all recurrences, no matter when they occurred, were included in the “false-negative” group [5]. As mentioned above, the NPV of the Likert scale used in the present study was lower (86.4 %). This may, at least partly, have been due to the early timing of our PET scans (6 weeks after therapy), as compared with scans performed at 12 weeks or with the study by Marcus et al., in which more than 50 % of the patients had their PET scan scheduled 13–24 weeks after treatment. Recent studies and meta-analyses have demonstrated a higher accuracy of PET scans scheduled later than 7 weeks after treatment [23, 24]. Treatment regimens also differed. In our setting, single modality treatment with RT was used in >90 % of the patients in contrast to similar studies in which chemoradiotherapy is more frequent. Less use of concurrent chemoradiotherapy might have led to more regional recurrences. In the present study the PET results were also categorized as false-negative if recurrent cancer was found at any stage during the follow-up period, in contrast to the studies by Krabbe et al. and Marcus et al., in which a 6-month limit for false-negative scans was applied [14, 22].

As reported in the Results, no posttreatment PET scan scored 1 on the Deauville scale. Score 1 is defined as “no FDG uptake”, which is a category of limited clinical value. A likely cause of no FDG uptake could be complete necrosis of the neck node, but this was not encountered in our patients. Cut-off values can be determined in different ways. We chose to use ROC data to define cut-off values for SUVmax and the Likert scale. Regarding SUVmax a cut-off value of 2.0 was the most favourable, giving a PPV of 37 % and a NPV of 93 %. Cut-off values used in previous studies vary but are often between 2.5 and 3.0; however, in our setting this range was less predictive. The outcome analysis is highly dependent on the chosen cut-off value. From a clinical point of view we consider a high NPV preferable to a high PPV, and the cut-off values were determined based on this preference.

The matter of equivocal PET scans has been debated and the difficulties acknowledged elsewhere [25, 26]. The Likert scale and visual inspection with specific reading (i.e. equivocal PET scans regarded as negative) showed similar accuracy. However, the benefit of adding a Likert scale to the PET report is apparent as responses can be categorized, and are distinct and easily interpreted by the oncologist or head and neck surgeon who has to act upon the PET result. In the present study, 18 % of scans (19 of 105 PET2 scans) were clinically reported as equivocal by visual inspection. The number of equivocal scans is similar to that found in previous studies [27]. By considering the equivocal PET scans together with those scored as 2 or 3 (responders) or those scored as 4 or 5 (nonresponders), we were able to correctly categorize 15 of the 19 patients. In three of the four remaining patients, categorized as responders, we diagnosed recurrences ≥9 months after therapy completion. In the fourth patient, we cannot explain why a visually clearly positive PET scan, a Deauville score of 5 and SUVmax >7 did not correspond to a positive pathology report. The group of PET scans assessed as equivocal was small and statistics should be interpreted with caution. However, the Likert scale adequately categorized 79 % of the equivocal PET scans, which is encouraging. In the group of PET scans judged as equivocal, SUVmax provided no additional predictive value.

In this study we chose to focus on the neck nodes with the highest FDG uptake in relation to treatment response. We did not evaluate the primary site response but it would be of interest to investigate whether using the Likert scale could minimize the number of equivocal scans here as well [28].

Conclusion

A Likert scale, in this case the Deauville criteria, for assessment of FDG metabolism in neck nodes following organ preservation therapy in head and neck squamous cell carcinoma is a promising tool to reduce to a minimum the problem of PET scans judged as equivocal. All patients with a posttreatment PET scan scored as 4 or 5 on the Likert scale are at great risk of harbouring remaining tumour cells and should be scheduled for neck dissection. There was an association between visual inspection, SUVmax and the Deauville criteria in the evaluation of the neck node response after RT. All methods adequately identified responders and nonresponders and predicted RC. A Likert scale seems to satisfy the requirements for a common qualitative way of assessing PET scans and also the reporting of PET results in the evaluation of head and neck cancer treatment. Consensus regarding qualitative assessment would facilitate PET reporting in clinical practice and enable comparative studies between patients and institutions.