Introduction

Head and neck carcinomas constitute approximately 5 % of all malignancies, and their incidence is growing worldwide [1]. The majority of these tumours are squamous cell carcinomas (SCCs) [2], and selection of the optimal therapeutic strategy requires accurate staging and localization by carefully assessing the soft tissue infiltration and pathological lymph node involvement.

Accurate T and N staging enables optimization of treatment planning (radiotherapy and/or surgery as appropriate) and, therefore, improves the outcome of patients with head and neck SCC. Although many of these tumours respond to radiotherapy, lymph node metastasis has a significant impact on prognosis and choice of treatment [3, 4]. Currently, patients with nodal involvement but no distant metastasis (M0) are treated with radiotherapy or lymphadenectomy, with or without chemotherapy.

Morphological assessment is carried out by either CT or MRI, with the latter being the method of choice for tumours of the head and neck due to its high anatomical, spatial and contrast resolution [5], which makes it particularly advantageous for evaluating trans-compartmental extensions with respect to CT.

However, the diagnostic accuracy of CT in tumour staging can be significantly enhanced by the integration of 18F-fluorodeoxyglucose (18FDG) PET imaging (18FDG-PET/CT) [6, 7], and PET/CT is very sensitive in detecting primary or metastatic tumours of the head and neck region and in evaluating their degree of infiltration [8]. Moreover, it has a crucial role in identifying and distinguishing between residual/recurrent disease and post-actinic changes [9]. Nevertheless, in addition to low spatial resolution (mainly for lesions smaller than 4 mm) [10], the functional method has other disadvantages, such as the possibility of false negatives [11] (given the low glucose metabolism of necrotic tissues) and false positives [12] (inflammatory processes). Furthermore, the physiological uptake of 18FDG in the brain impedes proper assessment of skull base infiltrations by such tumours.

Moreover, it is often difficult to differentiate between metastatic and reactive non-metastatic lymph nodes with either PET/CT or MRI imaging [10–12]. Therefore, some authors [3, 13–18] have proposed multimodal imaging evaluation, either by fusing the images (with a PET-MRI scanner or retrospective PET-MRI fusion software) or by interpreting PET/CT and MRI images in combination (PET-MRI), as a means of improving diagnostic accuracy.

To extend this line of research, we set out to assess the additional diagnostic value of multi-modal imaging evaluation, achieved by combining short tau inversion recovery (STIR) and diffusion weighted imaging (DWI) MRI sequences with 18FDG-PET/CT (through a specific software platform) in T and N staging of head and neck SCC with respect to each method on its own.

Materials and methods

We retrospectively enrolled 25 patients (age range 25–77 years) with head and neck SCC, who had undergone MRI and PET/CT before any treatment between March 2010 and January 2013. All had also previously undergone CT/US of the head and neck and fibroscopy with biopsy, which showed SCC of the following grades in all patients (Table 1):

  • G4, undifferentiated carcinoma, n = 6

  • G3, poorly differentiated carcinomas, n = 11

  • G2, moderately differentiated, n = 8.

Table 1 Patients, tumour grading (G2 = moderately, G3 = poorly, G4 = undifferentiated) and localization (N = nasopharynx, O = oropharynx, H = hypopharynx)

The sites of the primary tumours were the rhinopharynx (n = 14), oropharynx (n = 8), and hypopharynx (n = 3) (Table 1), and 34 lymph node stations were assessed. The average time interval between MRI and PET/CT examinations was 12 days.

Informed consent was obtained from all participants. As the study was retrospective, approval of the local ethics committee was not sought.

Imaging technique

MRI was performed using a 1.5 Tesla scanner (Achieva, release 2.4.5, Philips Medical Systems, The Netherlands) and a head and neck coil (Synergy Multichannel Sense, Synergy). The MRI protocol (summarized in Table 2) included Turbo Spin Echo (TSE), STIR and DWI sequences. T1 sequences with adipose tissue signal suppression were acquired after contrast medium injection (Gadovist® 1 molar, administered intravenously at a dose of 0.1 ml/kg). Only STIR and DWI sequences were considered for the purposes of this study, in which isotropic DW images were reconstructed for b = 0 s/mm², b = 400 s/mm² and b = 800 s/mm²; ADC maps were automatically calculated from the b = 800 s/mm² and b = 0 images by the scanner’s software. Mean ADC values, measured on an axial section, were obtained as follows: (a) taking four independent 10 mm² circular region of interest (ROI) measurements and considering only the lowest value (minimum ADC); (b) including pathological tissue; and (c) excluding cystic/necrotic regions and calcifications [4].

Table 2 MRI protocol used in the study
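As an illustration of the ADC computation described above, the following minimal Python sketch assumes the standard mono-exponential diffusion model, in which ADC = ln(S_low / S_high) / (b_high − b_low). The scanner software's actual implementation is not described in this study, so the function names and the signal intensities used here are hypothetical.

```python
import math


def adc_from_two_b_values(s_low, s_high, b_low=0.0, b_high=800.0):
    """Mono-exponential ADC estimate in mm^2/s.

    s_low, s_high: signal intensities at b_low and b_high (arbitrary units).
    b_low, b_high: b-values in s/mm^2 (defaults follow the b = 0 and b = 800
    images named in the text; the model itself is an assumption).
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must be greater than b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


def minimum_roi_adc(roi_means):
    """Return the lowest of several ROI mean ADC values (the 'minimum ADC')."""
    if not roi_means:
        raise ValueError("at least one ROI measurement is required")
    return min(roi_means)


# Hypothetical signal intensities for four ROIs: (S at b = 0, S at b = 800).
rois = [(1000.0, 450.0), (980.0, 400.0), (1020.0, 470.0), (990.0, 430.0)]
roi_adcs = [adc_from_two_b_values(s0, s800) for s0, s800 in rois]
print(f"minimum ADC: {minimum_roi_adc(roi_adcs):.2e} mm^2/s")
```

Taking the minimum rather than the average of the four ROIs mirrors the "lowest value" rule stated in the protocol.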

PET/CT scans were acquired on a hybrid Siemens (Siemens, Erlangen, Germany) system consisting of a lutetium oxyorthosilicate PET scanner (HI-REZ) with Pico-3D electronics and a 16-row CT device (Somatom Sensation 16). Imaging was performed 1 h after intravenous administration of 3.5 MBq/kg of 18FDG under fasting conditions, and multiplanar sections were reconstructed after attenuation and scatter correction (128 × 128 matrix, pixel size 5.3, 2 mm slice thickness, two iterations with eight subsets).

Image analysis

We created a score sheet for each patient bearing their identification code (ID), the dates of the PET/CT and MRI exams, evaluation grids of T and N parameters, the imaging techniques being evaluated (STIR, DWI, PET/CT, PET-MRI), and later study data and histopathology-derived tumour grades. Images from PET/CT and MRI were interpreted by a nuclear medicine specialist (GMS, 10 years of experience in PET-CT) and a radiologist (AS, 11 years of experience), respectively, each blinded to the results of the other imaging modalities, each other’s interpretation, and histopathology findings.

The T parameter was evaluated taking into account the region of origin of the tumour (supra-hyoid or sub-hyoid), axial location (right, left, bilateral, median), and the presence or absence of trans-compartmental extension. Tumour invasion was assessed according to the following radiological criteria: (a) unilateral change in anatomy with respect to the normal contralateral side, (b) nodular or infiltrative abnormal tissue signals, and (c) appearance of mass effect or fat displacement [19].

N staging was performed by mapping the neck lymph nodes on a grid and assigning the values provided by the radiologist and nuclear medicine physician. Lymph nodes were considered positive if one or more of the following criteria were met: (a) a short axis measurement greater than 1 cm (measurements were reported only on STIR images); (b) high signal intensity at b 200, 400, 800 s/mm² and low ADC values; (c) evidence of coalescent lymph nodes or masses; and (d) lymph nodes present in regions where there are usually no lymph nodes. The presence of central necrosis was considered a sign of malignancy, regardless of the size of the lymph node (i.e., Rouviere lymph nodes) [19–23]. On PET/CT, any focal 18FDG uptake greater than background activity and corresponding to nodular structures on CT was considered abnormal, regardless of lymph node size. An initial double-blind analysis of the STIR, DWI and PET/CT images was performed using a semi-quantitative approach, on a scale from 0 (absolutely negative) to 2 (positive).

Subsequently, a data set of co-registered multimodal images for combined reading was prepared by a medical physicist. Co-registration of the PET/CT and MRI images was performed on a Multimodality Workstation “Leonardo” (Siemens Medical Solutions) using the dedicated image analysis software in manual mode.

The co-registered PET-MRI images were then analysed by the two specialists by consensus. Lymph nodes were scored as follows: STIR-positive, 1 point; DWI-positive, 1 point; PET/CT-positive, 2 points. Lymph nodes with a score <2 were considered negative, and those with a score >2 positive. In the 10 cases in which the two methods were in total disagreement (score = 2) (Table 3), the positivity/negativity of each lymph node was subsequently assessed on the basis of a discussion between the two readers, who added or subtracted a score of 0.5, considering the morphology, size and characteristics of the lymph nodes on MRI and PET/CT.

Table 3 The scores of the combined evaluation (PET-MRI) in the 10 cases with disagreement between the two readers
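The combined scoring rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated (STIR 1 point, DWI 1 point, PET/CT 2 points, ±0.5 after discussion when the score equals 2); the function names and the "consensus required" return value are hypothetical and not part of the study software.

```python
def combined_node_score(stir_positive, dwi_positive, pet_positive):
    """Sum the per-technique points: STIR 1, DWI 1, PET/CT 2."""
    return (
        (1 if stir_positive else 0)
        + (1 if dwi_positive else 0)
        + (2 if pet_positive else 0)
    )


def classify_node(score, consensus_adjustment=None):
    """Classify a lymph node from its combined score.

    A score of exactly 2 means MRI (STIR + DWI) and PET/CT disagree
    completely, so the readers add or subtract 0.5 after discussion.
    consensus_adjustment must then be +0.5 or -0.5; until it is given,
    the hypothetical label "consensus required" is returned.
    """
    if score == 2:
        if consensus_adjustment not in (0.5, -0.5):
            return "consensus required"
        score += consensus_adjustment
    return "positive" if score > 2 else "negative"


# A node positive on both MRI sequences but negative on PET/CT scores 2,
# so the outcome depends on the readers' discussion.
score = combined_node_score(stir_positive=True, dwi_positive=True, pet_positive=False)
print(score, classify_node(score), classify_node(score, consensus_adjustment=-0.5))
```

Weighting PET/CT at 2 points means that a tie (score = 2) can only arise when the two modalities contradict each other entirely, which is why those cases are the ones referred to discussion.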

In the 10 patients (40 %) who underwent surgery (excision of the tumour and/or lymph node dissection), the histopathology diagnosis was considered the gold standard. In the remaining 15 patients (60 %) who underwent radiotherapy, reference scores were given on the basis of: (a) remission/persistence of the signal alteration; (b) uptake/no uptake of the radioactive tracer at the site of a previous neoplastic focus; and (c) disappearance/persistence of lymphadenopathy and/or a volume reduction of at least 50 %. Such evaluations were confirmed in at least two successive diagnostic follow-ups after radiotherapy, for a total time span of roughly 15 months.

Statistical analysis

Sensitivity, specificity, positive predictive values (PPV) and negative predictive values (NPV) were calculated for each diagnostic method by means of 2 × 2 contingency tables. Unlike in N staging, in T staging the diagnostic accuracy of MRI was calculated by integrating STIR sequences with DWI images, because of the poor definition of anatomical detail on DWI and ADC maps. Moreover, in two cases of nasopharyngeal tumour, susceptibility artifacts (metallic implants) degraded DWI image quality and impeded T staging. The inter-rater agreement between the MRI and FDG-PET/CT readers was assessed by means of Cohen’s k statistic. Agreement was defined as poor at k < 0.2, fair at 0.2 < k < 0.4, moderate at 0.4 < k < 0.6, good at 0.6 < k < 0.8, and very good at k > 0.8. A receiver operating characteristic (ROC) curve was used to evaluate the diagnostic value of ADC and SUV in the differentiation of benign from malignant lymph nodes, and we determined the ADC value threshold that provided the greatest accuracy in terms of discriminating benign from malignant lymph nodes. MedCalc software (MedCalc Software, Belgium) was used for all statistical analyses.
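For readers who wish to reproduce this kind of analysis, the 2 × 2 table metrics and Cohen's k can be sketched in a few lines of Python. This is a minimal illustration and not the MedCalc procedure used in the study; the counts below are hypothetical, and assigning a k value that falls exactly on a boundary (e.g. 0.2) to the higher band is our assumption, since the text leaves the boundaries open.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 contingency table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two readers giving binary ratings."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    p_observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if p_expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_label(kappa):
    """Map kappa to the agreement bands defined in the text.

    Boundary values are assigned to the higher band (an assumption).
    """
    if kappa < 0.2:
        return "poor"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "good"
    return "very good"


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=18, fp=1, fn=7, tn=8))
kappa = cohen_kappa(both_pos=20, a_pos_b_neg=5, a_neg_b_pos=10, both_neg=15)
print(round(kappa, 2), agreement_label(kappa))
```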

Results

Trans-compartmental extension

MRI assessment of trans-compartmental extension displayed 94 % sensitivity, 75 % specificity, 89 % PPV and 86 % NPV. This was compared with the 53 % sensitivity, 75 % specificity, 82 % PPV and 43 % NPV calculated for PET-CT, whereas PET-MRI displayed 93 % sensitivity, 88 % specificity, 94 % PPV and 88 % NPV (Table 4). Cohen’s k test showed that the agreement between PET/CT and MRI was “fair” (0.23).

Table 4 Sensitivity, specificity, PPV and NPV of PET/CT, MRI and combined MRI and PET/CT in T staging

Lymph nodes

DW-MRI showed 84 % sensitivity, 67 % specificity, 88 % PPV and 60 % NPV in identifying pathological lymph nodes, whereas STIR sequences showed 100 % sensitivity, 56 % specificity, 86 % PPV and 100 % NPV. The respective sensitivities, specificities, PPVs and NPVs of PET/CT vs. PET-MRI were 72 vs. 92 %, 89 vs. 89 %, 95 vs. 96 % and 53 vs. 89 % (Table 5). The area under the ROC curve (AUC) of PET/CT was 0.85, while the STIR sequence, DWI and PET-MRI AUCs were 0.81, 0.76 and 0.91, respectively (Figs. 1, 2), values that were statistically significant (p < 0.05). The optimal ADC threshold for differentiating between benign and malignant cervical lymph nodes, determined by ROC curve analysis, was found to be 1.03 × 10⁻³ mm²/s, roughly in line with literature values [4]. Cohen’s k test showed that the agreement between PET/CT and MRI was “fair” (0.35).

Fig. 1 Evaluation of lymph nodes via ROC curves for SUV and ADC values and size (in STIR images); AUC: PET 0.85, STIR 0.81, DWI 0.76

Fig. 2 ROC curve for multimodal evaluation (PET-MRI); AUC: 0.91

Table 5 Sensitivity, specificity, PPV and NPV of PET/CT, MRI and combined MRI and PET/CT in N staging
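The ROC-based threshold selection reported for ADC can be illustrated with a short Python sketch. It assumes that a node is called malignant when its ADC is at or below the threshold, picks the threshold giving the highest accuracy, and computes the AUC by pairwise comparison (the Mann-Whitney formulation). The ADC values are hypothetical and the procedure is a simplified stand-in for the MedCalc analysis, not a reproduction of it.

```python
def auc_lower_is_positive(malignant, benign):
    """AUC as the probability that a malignant node has a lower value
    than a benign node (ties count as one half)."""
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m < b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def best_accuracy_threshold(malignant, benign):
    """Scan the observed values and return (threshold, accuracy) for the
    cut-off maximizing accuracy, calling value <= threshold malignant.
    If several thresholds tie, the lowest one is kept."""
    n = len(malignant) + len(benign)
    best_threshold, best_accuracy = None, -1.0
    for t in sorted(set(malignant) | set(benign)):
        true_pos = sum(v <= t for v in malignant)
        true_neg = sum(v > t for v in benign)
        accuracy = (true_pos + true_neg) / n
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


# Hypothetical ADC values in units of 10^-3 mm^2/s (not study data).
malignant_adc = [0.78, 0.85, 0.92, 0.98, 1.03, 1.10]
benign_adc = [0.99, 1.12, 1.20, 1.31]

print("AUC:", auc_lower_is_positive(malignant_adc, benign_adc))
print("threshold, accuracy:", best_accuracy_threshold(malignant_adc, benign_adc))
```

Maximizing accuracy is only one possible criterion; other choices, such as maximizing the sum of sensitivity and specificity, can give a different cut-off on the same data.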

Discussion

The added value of PET-MRI fusion imaging in the staging of head and neck cancer has been demonstrated by Huang et al. [17], who compared the performance of PET-MRI software fusion with PET/CT, MRI and CT for the assessment of deep tissue invasion in 17 patients with advanced buccal SCC. DWI sequences were not, however, available and the results were correlated with pathology. Among the four modalities tested, the sensitivity and specificity of PET-MRI software fusion were the highest (90 %/91 % for PET-MRI, 80 %/84 % for PET/CT, 80 %/80 % for MRI and 55 %/82 % for CT, respectively). As the level of diagnostic confidence was also highest for PET-MRI software fusion, Huang et al. concluded that, in advanced buccal SCC, PET-MRI is more reliable than PET/CT, MRI or CT for the assessment of local invasion and for the delineation of tumour size.

Likewise, Kanda et al. [3] evaluated the clinical value of retrospective image fusion of MRI and FDG-PET from PET/CT in 30 patients with oral cavity and hypopharyngeal SCC, comparing the respective T and N staging performances of PET-MRI, PET/CT and MRI with histopathology. They found similar accuracy for fused PET-MRI and MRI in terms of T staging, with both being superior to PET/CT (87 and 90 % vs. 67 %, p < 0.04), whereas the sensitivity and specificity for the detection of nodal metastasis (N staging) on a level-per-level basis were 77 %/96 % for both PET/MRI and PET/CT, as compared to 49 %/99 % for MRI. These differences in sensitivity were significant (p < 0.0026).

As an alternative to fusion images, Yoon et al. [18] compared the diagnostic values of four different techniques, including PET/CT and MRI, and their combined use. They found that PET-MRI provided greater diagnostic accuracy (97 %) in the pre-operative assessment of nodal involvement than either MRI (95.3 %) or PET/CT (95.0 %) alone.

In our study, PET/CT showed a lower sensitivity than MRI (53 vs. 94 %) but the same specificity (75 %) in T staging, and the low spatial resolution of PET/CT did not always make it possible to identify the infiltration of the peri-tumoral soft tissues. Indeed, in seven cases of nasopharyngeal tumour the nuclear medicine specialist failed to detect the infiltration of the soft tissue, although the radiologist did so correctly. In three of the seven cases, the tumours had infiltrated the long muscle of the neck, in two the vascular space, and in the remaining two the masticator space, and their misinterpretation was attributed to the presence of artefacts from beam hardening due to metal implants and skull base bones (Fig. 3).

Fig. 3 (a) In PET/CT images, the nuclear physician identified the lesion in the nasopharynx; note the beam hardening artifacts from metal implants and the high uptake of the structures of the skull base. (b) The tumoral extension infiltrating surrounding soft tissues (especially the long muscle of the neck and tensor veli palatini muscle) and the encroachment into the neighbouring spaces (vascular space and masticator space) seen in STIR MRI sequences (image on the left) become much more evident with contrast enhancement on T1 fat sat (image on the right)

Conversely, in one case the alteration of the para-pharyngeal adipose tissue on the STIR sequence, characterized by a weak uptake in PET/CT, was initially interpreted as a possible tumour infiltration by both readers. However, later evaluation of the co-registered images allowed the alteration to be correctly re-interpreted as an inflammatory reaction. In another case, however, the extensive inflammatory reaction of a localized tumour in the oropharynx was incorrectly interpreted as positive by PET/CT, MRI, and co-registered images (Table 4). Nevertheless, overall, PET-MRI image assessment provided better diagnostic accuracy than either MRI or PET/CT in terms of T staging (Table 4).

In terms of N staging, as mentioned above, MRI is not always able to differentiate metastatic from reactive lymph nodes, mainly because of its reliance on the size criterion. However, it is also important to emphasize the limitations of DWI sequences, in which normal lymph nodes often exhibit high signal intensity due to high cellular density. In our series, DWI proved more specific (67 vs 56 %) and less sensitive (84 vs 100 %) than the STIR sequence, although these differences were not statistically significant. Therefore, DWI did not seem to have a significant impact on nodal detection compared to morphological sequences, and it should always be interpreted in conjunction with a complete diagnostic MRI protocol.

PET/CT has proven to be highly accurate in identifying nodal metastases. The glucose analogue 18FDG is a marker of the metabolic activity of the tumour, since cancer cells have a higher carbohydrate metabolism than normal cells [24], meaning that 18FDG-PET is able to detect even small metastatic foci in morphologically normal lymph nodes [25]. Nevertheless, in our study, seven lymph nodes were falsely negative on PET/CT due to the following:

  (a) In four cases, a peri-tumoral lymphadenopathy was mistaken for tumour infiltration into the para-pharyngeal compartment (overstaging the cancer), while STIR sequences correctly identified these alterations as nodal involvement (Fig. 4). In two of these four cases, PET-MRI image assessment also incorrectly interpreted the peri-tumoral lymphadenopathies as tumour infiltration, since the adipose cleavage plane was not easily distinguished from the tumour.

    Fig. 4 (a) MRI sequences allowed correct localization of a non-infiltrating nasopharyngeal tumour, distinguishing it from an adjacent pathological lymph node, whereas (b) on PET/CT it was not possible to distinguish the tumour extension from lymph node involvement with any certainty

  (b) In three cases, necrotic lymph nodes were detected by MRI, but not by PET/CT due to a lack of FDG uptake. On the other hand, three lymph nodes with no FDG uptake on PET/CT were falsely considered positive by STIR sequences, because their shortest diameter was greater than 1 cm. As for the DWI sequences, four false-negative lymph nodes showed ADC values very close to the cutoff value (1.03 × 10⁻³ mm²/s) obtained from the ROC curve, prompting the radiologist to class them as non-pathological.

PET/MRI combined assessment of lymph nodes provided higher sensitivity and NPV than PET/CT or DWI alone, the same specificity as PET/CT, and the highest PPV and AUC (0.91) of all the techniques evaluated. Although we did not assess PET/MRI without DWI sequences in N staging, the information provided by DWI did not appear to be remarkable. In our cases, the high specificity of PET/MRI assessment is due to the addition of PET/CT (exactly the same specificity, 89 %). On the other hand, the information provided by DWI may be considered superfluous in terms of sensitivity and specificity. In fact, as discussed above, there were no significant differences between the DWI and STIR sequences; therefore, the added role of DWI is questionable in this setting. These findings are in line with those of Queiroz et al. [26], who concluded that the use of DWI does not provide remarkable information in the evaluation of head and neck cancers. In that study, PET/MRI without DWI proved more accurate than PET/MRI with DWI in detecting malignant lesions (70 patients and 118 lesions taken into account, either primary tumours or lymph node metastases), although without statistical significance (86.8 vs 84 %, p > 0.05). Moreover, DWI added 20 lesions to the PET/MRI findings: 11 of those were false-positive inflammatory/reactive lymph nodes and 9 were malignant. However, none of these 9 malignant lymph nodes changed the overall N staging in comparison to PET/MRI read without DWI sequences. For these reasons, the use of DWI might not be needed in clinical PET/MRI protocols for the staging of head and neck cancers. Similar results were reported by Buchbender et al. [27] for total body staging in 25 patients affected by different kinds of tumours. They demonstrated that PET/MRI without DWI and PET/MRI with DWI detected the same number of malignant lesions (49, including primary tumours, distant metastases and pathologic lymph nodes). Moreover, the lesion-to-background contrast on DWI compared to 18FDG-PET (rated qualitatively from 0, invisible, to 3, high) was rated lower in 22 (44.9 %) of 49 detected lesions, resulting in a higher lesion conspicuity of 18FDG-PET compared to DWI (p = 0.001). The above findings confirm that DWI does not add significant information in lesion detection as part of a whole-body PET/MRI examination; this is important given the need to optimize PET/MRI protocols with regard to patient comfort and efficacy.

Conclusions

In staging head and neck SCCs, assessment of combined multimodal imaging through co-registered MRI sequences and PET/CT data enabled more accurate detection of radiotracer uptake overlapping areas of suspected infiltration, increasing the diagnostic value of T and N parameters. As integration of the two methods increased the sensitivity, specificity and diagnostic accuracy with respect to the individual methods alone, combined multimodal imaging may, therefore, be a valuable technique for the detection and staging of tumours of the head and neck.