Introduction

Lung cancer is the leading cause of cancer-related death and accounted for 14% of all new cases of cancer in the USA in 2011 [1]. Accurate staging is mandatory to select the most appropriate therapy and to determine prognosis. Combined 18F-fluorodeoxyglucose positron emission tomography and computed tomography (PET/CT) is considered the standard of reference for preoperative assessment of non-small-cell lung cancer (NSCLC) [2, 3]. Despite its widespread use and high degree of standardisation, the results provided by PET/CT are still not entirely satisfactory. The main limitations of PET/CT are its limited spatial resolution [4] and its low specificity in distinguishing malignant lymphadenopathy from inflammatory changes, which results in a considerable number of false-positive findings [5]. Moreover, PET/CT is associated with a considerable radiation burden to patients and medical personnel.

Magnetic resonance imaging (MRI) is currently the only technique that enables non-invasive whole-body assessment without ionising radiation. Another strength of MRI is its capability to create high soft tissue contrast without external contrast agents and with high spatial resolution. A novel, powerful source of contrast generation for whole-body MRI became available in 2004 with the introduction of diffusion-weighted imaging with background signal suppression (DWIBS) [6]. The image contrast of DWIBS is based on the diffusion properties of water molecules and reflects tissue parameters like cellular density and tissue architecture [7]. In the last few years, DWIBS has been investigated successfully in many fields of oncology [8]. An increasing number of studies on DWI in NSCLC have also become available [9–24]. Only a few studies have addressed validation of DWI versus PET/CT as the standard of reference. Among the data published to date, four studies focus on tumour detection and characterisation of lung nodules [9–12], four studies on N-staging and characterisation of mediastinal lymph nodes [11–14], and two studies on M-staging [12, 15]. Only one study [11] provides data on overall staging accuracy according to the Union for International Cancer Control (UICC) classification. To the best of our knowledge, this is the first study presenting data on T-staging accuracy of MRI with diffusion weighting.

The objective of this study is to assess the diagnostic value of whole-body MRI with DWIBS in comparison to PET/CT for comprehensive preoperative assessment of NSCLC in a clinical setting. Data evaluation included primary tumour detection, T-staging, detection of individual lymph node metastases, N-staging and UICC staging with histopathology and cytology as the reference standards.

Materials and methods

Patients

Thirty-three patients (24 men, 9 women, mean age 63.7 years, median age 66 years, age range 43–83 years) with suspected NSCLC were prospectively enrolled. All patients underwent PET/CT and were scheduled for surgery according to the PET/CT findings. Whole-body MRI examinations were performed before surgery. In one patient, surgery was cancelled due to negative results in an additional transbronchial biopsy that was done after the MRI exam. The histological subtypes of pulmonary malignancies represented according to the WHO/IASLC classification [25] were adenocarcinoma (16 cases), squamous cell carcinoma (eight cases), adenosquamous carcinoma (one case), large cell carcinoma (two cases) and well differentiated neuroendocrine tumour (one case). Two patients were excluded from statistical evaluation as histology revealed malignant lesions of non-pulmonary origin (one lymphoma, one colon cancer metastasis). Three patients were diagnosed as negative for lung cancer by histology or cytology (one tuberculosis, one respiratory bronchiolitis-interstitial lung disease [RB-ILD], one inconclusive). The mean time interval between imaging and surgery was 34 (± 26) days for PET/CT and 3 (± 2) days for MRI. Three patients received surgical resection after neo-adjuvant radiation and/or chemotherapy. However, no therapy was performed between imaging and surgery. All procedures were in accordance with the ethical standards of the World Medical Association and written informed consent was obtained from all patients. The study was approved by the local ethics committee.

Imaging protocol

PET/CT examinations were performed on an integrated PET/CT system with 16-slice CT (Discovery; GE Healthcare, Chalfont St Giles, UK). 18F-FDG was administered in a standard dose of 5 MBq per kg body weight (maximum dose 500 MBq) 60 min before imaging after a fasting period of a minimum of 6 h. For PET acquisition, eight bed positions with each 4-min data acquisition were obtained from skull to upper thigh. All patients received unenhanced low-dose CT for attenuation correction and anatomical reference (tube voltage = 120 kV, tube current = 100 mA, collimation = 16 × 3.75 mm, free breathing). In patients who had not previously undergone a dedicated chest CT examination, additional contrast enhanced CT of the chest was performed (tube voltage = 120 kV, tube current = 300 mA, collimation = 16 × 1.25 mm, breath hold, 80 ml intravenous iodinated contrast medium).

MRI examinations were performed on a 1.5-T whole-body MRI system (Magnetom Avanto or Magnetom Symphony, Siemens Healthcare, Erlangen, Germany) using a dedicated 18-channel coil array system (total imaging matrix [Tim], Siemens Healthcare). The sequences employed were T1-weighted turbo spin echo (TSE) (TR = 682 ms, TE = 11 ms, matrix size = 320 × 240 pixels, slice thickness = 5 mm, field of view (FoV) = 500 × 375 mm², acquisition time 7 × 1:01 min), T2-weighted short tau inversion recovery (STIR) (TR = 9,930 ms, TE = 86 ms, TI = 160 ms, matrix size = 320 × 240 pixels, slice thickness = 5 mm, FoV = 500 × 375 mm², acquisition time 7 × 1:19 min), and DWIBS (single shot echo planar imaging [ss-EPI], TR = 5,400 ms, TE = 58 ms, b = 0 and 800 s/mm², STIR fat suppression with TI = 180 ms, matrix size = 192 × 144 pixels, slice thickness = 5 mm, FoV = 500 × 375 mm², four averages, acquisition time 7 × 1:43 min) in transverse orientation, each covering the patient's body in seven acquisition steps from skull to upper thigh. Total examination time was 30 min. All data were acquired during free breathing. No contrast agent was applied. Apparent diffusion coefficients (ADCs) were calculated pixel-wise (linear fit to logarithmic data) and displayed as ADC maps. Image fusion of T1-weighted and high b-value DWI data was performed three-dimensionally in a semiautomatic fashion and rendered in transverse and coronal image stacks.
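The pixel-wise ADC calculation can be illustrated with a minimal sketch. It assumes the usual mono-exponential signal model S(b) = S0 · exp(−b · ADC); with only two b values (0 and 800 s/mm²), the linear fit to the logarithmic data reduces to the slope between two points. The function names and the handling of non-positive pixels are illustrative assumptions, not the scanner software used in the study.

```python
import math

def adc_two_point(s_low, s_high, b_low=0.0, b_high=800.0):
    """Return the ADC (mm^2/s) for one pixel from signals at two b values (s/mm^2).

    Assumes S(b) = S0 * exp(-b * ADC), so ADC = ln(S_low / S_high) / (b_high - b_low).
    """
    if s_low <= 0 or s_high <= 0:
        # Logarithm undefined (e.g. background noise); returning NaN is an
        # assumption made for this sketch.
        return float("nan")
    return (math.log(s_low) - math.log(s_high)) / (b_high - b_low)

def adc_map(image_b0, image_b800):
    """Apply the two-point ADC calculation to every pixel of two 2D images
    given as nested lists of equal shape."""
    return [
        [adc_two_point(s0, s800) for s0, s800 in zip(row_b0, row_b800)]
        for row_b0, row_b800 in zip(image_b0, image_b800)
    ]
```

For a pixel whose signal drops from 1000 at b = 0 to 1000 · exp(−0.8) at b = 800 s/mm², the function returns an ADC of 1.0 × 10⁻³ mm²/s.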

Image analysis

Initial assessment of the PET/CT examinations was done as part of the routine work by a board-certified nuclear medicine physician with more than 5 years’ experience in PET/CT reading (Reader 1). Separate blinded study readings were performed >4 weeks later in three cases where data from a previous PET/CT examination had been used during the initial reading procedures. Second reading of all PET/CT examinations was done by a board-certified nuclear medicine physician with more than 10 years’ experience in PET/CT reading (Reader 2). Both low-dose and contrast enhanced CT datasets were used for the PET/CT readings. MRI examinations were read by a board-certified radiologist with more than 5 years’ experience in MRI reading (Reader 1) and in a second reading procedure by a board-certified radiologist with more than 10 years’ experience in MRI reading (Reader 2). All datasets (T1-weighted, T2-weighted STIR and DWIBS) were considered for diagnosis. All readers were blinded to the results of the other imaging technique and histopathology results. Previous images from investigations other than PET/CT and MRI were available to all readers. Image reading was done on commercially available workstations (Centricity RA 1000; GE Healthcare).

Image quality of MRI was assessed on a four-point scale (very good = dataset with no visible artefacts; good = slight motion artefacts that do not affect diagnostic assessment; fair = artefacts that tolerably affect diagnostic assessment; unsatisfactory = non-diagnostic data). Evaluation was done both for the entire study and separately for the three individual sequences. Image quality of PET/CT was assessed overall using the same scale as for the MRI examinations.

Staging was done for each technique according to the 7th edition TNM and UICC classifications [26]. Both PET/CT and MRI images were interpreted in a qualitative manner considering both morphological and functional information. Increased FDG-uptake and restricted diffusion were identified by visual comparison of the lesion’s signal to the FDG-uptake of the liver parenchyma in PET and the background signal in high b-value DWI, respectively. Quantitative values for standardised uptake value (SUV) and ADC were calculated and used for interpretation of particular findings if found appropriate by the readers. However, no general cut-off values were applied for differentiating benign from malignant lesions. For lymph node assessment, a short axis diameter of > 1 cm was regarded as a morphological criterion for metastatic involvement.

Lymph node stations were divided into three groups for individual assessment (Sn = stations according to the IASLC lymph node map [27], i = ipsilateral, c = contralateral): (1) N1-nodes: ipsilateral intrapulmonary, peribronchial and hilar nodes (S10-14); (2) N2-nodes: subcarinal nodes (S7), ipsilateral mediastinal and para-aortic nodes (S2-6i, 8-9i); (3) N3-nodes: contralateral mediastinal and para-aortic nodes (S2-6c, 8-9c), and supraclavicular nodes (S1). A group of lymph node stations was rated positive, if at least one lymph node from one of the stations was considered to be metastatic.
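The grouping rule above can be expressed as a short sketch. The function names are hypothetical, and station/side combinations that are not listed in the grouping above (for example contralateral hilar nodes) are deliberately rejected rather than guessed.

```python
def node_group(station, side):
    """Map an IASLC station number (1-14) and laterality to a node group.

    side is 'i' (ipsilateral) or 'c' (contralateral), as in the text.
    """
    if side not in ("i", "c"):
        raise ValueError("side must be 'i' or 'c'")
    if station == 1:
        return "N3"  # supraclavicular nodes (S1)
    if station == 7:
        return "N2"  # subcarinal nodes (S7)
    if station in (2, 3, 4, 5, 6, 8, 9):
        # mediastinal and para-aortic nodes: group depends on laterality
        return "N2" if side == "i" else "N3"
    if 10 <= station <= 14 and side == "i":
        return "N1"  # ipsilateral hilar, peribronchial, intrapulmonary nodes
    raise ValueError("combination not covered by the grouping in the text")

def n_stage(positive_nodes):
    """Derive the N-stage from an iterable of (station, side) pairs rated
    metastatic; a group is positive if at least one of its nodes is positive."""
    groups = {node_group(station, side) for station, side in positive_nodes}
    for stage in ("N3", "N2", "N1"):
        if stage in groups:
            return stage
    return "N0"
```

With this sketch, a patient with a positive ipsilateral hilar node and a positive subcarinal node is assigned N2, because the highest positive group determines the stage.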

Statistical analysis

Overall accuracy was calculated for primary tumour detection, T-staging, N-staging, group-wise assessment of lymph nodes and UICC staging. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for primary tumour detection and group-wise assessment of lymph nodes. All measures were calculated along with their corresponding exact binomial 95% confidence intervals. A retrospective size analysis was performed based on the histology reports on those lymph node groups that were rated false negative by at least one of the four observers. Inter-observer agreement was assessed by calculating percentages of actually observed agreement without correction for effects of chance along with the corresponding exact binomial 95% confidence intervals. Statistical significance of the differences between the staging results obtained by PET/CT and MRI was tested using McNemar’s test. Image quality ratings were compared by Wilcoxon signed-rank test. A P value of less than 0.05 was considered statistically significant.
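The measures described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the software used in the study: the exact binomial interval is taken to be the Clopper-Pearson interval (solved here by bisection), and McNemar's test is shown in its exact binomial form on the discordant pairs, since the text does not specify which variant was applied. All function names are hypothetical.

```python
from math import comb

def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2 x 2 table.
    Assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root_of_decreasing(f):
    """Bisection on [0, 1] for a function that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) confidence interval for k successes in n trials."""
    # Lower bound: p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.

    b = cases correct with method A only, c = cases correct with method B only.
    Under the null hypothesis the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    return min(1.0, 2 * binom_cdf(min(b, c), n, 0.5))
```

As a purely hypothetical usage example, six discordant staging results split 1 versus 5 between the two modalities give `mcnemar_exact(1, 5)` = 0.21875, which would not reach the 0.05 significance threshold.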

Results

Whole-body MRI with DWIBS as well as PET/CT provided diagnostic image quality in all cases. Overall image quality for PET/CT was rated very good in 14 cases (45%), and good in the remaining 17 cases (55%), which was significantly better (P = 0.01) than for MRI (very good, 7 [23%]; good, 20 [65%]; fair, 4 [13%]). The most important artefacts seen in MRI were ghosting artefacts from cardiovascular pulsations that affected morphological assessment of mediastinal structures. Assessment of individual MRI sequences revealed no significant differences (P > 0.1) between DWI (very good, 6 [19%]; good, 21 [68%]; fair, 4 [13%]), T1-weighted TSE (very good, 7 [23%]; good, 18 [58%]; fair, 6 [19%]) and T2-weighted STIR (very good, 8 [26%]; good, 15 [48%]; fair, 8 [26%]).

The results of MRI and PET/CT for the detection of primary tumour lesions are displayed in Table 1. The individual numbers represent average values from two readers. Table 2 displays the T-staging results of the two techniques. Overall T-staging accuracy was higher for MRI (63%) than for PET/CT (56%). This difference, however, was not statistically significant (P = 0.6).

Table 1 Detection of primary NSCLC by MRI and FDG-PET/CT (average values from two readers). Values in parentheses indicate 95% confidence intervals
Table 2 T-staging results of MRI and FDG-PET/CT (average values from two readers). Resulting T-staging accuracy is 63% (95% CI, 46–80%) for MRI and 56% (39–73%) for FDG-PET/CT

Table 3 displays the N-staging results for MRI and PET/CT. N-staging accuracy was 66% for MRI and 71% for PET/CT. The results of group-wise assessment of metastatic lymph node involvement by MRI and PET/CT are shown in Table 4. A mean number of nine lymph node groups in total were rated false negative by each technique. In three cases, single metastases with sizes between 4 mm and 7 mm (maximum diameters measured by the pathologist on the surgical specimen) were missed by all four observers. In two cases, metastases with sizes of 8 mm and 15 mm were missed by both PET/CT readers, but were detected by one of the MRI readers. In another two cases, metastases with sizes of 7 mm and 15 mm that were detected by one of the PET/CT readers were missed by both observers with MRI. Metastatic involvement of the remaining six lymph node groups was detected by at least one observer from each technique.

Table 3 N-staging results for MRI and FDG-PET/CT (average values from two readers). Resulting N-staging accuracy is 66% (95% CI: 49–83%) for MRI and 71% (55–87%) for FDG-PET/CT
Table 4 Group-wise assessment of metastatic lymph node involvement by MRI and FDG-PET/CT (average values from two readers). Values in parentheses indicate 95% confidence intervals

UICC staging results are displayed in Table 5. Comparison of methods by McNemar’s test revealed no statistically significant difference between MRI and PET/CT for any of the calculated measures. The P values obtained for comparison of accuracy were 0.7 for primary tumour detection, 0.6 for T-staging, 0.65 for N-staging, 0.42 for assessment of individual lymph node groups and 0.4 for UICC staging. Figures 1 and 2 display two sample cases with discrepant diagnoses for T- and N-stage between the two modalities. A schematic overview of tumour detection and staging accuracies for both techniques is given in Fig. 3.

Table 5 UICC staging results for MRI and FDG-PET/CT (average values from two readers). Resulting UICC staging accuracy is 66% (95% CI, 49–83%) for MRI and 74% (59–89%) for FDG-PET/CT
Fig. 1

Sample case of a 50-year-old man with primary tuberculosis. Upper row: FDG-PET source images in inverted grey scale (a, c), and combined PET/CT images after fusion of PET data on CT images in lung (b) and soft tissue window (d). Lower row: High b-value diffusion-weighted MRI source images in inverted grey scale (e, g), and combined images from high b-value diffusion-weighted and T1-weighted MRI data (f, h). This case was rated false positive for malignancy (T1a N1, UICC IIA) by both PET/CT readers due to high FDG-uptake in both the primary lesion (a, b) and in a right hilar lymph node (c, d). On MRI, one of the readers considered the primary lesion to be malignant (T1) due to a small area of restricted diffusion within the primary lesion (arrows in e, f), whereas the second reader correctly identified the lesion as benign. Both MRI readers correctly described no suspicious lymphadenopathy (g, h)

Fig. 2

Sample case of a 55-year-old man with pT2b pN1, UICC IIB squamous cell carcinoma. Upper row: FDG-PET source image (a), original CT images in soft tissue (c) and lung window (d), and combined PET/CT image (b). Lower row: High b-value diffusion-weighted source image (e), combined image from high b-value DWI and T1-weighted MRI (f), T2-weighted STIR image (g), and ADC-map (h). This case was rated false positive for N2-disease by one of the MRI readers due to increased signal intensity in a right paratracheal lymph node (arrows) on both T2-weighted STIR and high b-value DWI. Increased ADCmean (1.9 × 10⁻³ mm²/s measured with a 2D ROI in h) was interpreted by this reader as necrotic changes. PET/CT showed no increased FDG-uptake in this location (a/b). T-stage, however, was overestimated in the same patient by both PET/CT readers (T4 and T3) due to inflammatory changes in the surrounding lung parenchyma, but correctly assessed by both MRI readers

Fig. 3

Schematic overview of tumour detection and staging accuracies for MRI and PET/CT. The error bars represent 95% confidence intervals

Observer agreement rates were 52% (34–70%) for T-staging, 68% (52–84%) for N-staging and 74% (59–89%) for UICC staging with MRI compared with 65% (48–82%), 68% (52–84%) and 90% (79–100%) with PET/CT. Inter-observer agreement generally tended to be higher for PET/CT than for MRI. However, differences were not statistically significant (P = 0.22, 0.87 and 0.09).

Discussion

The objective of this study was to assess the diagnostic value of whole-body MRI with DWIBS for comprehensive preoperative assessment of NSCLC in comparison with PET/CT with histopathology and cytology as the reference standards.

With respect to primary tumour detection, MRI and PET/CT both provided excellent results with sensitivity, accuracy and PPV of more than 89% for both techniques. The values obtained for sensitivity and accuracy are similar to previously reported results [9–12], which are 70–100% sensitivity and 72–100% accuracy for MRI, and 76–100% sensitivity and 74–100% accuracy for PET/CT, respectively. The values for specificity and NPV derived from the current study should be interpreted with caution because of the small sample size with only three non-malignant lesions. Specificity values for primary tumour detection reported in the literature are 96–97% for MRI and 79–82% for PET/CT [9–12].

T-staging accuracy was slightly higher for MRI (63%) than for PET/CT (56%). However, this difference is far from being statistically significant (P = 0.6). To the best of our knowledge, there are no published data on T-staging accuracy of MRI with diffusion weighting in NSCLC patients. The observed accuracy of 56% for PET/CT is in good accordance with the results from a recently published retrospective study [28]. In general, published values for T-staging accuracy of PET/CT cover a wide range from 88% [3] down to only 39% in a study focused on stage IIIA disease [29]. In our study, most T-staging errors with both techniques resulted from confusing stages T1 and T2, which is of limited clinical relevance.

N-staging is a key issue in preoperative work-up of NSCLC patients. Until submission of this work, four studies had compared the value of diffusion-weighted MRI and PET/CT for lymph node assessment in NSCLC patients [11–14]. The published results from these studies were 67–91% sensitivity, 87–99% specificity and 80–98% accuracy for MRI, compared with 48–98% sensitivity, 89–97% specificity and 80–97% accuracy for PET/CT. The specificity and accuracy values observed in our study are in very good accordance with these results. However, sensitivity values calculated from our data are lower than the reported values for both techniques. It is well known that FDG-PET is considerably limited in detecting small lymph node metastases. A study that evaluated size dependence of PET/CT in N-staging of NSCLC patients reported a sensitivity of only 32% in lymph nodes < 10 mm compared with 85% in lymph nodes > 10 mm leading to a moderate sensitivity of 54% overall [30]. Another recently published study on PET/CT in early stage NSCLC also revealed low sensitivity for lymph node involvement of 44% [31]. In our study, approximately half of the false-negative lymph nodes were metastases smaller than 10 mm. In our opinion, this explains sufficiently the sensitivity results provided by PET/CT.

The question of spatial resolution limitations in NSCLC lymph node staging with diffusion-weighted MRI is controversial in the literature. The authors of one recently published study [11] claim that MRI may be superior to PET/CT in detecting small lymph node metastases, thus resulting in higher overall sensitivity and accuracy for MRI. Others [12, 13] see no relevant advantage of MRI in this respect, but describe higher accuracy and PPV of MRI compared with PET/CT owing to fewer false-positive findings. Sensitivity and PPV for metastatic lymph node involvement cannot be assessed with sufficient accuracy by our study because of the small sample size. However, the differences in accuracy between diffusion-weighted MRI and PET/CT seen in our data for assessment of lymph node groups and N-staging are far from being statistically significant (P = 0.42 and 0.65). Thus, even considering the fact that the sample size included by Usuda et al. [11] is twice as large as in our study, we are not able to reproduce the very positive results for MRI given by this previous study.

Accuracy of UICC staging in NSCLC by means of diffusion-weighted MRI has, to our knowledge, been investigated by only one earlier study [11]. The reported values of 71% for DWI and 65% for PET/CT are in good accordance with the results from our study (66% for DWI and 74% for PET/CT). The slight difference that is observed in favour of PET/CT is not reflected by the individual T- and N-staging results and is not statistically significant. The P value (P = 0.4) is almost equal to that reported by Usuda et al. (P = 0.44) [11].

The method of diffusion-weighted whole-body MRI that has been evaluated in this study is from a technical perspective comparable to the methods that have been used by previous authors. We chose a very simple approach for our protocol including two b values, free-breathing acquisition, STIR fat suppression and a moderate number of averages to facilitate efficient whole-body coverage. Potential technical improvements as applied by other authors would be the use of more or higher b values, more elaborate methods of fat saturation such as spectral pre-saturation inversion recovery (SPAIR), and a higher number of averages. With reference to former studies, as cited above, we would expect most of these modifications to have only minor effects on overall staging accuracy. Artefacts from breathing motion and cardiovascular pulsations, however, were significantly more pronounced in all MRI sequences applied in our study compared with PET/CT and represent a major issue in this context. The use of breath-holding, respiratory gating and ECG gating techniques may improve delineation of anatomical structures particularly in the central part of the chest and thus may further increase sensitivity for small central pulmonary lesions and mediastinal lymph nodes.

This study has several limitations. The first is the small number of patients, which particularly compromises the precision of the estimates of specificity and NPV for primary tumour detection and of sensitivity and PPV for detection of lymph node metastases. Second, no meaningful data on N3 lymph nodes could be obtained, as only patients with a potentially resectable stage of disease were included in the study. For the same reason, M-stage could not be assessed reliably as only two patients with solitary brain or lung metastases and curative intention for surgery were included. Thus, the clinically important question of differentiating between UICC stages IA–IIIA and stages IIIB–IV has also not been addressed. Another limitation of the study design is the delay between PET/CT and surgery being systematically longer than between MRI and surgery. This might be interpreted as a bias in favour of MRI. However, we also observed two cases where the interpretation of MRI was impaired considerably by post-stenotic pneumonia or atelectasis that was not present at the time of the PET/CT examination. A systematic disadvantage for MRI that has to be discussed is reading expertise: radiologists and nuclear medicine physicians who acted as readers in this study had comparable expertise in their fields. However, compared with PET/CT, reading of DWIBS images is not yet well standardised and individual experience of readers with this new technique is still limited even for otherwise experienced radiologists. This is expressed by lower observer agreement for MRI compared with PET/CT in our study. Finally, the study outcome for both imaging techniques is limited by the fact that all staging results are based on the impression of a single board-certified radiologist or nuclear medicine physician. However, this reflects a quite common situation in clinical practice. It can reasonably be assumed that additional consensus readings in cases of discrepancy would have improved staging accuracies for both techniques.

In conclusion, this study has shown, in agreement with previously published studies, that whole-body MRI with DWIBS is a powerful method for staging of NSCLC and provides results comparable to the reference standard PET/CT. Thus, whole-body MRI with DWIBS may qualify as a first-line technique for staging of NSCLC when PET/CT is not available. However, as opposed to other authors, we are not yet convinced that there is clear evidence of a superiority of DWI with respect to lymph node assessment. We agree that the method of diffusion-weighted MRI has two intrinsic technical advantages over FDG-PET, which are spatial resolution and soft tissue contrast. However, this potential has not been exploited to its full extent by today’s routinely available applications. There is certainly a need for further technical improvement of both diffusion-weighted and conventional MRI sequences for optimised morphological and functional assessment of pulmonary and mediastinal structures.