Introduction

High-grade gliomas (HGG) are the most common and aggressive of primary brain tumors. MRI is the primary modality used widely for diagnosis and assessment of treatment response in HGG [1].

Diffusion weighted imaging (DWI) is one of the most mature advanced MRI techniques used in conjunction with conventional anatomic MRI to probe brain tumor tissue physiology and function. Apparent diffusion coefficient (ADC) maps generated from DWI reflect microscopic, sub-voxel level barriers to the random thermal motion of water molecules in each image voxel [2, 3]. Because the principal barriers to water diffusion in brain and tumor tissue are intracellular and cell membranes, tissue ADC measured with clinical DWI (typically employing a diffusion sensitization gradient “b-value” of 1000 s/mm2) is largely determined by the extracellular volume fraction, which itself represents a convolution of tissue cellularity, tissue edema and tissue necrosis. The use of ADC as an indicator for HGG response is appealing because DWI is rapid, relatively motion insensitive, technically robust and does not require use of a contrast agent. Multiple studies have demonstrated that low pre-treatment minimum tumor ADC (ADCmin) correlates with high cellularity and that ADCmin correlates with survival in HGG patients [4, 5]. The published optimal ADCmin thresholds for distinguishing highly cellular, high-grade tumor from tumor of lower cellularity and histological grade range from 0.6 to 1 × 10−3 mm2/s [4, 5, 6, 7]. Mean ADC values (ADCmean) have also been shown to correlate with overall survival (OS) at a cutoff value of 1.04 × 10−3 mm2/s [8]. Furthermore, ADCmean has been used to differentiate recurrent tumors (1.18 ± 0.13 × 10−3 mm2/s) from pseudoprogression and radiation-induced necrosis (1.40 ± 0.17 × 10−3 mm2/s) [9].

Normalization of ADC was first reported in 2002 in order to reduce inter-scanner and inter-scan variation in ADC in the context of correlating ADC with histological measures of cellularity [10]. A number of researchers have more recently employed ADC normalization in the attempt to address the inconsistency in thresholds reported for ADCmin and ADCmean. Normalized ADC (nADC) values are calculated as the ratio of the ADC in the tumor to the ADC in normal-appearing contralateral white matter (NAWM), allowing the patient to serve as his or her own control [11, 12]. One study (n = 52) reported that nADC has a stronger correlation with both progression-free survival (PFS) and OS than absolute ADC [13]. Another study (n = 18) demonstrated that normalization improves differentiation of responders from non-responders in high-grade pediatric brain tumors [14]. Several other studies have reported significant correlation of nADC with survival but have not compared nADC directly to absolute ADC [11, 15].

Whether ADC normalization is necessary remains uncertain. In large part this depends on the relative reproducibility of absolute and nADC measurements. The reported cutoff values for nADCmin (0.3–0.75) [13, 15] and the more reproducible nADCmean (1.5–1.62) [9, 11] may seem small in absolute terms, but this must be compared to the known very high reproducibility of ADC in volunteers and stroke patients. A multicenter (seven institutions), multi-vendor (GE, Philips, Siemens and Toshiba) and multi-field strength (1.5 and 3 T) investigation of gray and white matter ADC in stroke patients revealed an intra-vendor variation of 3–5% and inter-vendor variation of up to 15% in absolute ADC despite use of nearly identical scanning protocols and parameters [16]. The authors suggested that nADC may be essential to reduce this variation. However, of the four major vendors in their study, one produced ADC estimates reproducibly lower than the other three. When this vendor was excluded, the inter-vendor variation dropped to roughly 7%. A more recent multicenter DWI reproducibility study of healthy volunteers revealed inter- and intra-scanner coefficients of variation (CV) < 7% across eight scanners, two vendors, two field strengths (1.5 and 3 T) and five centers [17]. Although the authors did not attempt to harmonize the DWI acquisition parameters other than b-values (0 and 1000 s/mm2), intra- and inter-scanner reproducibility were comparable, with only marginally inferior inter-scanner reproducibility.

The formula for the ADC calculation itself incorporates signal intensities at two or more b-values, which introduces a substantial degree of intrinsic normalization. When combined with the known very high reproducibility of absolute ADC measurements across patients and scanners from most major vendors, this raises some question about the benefit to be expected from an additional normalization step. Given the minimal available published data comparing nADC to absolute ADC in brain tumor, we set out to test whether nADC is superior to absolute ADC in a clinically representative group of HGG patients who had undergone a number of different treatments and had known survival. Our null hypothesis was that normalization of ADC in a clinical HGG population would not significantly improve the previously published correlation of ADC with the clinical gold standard of patient OS.
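The intrinsic normalization mentioned above can be illustrated with a minimal sketch. It assumes the standard two-point mono-exponential model, S_b = S_0 · exp(−b · ADC), which is not spelled out in the text; the function name and the signal values are hypothetical and chosen only for illustration.

```python
import math

def adc_two_point(s0, sb, b=1000.0):
    """Two-point mono-exponential ADC estimate (assumed model, not from the paper).

    s0: signal at b = 0, sb: signal at b (s/mm^2). Returns ADC in mm^2/s.
    """
    if s0 <= 0 or sb <= 0:
        raise ValueError("signal intensities must be positive")
    return math.log(s0 / sb) / b

# Hypothetical voxel signals (arbitrary units).
adc = adc_two_point(1000.0, 450.0)

# A global gain factor (for example, coil sensitivity or receiver gain)
# multiplies both signals, so it cancels in the ratio s0 / sb.
adc_scaled = adc_two_point(2.5 * 1000.0, 2.5 * 450.0)

print(f"ADC        = {adc * 1e3:.3f} x 10^-3 mm^2/s")
print(f"ADC scaled = {adc_scaled * 1e3:.3f} x 10^-3 mm^2/s")
```

Because only the ratio of the two signals enters the estimate, any multiplicative factor common to both acquisitions drops out before a further normalization to white matter is applied.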

Materials and methods

Patients

A total of 37 HGG patients were included in this study, of whom 25 had glioblastoma multiforme (GBM) and 12 had anaplastic astrocytoma (AA). The detailed patient selection criteria and treatment histories of this dataset have been reported previously in a paper describing a strong correlation of pre-treatment non-enhancing tumor (NT) ADCmean < 1.04 × 10−3 mm2/s with a shorter OS (< 15 months) compared to patients with a higher ADCmean [8]. These pre-treatment DWI scans were acquired 3 days to 1 month after tumor resection surgery, and 3–7 days prior to chemo-radiation therapy. Because of the known strong correlation of absolute ADCmean with OS in this cohort, we analyzed normalized and absolute pre-treatment ADCmean.

Image acquisition and analysis

Data were acquired with one of several 3 T MR scanners in a single institution (Siemens Healthcare, Trio, Germany, and GE Medical Systems, Waukesha, WI), using three different product sequence DWI acquisition protocols, with detailed scan parameters reported previously [8]. Although scan parameters were not identical, the b-values were 0 and 1000 s/mm2 for all acquisitions, and the correlation with OS of measured ADCmean did not differ significantly between the protocols [8].

Three neuroradiologists independently selected the NT region in each case, defined as brain tissue located within 2 cm of the enhancing tumor or resection cavity on one representative slice. Regions of interest (ROIs) were manually outlined, and the mean ADC inside the ROIs was used for analysis [8]. The NAWM ROI was also selected independently, in the contralateral NAWM or, when no normal-appearing contralateral white matter could be found, in an ipsilateral part of the white matter that appeared completely normal on MRI and that was some distance away from any visible abnormality. This control ROI was selected so as to avoid large vessels, CSF and gray matter (Fig. 1). nADCmean was calculated as the ratio of the absolute ADCmean inside the NT ROI to that in the NAWM ROI.
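The ratio described above can be sketched as follows. This is a minimal illustration rather than the analysis software used in the study: the helper names, the toy ADC map and the ROI masks are all hypothetical, and ADC values are given in units of 10−3 mm2/s.

```python
from statistics import mean

def roi_mean(adc_map, roi_mask):
    """Mean ADC over the voxels where the (hypothetical) ROI mask is True."""
    values = [v
              for row, mask_row in zip(adc_map, roi_mask)
              for v, inside in zip(row, mask_row)
              if inside]
    if not values:
        raise ValueError("ROI mask selects no voxels")
    return mean(values)

def normalized_adc_mean(adc_map, nt_mask, nawm_mask):
    """nADCmean = ADCmean in the NT ROI / ADCmean in the NAWM ROI."""
    return roi_mean(adc_map, nt_mask) / roi_mean(adc_map, nawm_mask)

# Toy 3 x 3 ADC map (x 10^-3 mm^2/s); values are made up.
adc_map = [
    [1.0, 1.2, 2.9],
    [0.7, 0.8, 3.0],
    [0.9, 1.5, 1.4],
]
nt_mask = [
    [True, True, False],
    [False, False, False],
    [False, False, False],
]
nawm_mask = [
    [False, False, False],
    [True, True, False],
    [False, False, False],
]

nadc_mean = normalized_adc_mean(adc_map, nt_mask, nawm_mask)
print(f"nADCmean = {nadc_mean:.3f}")
```

Note that the ratio inherits the measurement variability of both ROIs, so normalization can only help when the variation it removes is larger than the variation the NAWM measurement adds.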

Fig. 1

Representative ADC map (a) and post-contrast T1-weighted MRI (b). The NAWM ROI is selected in the anterior right frontal lobe (hollow arrow) and the NT ROI in the white matter around the enhancing tumor at the left fronto-parietal junction (solid arrow)

Statistical analysis

Mean and standard deviation (SD) of absolute and nADCmean values were calculated for the three radiologists. Paired t-tests were used to compare the differences between readers. A mixed effects model taking readers as a fixed effect and subjects (i.e. patients) as a random effect was built to assess the repeatability across readers [18]. The within-subject coefficient of variation (wCV) was calculated to measure relative repeatability of ADC measurement within each patient scan, and the intra-class correlation coefficient (ICC) was used to assess the inter-observer agreement between readers across the cohort [18, 19]. A receiver operating characteristic (ROC) curve was calculated to correlate ADCmean and nADCmean with the 15-month OS. The survival cutoff of 15 months was selected based on literature reporting 12–15 months median post-operative OS in GBM patients [20]. ROC analysis was also performed on subgroups of patients with GBM and AA, respectively. Absolute ADC and nADC correlation with OS were compared within and between these subgroups. All statistical tests were performed using Stata Version 13 (StataCorp, Texas, USA). A p value ≤ 0.05 was considered to be statistically significant.
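For readers unfamiliar with the two agreement metrics, the sketch below shows one common way to compute them. It is not the Stata mixed-effects analysis used in this study: the wCV is taken here as the root mean square of the per-subject coefficients of variation, and the ICC is the one-way random-effects form, ICC(1,1). Both definitions are assumptions made for illustration, and the readings are invented.

```python
import math
from statistics import mean, variance

def within_subject_cv(readings):
    """Root-mean-square of per-subject CVs.

    readings[i] holds the measurements of subject i by each of k readers.
    """
    per_subject = [variance(r) / mean(r) ** 2 for r in readings]
    return math.sqrt(mean(per_subject))

def icc_one_way(readings):
    """One-way random-effects ICC(1,1) from between- and within-subject mean squares."""
    n = len(readings)        # number of subjects
    k = len(readings[0])     # number of readers per subject
    grand = mean(v for r in readings for v in r)
    ms_between = k * sum((mean(r) - grand) ** 2 for r in readings) / (n - 1)
    ms_within = sum((v - mean(r)) ** 2 for r in readings for v in r) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented tumor ADCmean readings (x 10^-3 mm^2/s): 3 subjects x 3 readers.
toy_readings = [
    [1.08, 1.10, 1.30],
    [0.95, 0.93, 1.12],
    [1.25, 1.22, 1.50],
]

print(f"wCV = {100 * within_subject_cv(toy_readings):.1f}%")
print(f"ICC = {icc_one_way(toy_readings):.2f}")
```

The wCV summarizes disagreement relative to each subject's own mean, whereas the ICC compares between-subject spread to reader disagreement, so the two metrics can move independently.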

Results

Tumor ADCmean was measured to be 1.083 ± 0.259, 1.089 ± 0.249, and 1.290 ± 0.315 × 10−3 mm2/s (mean ± SD) by the three readers, respectively (Table 1). Readers 1 and 2 produced similar measurements, but Reader 3 produced significantly different measurements (Table 1; Fig. 2). NAWM ADCmean was similar across the readers (Table 1). As a result, the nADCmean of Reader 3 was significantly different from that of the other two readers, and normalization did not reduce variation among readers. The wCV among the three readers was similar for absolute ADCmean and nADCmean, at 15.9% and 16.4%, respectively. The wCV of NAWM ADCmean was much smaller at 4.9%. The ICC across readers was 0.61 [0.60–0.62] for all three metrics.

Table 1 Within-subject coefficient of variation (wCV) and ICC among three independent readers for absolute ADCmean (× 10−3 mm2/s), NAWM ADCmean (× 10−3 mm2/s) and nADCmean

Fig. 2

a The ADCmean measured in tumor and NAWM by the three independent readers. b nADCmean of the same readers

Figure 3 illustrates the ADCmean and nADCmean of patients categorized by OS less than or greater than 15 months. T-tests demonstrated a significant difference between the survival groups for both the ADCmean and nADCmean (p = 0.0014 and 0.0073, respectively), suggesting that either can be used as a biomarker for survival. In contrast to previous reports, however, nADC did not detectably decrease the overlap in ADC distributions between the OS > 15-month and OS < 15-month groups [9].

Fig. 3

Box and whisker plots of the ADCmean and nADCmean of patients subdivided by 15-month OS. t-test shows that both ADCmean and nADCmean differed significantly between the two survival groups. Left is absolute ADCmean; t-test between the two groups shows a p = 0.0014. Right is nADCmean; t-test has a p = 0.0073. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to cover ± 2.7σ of data, and the crosses “+” denote the outliers

Table 2 provides the summary statistics for the correlation of ADCmean and nADCmean with 15-month OS. ROC analysis showed that the nADCmean correlation with OS was not significantly better than the absolute ADCmean correlation with OS. ADC estimates from Readers 1 and 2 correlated significantly with OS, but Reader 3 estimates did not. Normalization of ADCmean did not improve the ROC AUC (p = 0.19). Subgroup ROC analysis revealed no significant difference between absolute ADCmean and nADCmean for either GBM (AUC = 0.89 for both ADCmean and nADCmean) or AA (AUC = 0.79 for ADCmean and AUC = 0.75 for nADCmean) patients. Although the AUC for the GBM subgroup was slightly higher than for the AA subgroup, the difference was not significant (p = 0.53 for absolute ADC; p = 0.36 for nADC).
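As an aid to interpreting the AUC values, the sketch below computes an empirical ROC AUC through its equivalence with the Mann-Whitney U statistic: the AUC is the probability that a randomly chosen long survivor has a higher ADCmean than a randomly chosen short survivor. This is an illustration only, not the Stata procedure used for Table 2; the function name and the ADC values are hypothetical.

```python
def roc_auc(values_long, values_short):
    """Empirical AUC: P(long-survivor value > short-survivor value), ties count 0.5."""
    wins = 0.0
    for a in values_long:
        for b in values_short:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(values_long) * len(values_short))

# Invented ADCmean values (x 10^-3 mm^2/s) grouped by 15-month OS.
adc_long_os = [1.2, 1.3, 0.9]    # OS > 15 months
adc_short_os = [0.8, 1.0, 1.1]   # OS < 15 months

auc = roc_auc(adc_long_os, adc_short_os)
print(f"AUC = {auc:.2f}")
```

Because the AUC depends only on the rank order of patients, dividing each tumor ADC by that patient's NAWM ADC changes the AUC only if it reorders patients across the two survival groups.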

Table 2 ROC curve correlating ADC with 15-month OS

Discussion

DWI estimates of ADC in HGG offer a valuable marker of tumor cellularity. Image voxels containing more cellular tumor have a smaller volume fraction of extracellular extravascular water and thus lower ADC. Conversely, voxels containing less cellular tumor have a higher extracellular volume fraction and thus higher ADC. ADC has been shown to correlate with cellularity, treatment response and prognosis in patients with HGG. A low ADCmin predicts a less favorable prognosis compared with a high ADCmin [4, 5]. Murakami et al. found the ADCmin to be a useful clinical prognostic biomarker for survival in 79 patients with malignant supratentorial astrocytoma; those with a low ADCmin (< 1.0 × 10−3 mm2/s) had a poor prognosis [4]. Yamasaki et al. also found ADCmin to be a statistically significant correlate of OS in 33 GBM patients at a cutoff value of 1.0 × 10−3 mm2/s [5]. Nakamura et al. reported that a pre-treatment ADCmin cutoff of 0.93 × 10−3 mm2/s predicted survival in postoperative GBM patients [6]. Saksena et al. found that patients with a pre-treatment ADCmin > 0.6 × 10−3 mm2/s had a higher rate of progression-free survival at 6 months than those with lower ADCmin, but this finding was not statistically significant [7].

With varying ADC cutoff measures reported to correlate with survival, some investigators have proposed using nADC values instead of absolute values. Elson et al. evaluated postoperative MRIs of 52 GBM patients and found nADCmin measured within the T2/FLAIR volume to be the strongest predictor of PFS and OS when compared to absolute ADCmean, ADCmin and nADCmean. A lower nADCmin (< 0.3) was associated with poor survival [13]. Oh et al. reported, in 28 postoperative GBM patients, a statistically significant shorter median survival time with an nADCmean < 1.5 compared to those with an nADCmean > 1.5 within the T2 region (11.2 and 21.7 months, respectively, p = 0.004) [11]. The nADCmin reported by Elson et al. is much smaller, but their nADCmean cutoff of 1.7 is similar to the 1.5 cutoff value reported by Oh et al. Elson et al. showed that patients with an nADCmean < 1.7 had an inferior PFS and OS compared to those with nADCmean > 1.7 (4.4 vs. 7.9 months, and 11.7 vs. 15.5 months, respectively); however, this was not statistically significant. Hein et al. demonstrated that both ADCmean and nADCmean within enhancing lesions were significantly lower in the tumor recurrence or progression group than in the non-recurrence group in a total of 18 patients with HGG. However, while nADCmean showed no overlap between the two groups at a cutoff value of 1.62, there was some overlap in the ADCmean between the two groups [9]. Shankar analyzed 84 GBM patients’ pre-treatment MRI and found that a cutoff nADCmin of 0.75 significantly correlated with OS [15], while absolute ADCmin or ADCmean did not significantly correlate with survival.

Our findings are at odds with several recent publications suggesting that normalization of HGG tumor ADC improves correlation with survival and reduces variation in tumor ADC estimates. In our cohort, normalization of ADC did not improve correlation with OS compared to absolute ADC (Table 2). Nor did normalization reduce the variation among our three independent readers (Fig. 2). The wCV values for tumor ADCmean and nADCmean estimates within each subject were almost the same, at 15.9% and 16.4%, respectively. Interestingly, Table 1 and Fig. 2 clearly show that NAWM ADC measurements were much more consistent (wCV as low as 4.9%) across the readers than tumor ADC measurements, even though all ROIs were defined independently. The observation that the wCV of ADC measured in tumor is much higher than that of ADC measured in NAWM can likely be attributed in large part to the heterogeneous nature of HGG. This heterogeneity means that any slight difference in the location of the ROI would result in a larger difference in ADC measurements in tumor than in NAWM. This probably explains why one of the readers measured significantly different tumor ADCmean compared to the other two readers. However, the mean NAWM ADC values measured by the three readers were very consistent: 0.759, 0.754 and 0.758 × 10−3 mm2/s, respectively. This is in good accordance with the literature: Grech-Sollars et al. measured a mean ADC of 0.7 × 10−3 mm2/s for white matter in volunteers in a multicenter study [17] and Sasaki et al. measured similar WM mean ADC in their multivendor, multi-institutional study [16].

Normalization to NAWM is well suited to address variation between scans and between patients that affects the entire pulse sequence or image; in other words, variation that affects normal white matter voxels in the same way and to the same degree as tumor voxels. These sources include stochastic variation between measurements, normal anatomic variation, and differences in coil coupling efficiency, pre-amplification, pulse sequence software and scanner hardware, among others. However, the high reproducibility of NAWM ADC measurements and the more than threefold greater variation in tumor ADC measurements imply that sources of variation that affect NAWM and tumor voxels equally account for less than one-third of the overall variation observed in tumor ADC. As such, it is not surprising that normalization of ADC did not detectably improve reproducibility of ADC measurements between readers or within subjects in our cohort.
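The arithmetic behind this argument can be checked with the reported wCV values. The first quantity below is simply the ratio of the NAWM wCV to the tumor wCV. The second is an added back-of-envelope check that is not part of the original analysis: it assumes that tumor and NAWM measurement errors are independent and small, in which case first-order error propagation gives the wCV of a ratio as the quadrature sum of the two wCVs.

```python
import math

# Reported within-subject coefficients of variation (percent).
wcv_tumor = 15.9
wcv_nawm = 4.9
wcv_nadc_observed = 16.4

# Upper bound on the share of tumor variation attributable to sources
# that affect NAWM and tumor equally.
shared_fraction = wcv_nawm / wcv_tumor

# Assumption (not from the paper): independent errors, first-order
# propagation for a ratio, CV_ratio^2 ~= CV_tumor^2 + CV_nawm^2.
wcv_nadc_if_independent = math.hypot(wcv_tumor, wcv_nawm)

print(f"NAWM wCV / tumor wCV         = {shared_fraction:.2f}")
print(f"wCV of ratio if independent  = {wcv_nadc_if_independent:.1f}%")
print(f"observed nADCmean wCV        = {wcv_nadc_observed:.1f}%")
```

Under that independence assumption the predicted wCV of the ratio is slightly higher than the tumor wCV, which is in line with the observation that normalization did not reduce reader variation.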

The subgroup analysis supported the finding of equivalence between absolute ADC and nADC demonstrated in the overall cohort: no significant difference was detected between absolute ADC and nADC correlation with OS when either AA or GBM patients were analyzed independently. The correlation of ADC and nADC with OS was very strong in each group. The AUC was slightly larger in the subgroup of GBM patients analyzed alone (0.89 ADC; 0.89 nADC) compared with the overall GBM and AA cohort (0.82 ADC; 0.79 nADC) and the AA patients analyzed alone (0.79 ADC; 0.75 nADC), although these differences did not reach statistical significance. This trend may reflect that the longer OS of AA patients compared to GBM patients weakened the statistical correlation in the AA subgroup, or could be related to differences in cellularity, vascularity, treatment or other factors between AA and GBM subgroups.

One limitation of our study is the limited sample size. It is possible that a larger sample size would have revealed an effect of normalization that we did not detect, but such an effect would not likely be large. We did not attempt to control for molecular markers, adjuvant or experimental therapies or any of the other myriad factors that influence OS. Instead we deliberately included a population of patients with genetically and molecularly different tumors and different treatment histories. Our goal was to assess whether normalization improves the known correlation of ADC with survival, not to investigate the absolute correlation of ADC with survival. Within individual patients and the test population as a whole, differences in molecular tumor subtypes and therapies should not bias the comparison between ADC and nADC because the same ADC and OS data were used regardless of whether the ADC was normalized. While analysis of a homogeneous population might be expected to produce a higher absolute correlation between ADC and OS, normalization would be expected to provide greater benefit in a heterogeneous population. Also, the heterogeneous population better mimics the heterogeneity encountered in real clinical populations and makes the result more readily generalizable to different clinical research populations. A patient population with greater homogeneity of tumor grade and treatment history might have allowed detection of a small effect that was not seen in our heterogeneous clinical sample, but such a study would be less directly applicable to clinical practice. Similarly, an even more meticulously standardized analysis protocol could be devised to decrease the variation in ROI selection between readers that seems likely to account for the majority of the variation we saw. Such a protocol could be used to demonstrate a statistically significant benefit of ADC normalization that we did not detect. This may explain the discordance between our findings and other recent reports. Nevertheless, achieving such high reproducibility in clinical practice is notoriously difficult. As such, we believe that in clinical practice and clinical research, normalization is not necessary for valid use of ADC as a marker of highly cellular tumor and is not likely to improve correlation with survival.

Conclusion

Normalization of ADC does not significantly improve correlation with OS in patients with HGG when compared to absolute ADC. This finding suggests that normalization is not necessary for clinical or research ADC analysis in HGG patients.