Introduction

Pretreatment characterization of the tumour aggressiveness of prostate cancer (PCa) is vital for selecting the optimal therapy; it helps to avoid overtreatment and to improve active surveillance for patients with and without clinically significant cancer. However, exactly how patients should be risk stratified is still not clear [1–3]. Many nomograms and risk-classification methods have been reported for discriminating low-aggressiveness lesions from clinically significant PCa [4–6]. Of all the clinically determinable parameters, the Gleason scoring system has proven most important for measuring aggressiveness, disease outcome and the risk of mortality from PCa [3]. However, the estimation of Gleason score (GS) is based on invasive biopsy or radical prostatectomy (RP), which can be associated with serious adverse effects [7]. Therefore, a reliable non-invasive method that could differentiate low- from high-aggressiveness PCa would be a major advance and would have a significant benefit for individualized treatment.

Recently, diffusion-weighted (DW) MR imaging combined with various mathematical models has been gaining substantial interest as a possible tool to detect localized PCa and predict its biochemical aggressiveness [8–14]. Documented studies have shown statistically significant correlations between the apparent diffusion coefficients (ADCs) obtained from DW imaging and the GS of PCa [15–18]. In addition, a few studies demonstrated that ADCs may predict prognosis for patients undergoing active surveillance and biochemical recurrence after RP [19, 20]. Although most studies argue that the differences in ADC between Gleason grades correlate closely with tumour cellularity, it is important to realize that monoexponential ADC measurement is partly influenced by tissue perfusion [21, 22]. The contribution of perfusion to the diffusion signal was elucidated by Le Bihan et al. in their pioneering work on intravoxel incoherent motion (IVIM) [23, 24]. The blood flow in a randomly oriented microvasculature, referred to as pseudo-diffusion, contributes to diffusion signal decay predominantly at low b values (<200 s/mm²). Perfusion effects can be resolved from the true tissue diffusion by acquiring DWI with a sufficiently wide range of b values, followed by biexponential curve fitting. Such an analysis can resolve pseudo-diffusivity D* and tissue diffusivity D separately, along with the perfusion volume fraction f [25–27]. In recent years, several studies examined prostate DW-IVIM in the comparison of cancerous regions and normal tissue [22, 28, 29], but the role of DW-IVIM in grading tumour aggressiveness has rarely been reported. In addition, the conventional method for quantitation of DW imaging data is limited by its reliance on region of interest (ROI)-based measurement. Such a method (e.g. placing regional ROIs on a representative section of the tumour) has been pointed out as a limitation of many studies [30, 31], because single measurements overlap between grades and are subject to interobserver variability in ROI placement. A histogram analysis approach has been shown to be a promising tool in discriminating tumour grade, differentiating tumour subtypes, and assessing therapeutic effects on the cancer [32–34].

The purpose of our study was thus to primarily assess the diagnostic performance of IVIM for stratifying the aggressiveness of PCa on the basis of an entire-tumour histogram analysis, by using post-RP pathological results as reference standard.

Materials and methods

Patients

This retrospective study was approved by our institutional review board and written informed consent was waived. Between December 2012 and March 2014, 63 consecutive patients with clinically localized prostate cancer (proved by biopsy examination) who were referred for routine clinical evaluation underwent standard MR examination (including DW imaging) before RP. The inclusion criteria were (a) no prior hormonal or radiation treatment, (b) DW imaging performed with the same parameters, (c) a diagnosis of “primary” PCa and (d) detailed histopathology in which at least one tumour focus with a diameter of 0.5 cm or more was available for multiparametric DW-MR calculation. A total of 48 patients met all inclusion criteria and were included in this study. The other 15 patients were excluded because sequence parameters of DW imaging were not unified (n = 7), the tumour was too small (n = 5) or neoadjuvant therapy was administered before the MR examination (n = 3).

MR acquisition

MR examinations were performed with a 3.0-T MR system (Verio Tim; Siemens, Erlangen, Germany) and a pelvic phased-array coil. As per the standard clinical prostate MR examination at our institution, the images obtained included transverse T1-weighted turbo spin-echo (TSE) images (repetition time (ms)/echo time (ms), 700/14; section thickness, 3.5 mm; intersection gap, 0.5 mm; field of view, 25 cm; and matrix, 384 × 336) and transverse, coronal and sagittal T2-weighted TSE images (repetition time (ms)/effective echo time (ms), 6,000/124; section thickness, 3.5 mm; intersection gap, 0.3 mm; field of view, 25 cm; and matrix, 384 × 336) of the prostate and seminal vesicles. Then, single-shot echo-planar imaging (repetition time (ms)/echo time (ms), 6,000/72; field of view, 25 cm; matrix, 192 × 130; section thickness, 3.5 mm; intersection gap, 0.3 mm; a parallel imaging factor of 2; and 21 sections) was performed with diffusion-module and fat suppression pulses. Diffusion in three directions was measured by using b values of 0, 50, 150, 300, 600 and 900 s/mm².

Histological–radiological correlation

After RP, the prostatic specimens were uniformly processed and submitted for histological investigation. The prostatectomy specimens were fixed in 10 % neutral buffered formalin and stored overnight after surgical resection. Prostatectomy specimens were fixed in 5 % buffered formalin, processed and cut serially into 4-mm-thick blocks from apex to base in transverse planes. Each block was then halved or quartered (depending on its size), and 7–8-μm-thick microtome slices were stained with haematoxylin and eosin. A genitourinary pathologist with more than 10 years’ experience in genitourinary pathology reviewed all the sections and outlined the location of the tumour on the photographs for each slice. Tumours were graded according to the 2005 International Society of Urological Pathology Modified Gleason Grading System.

We referred to the studies by Peng et al. [35] and Oto et al. [36] for the histological–radiological correlation. During this procedure, the pathologist identified all distinct tumour foci larger than 0.5 cm in diameter, and a radiologist (Y.Z., 8 years of experience in urinogenital imaging) manually outlined the corresponding regions of interest (ROIs) of the tumour foci on axial T2-weighted MR images. For those cancer foci that were not clearly visible on MR images, their locations were determined on the basis of their relationship with other identifiable landmarks (e.g. urethra, ejaculatory ducts, benign prostatic hyperplasia nodules) on MR images by consensus of the radiologist and pathologist. The drawn ROIs on MR images were required to carefully match the extent of the tumour determined from histological examination slice by slice. Gleason score was reported specifically for each tumour ROI by the study pathologist during the consensus review.

MR post-processing

All data were transferred in Digital Imaging and Communications in Medicine (DICOM) format and processed off-line using MATLAB software (MathWorks Inc., Natick, MA, USA). The DW data were post-processed with both the monoexponential model and the biexponential IVIM model. For the biexponential IVIM model, the relationship between signal intensity of DWI and b factors can be expressed by Eq. 1:

$$ \frac{S(b)}{S_0}=f\,e^{-bD^{*}}+\left(1-f\right)e^{-bD} $$
(1)

where S(b) is the mean signal intensity at a given b value, S0 is the signal intensity without diffusion weighting, and f is the perfusion fraction, which is influenced by the directional flow of water molecules during the diffusion time. D is the diffusion parameter representing pure molecular diffusion (the slow component of diffusion), and D* is the diffusion parameter representing incoherent microcirculation within the voxel, i.e. perfusion-related diffusion or the fast component of diffusion. Data were fitted with the Levenberg–Marquardt nonlinear least squares algorithm. Pixel-by-pixel maps of the diffusion features D, D*, f and ADC were then automatically constructed from the fitted models.
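The two signal models can be illustrated with a minimal single-voxel sketch in Python. It is not the MATLAB pipeline used in the study: instead of a Levenberg–Marquardt fit of all three IVIM parameters, it uses a simpler segmented estimate, in which D and f are taken from a log-linear fit at high b values where the pseudo-diffusion term is assumed to have decayed. The function names, the 300 s/mm² threshold and the synthetic parameter values are assumptions made for illustration only.

```python
import math

def ivim_signal(b, f, d, d_star):
    """Normalised biexponential IVIM signal S(b)/S0."""
    return f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def adc_monoexponential(b_values, signals):
    """ADC from a log-linear fit of ln(S/S0) = -b * ADC over all b values.

    Assumes the first entry of b_values is b = 0.
    """
    s0 = signals[0]
    slope, _ = fit_line(b_values, [math.log(s / s0) for s in signals])
    return -slope

def segmented_ivim(b_values, signals, b_threshold=300):
    """Segmented estimate of D and f (a simplification, not a Levenberg-Marquardt fit).

    For b >= b_threshold the perfusion term is assumed negligible, so
    ln(S/S0) ~ ln(1 - f) - b * D.
    """
    s0 = signals[0]
    high = [(b, math.log(s / s0)) for b, s in zip(b_values, signals) if b >= b_threshold]
    slope, intercept = fit_line([b for b, _ in high], [y for _, y in high])
    return -slope, 1.0 - math.exp(intercept)

# b values of the acquisition protocol (s/mm^2)
B_VALUES = [0, 50, 150, 300, 600, 900]
# Hypothetical voxel parameters, chosen for illustration only (D, D* in mm^2/s)
TRUE_F, TRUE_D, TRUE_D_STAR = 0.10, 1.0e-3, 20.0e-3

signals = [1000.0 * ivim_signal(b, TRUE_F, TRUE_D, TRUE_D_STAR) for b in B_VALUES]
d_est, f_est = segmented_ivim(B_VALUES, signals)
adc_est = adc_monoexponential(B_VALUES, signals)
```

On this noise-free synthetic voxel the monoexponential ADC comes out higher than D, because the fast perfusion-related decay at low b values is absorbed into the single exponent.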

In terms of ROI measurement, the ROIs obtained on T2-weighted MR images were transferred to all other multiparametric DW-IVIM imaging maps with computer software developed in-house, which allowed manual adjustment of any potential misalignment. Lastly, the multiparametric measurements of the ROIs in each slice of a tumour focus were pooled to derive voxel-by-voxel values for a histogram analysis. The histogram of DW-IVIM parameters for each ROI was analysed by using commercially available software (PASW Statistics 16.0; SPSS Inc., Chicago, IL, USA). D, D* and ADC histograms were plotted with diffusivity on the x-axis with a bin size of 1 × 10⁻⁶ mm²/s, and the f histogram was plotted with perfusion fraction on the x-axis with a bin size of 1 × 10⁻³. The y-axis expressed the percentage of tumour volume, obtained by dividing the frequency in each bin by the total number of voxels analysed. For further quantitative analysis, cumulative D, D*, f and ADC values were obtained from their respective histograms, in which the cumulative number of observations in all of the bins up to the specified bin was mapped onto the y-axis as a percentage. Based on an entire-tumour measurement, the following histogram parameters were derived from each of the ADC, D, D* and f maps: (a) mean; (b) median; (c) kurtosis, which is the degree of peakedness of a distribution; and (d) skewness, which is a measure of the degree of asymmetry of a distribution. For the cumulative histograms, the 10th and 75th percentiles of the tumour D, D*, f and ADCs were derived (the nth percentile is the point at which n % of the voxel values that form the histogram are found to the left). A representative case illustrating the histological–radiological correlation and the histogram analysis of DW imaging measures is shown in Fig. 1.
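The entire-tumour histogram metrics described above can be sketched as follows. This is an illustrative Python example, not the SPSS procedure used in the study: skewness and kurtosis are computed from plain population moments (kurtosis as excess kurtosis, so a normal distribution gives 0), which may differ from the sample-corrected values SPSS reports, and percentiles use linear interpolation between sorted voxel values. The function names and the voxel values are hypothetical.

```python
import math
import statistics

def percentile(values, n):
    """Value below which n % of the sorted voxel values fall (linear interpolation)."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * n / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def histogram_metrics(values):
    """Mean, median, skewness, excess kurtosis, 10th and 75th percentiles."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population definitions)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
        "p10": percentile(values, 10),
        "p75": percentile(values, 75),
    }

# Hypothetical D values (x 10^-3 mm^2/s) from the ROIs of two slices of one tumour focus
slices = [[0.6, 1.0, 1.4], [0.8, 1.2]]
# Pool the voxels of all slices so the metrics describe the entire tumour volume
voxels = [v for roi in slices for v in roi]
metrics = histogram_metrics(voxels)
```

Pooling the voxels before computing the statistics is what distinguishes the entire-tumour measurement from averaging per-slice ROI values.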

Fig. 1

A representative PCa with Gleason score of 4 + 4, illustrating the histologic–radiologic correlation and the histogram analysis of DW imaging measures. A solid mass is shown in the left peripheral zone on the histologic specimen (a), with a characteristic inhomogeneous hypointense signal intensity (SI) on axial T2WI (b) and hyperintense SI on DWI (c). The tumour boundary was identified by histologic–radiologic correlation and was then outlined on T2WI and DWI (white line). The pixel-by-pixel D (d), f (e) and D* (f) maps were obtained, and the corresponding histogram distributions were constructed

Statistical analysis

All statistical analyses were performed by using SPSS 16.0. P values of less than 0.05 were considered statistically significant. We first calculated the Spearman rank-order correlation coefficient (ρ) to characterize the strength of correlation between multiparametric imaging measures and ordinal GS. Because discriminating low-grade (LG) tumours from combined intermediate- and high-grade (HG) tumours using quantitative DW imaging measures is of clinical importance, we classified the data into an LG group (GS ≤ 6) and an HG group (GS > 6). The normality and homoscedasticity of the multiparametric imaging data were tested using Q–Q plots and Levene’s tests. Data satisfying these assumptions (mean, median, the 10th and 75th percentiles) were subjected to the independent sample t test. Conversely, data not satisfying the assumptions (kurtosis, skewness) were analysed by using the Mann–Whitney U test. The overall ability to discriminate between LG and HG tumours was analysed by using a receiver operating characteristic (ROC) regression model, and quantified by using the areas under the ROC curves (Az), referring to the method of DeLong et al. [37]. The diagnostic accuracy, sensitivity and specificity were calculated at the cutoff point that maximized the value of the Youden index. For the 15 of 48 patients who had more than one tumour focus, whose measurements were therefore not independent of each other, only the tumour focus with the highest GS was considered for ROC analysis.
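The ROC quantities named above can be illustrated with a short Python sketch. It is not the SPSS analysis used in the study and does not implement the DeLong comparison of two Az values; it only shows how an empirical area under the ROC curve and a Youden-optimal cutoff can be computed when lower values of an index (for example D) point to an HG tumour. The function names and all data values are hypothetical.

```python
def auc_lower_is_positive(pos, neg):
    """Empirical AUC: probability that a random HG value is lower than a random
    LG value, with ties counted as one half (Mann-Whitney formulation)."""
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(pos, neg):
    """Cutoff c maximising J = sensitivity + specificity - 1.

    A value <= c is called high grade. Returns (J, c, sensitivity, specificity).
    """
    best = (-1.0, None, 0.0, 0.0)
    for c in sorted(set(pos) | set(neg)):
        sens = sum(v <= c for v in pos) / len(pos)
        spec = sum(v > c for v in neg) / len(neg)
        j = sens + spec - 1.0
        if j > best[0]:
            best = (j, c, sens, spec)
    return best

# Hypothetical mean D values (x 10^-3 mm^2/s), one per patient
hg = [0.62, 0.70, 0.75, 0.81, 0.95]   # GS > 6
lg = [0.90, 1.05, 1.10, 1.20]         # GS <= 6

az = auc_lower_is_positive(hg, lg)
j, cutoff, sensitivity, specificity = youden_cutoff(hg, lg)
# Accuracy at the chosen cutoff: correctly classified patients over all patients
accuracy = (sensitivity * len(hg) + specificity * len(lg)) / (len(hg) + len(lg))
```

With one value per patient, as in the study's choice of the highest-GS focus, the observations entering the ROC analysis remain independent.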

Results

Of the 48 histologically confirmed cases, 30 had only peripheral zone (PZ) PCa. Of the remaining 18 patients, five had only transitional zone (TZ) PCa and 13 had both PZ and TZ PCa. The median serum PSA level for the cohort was 19.2 ng/mL (range, 0.7–214.4 ng/mL). Of these 48 patients, 33 had one tumour focus, five had two tumour foci (in one of these patients, one focus had a diameter of less than 0.5 cm), two had three tumour foci and eight had more than four tumour foci (six of these patients had only one tumour focus with a diameter larger than 0.5 cm, and two had three tumour foci with a diameter larger than 0.5 cm). After exclusion of the tumour foci with a diameter of less than 0.5 cm, a total of 60 tumour foci were included in the final imaging analysis. The median volume of the included tumour foci was 2.3 ml (range, 0.5–61.5 ml). Clinical characteristics of patients, as well as those of tumour foci ROIs, are summarized in Table 1.

Table 1 Summary of clinical and pathologic characteristics

Spearman’s rank correlation showed that all the histogram indices for D correlated significantly with ordinal GS (p < 0.01). For histogram ADCs, only the mean, median and 10th and 75th percentiles correlated significantly with ordinal GS (p < 0.01). None of the histogram indices for D* or f correlated significantly with ordinal GS (p > 0.05). The indices for D exhibited relatively higher correlation coefficients with tumour GS than those for ADCs (Table 2).

Table 2 Spearman’s rank correlation of tumour Gleason grade to DW-IVIM histograms

The independent sample t test showed that HG tumours had significantly lower ADCs than LG tumours in terms of histogram mean, median and the 10th and 75th percentiles (all p < 0.001). For IVIM-derived D, the histogram mean, median and the 10th and 75th percentiles also differed significantly between the LG and HG groups (all p < 0.001). For D* and f, the mean, median and 10th and 75th percentiles did not differ significantly between the two groups (all p > 0.05). The Mann–Whitney U test showed that, for histogram ADCs, D and f, the HG tumours had both higher kurtosis and higher skewness than LG tumours (all p < 0.010). The histogram kurtosis and skewness for D* did not differ significantly between the two groups (Table 3). Two representative cases with LG and HG PCa for the comparison of multiparametric DW imaging measures are shown in Figs. 2 and 3.

Table 3 Cumulative histogram parameters of DW-IVIM according to Gleason score
Fig. 2

DW-IVIM imaging of a tumour focus with Gleason score (GS) of 3 + 3 (a) and a tumour focus with GS of 4 + 4 (b) from two different patients. Both lesions show heterogeneous low SI on T2-weighted images and high SI on the corresponding DW images (b = 600 s/mm²). The D maps show markedly decreased diffusivity in both the low- and high-GS tumour foci, but the high-GS tumour shows a relatively lower D than the low-GS tumour. The tumours in both cases show a slightly decreased perfusion fraction compared to normal tissue. The D* maps do not show an evident difference between normal and cancerous regions in either the low- or the high-grade tumour focus

Fig. 3

Comparison of entire-tumour D histograms for two cases with GS 3 + 4 (a, b) and GS 5 + 4 (c, d). The tumour with GS 3 + 4 shows a lower relative frequency at low D compared with GS 5 + 4 tumour foci, resulting in substantial divergence between low- and high-grade PCa at the low end of the cumulative histograms. This suggests that the high-grade PCa contained more pixels with low D, which indicates high cellularity

Table 4 presents the results of the ROC analyses of the multiparametric imaging measures for distinguishing patients with LG tumour from patients with HG tumour. The difference in Az was not significant between any of the histogram indices for ADCs or for f (all p > 0.05). For IVIM-derived D, the difference in Az was significant between kurtosis and mean (p = 0.017), median (p = 0.016) and 10th percentile (p = 0.031). The difference in Az was also significant between skewness and mean (p = 0.017), median (p = 0.016) and 10th percentile (p = 0.009). In the comparison between D and ADCs, D had significantly higher Az values than ADCs for histogram mean (p = 0.018), median (p = 0.044) and 10th percentile (p = 0.023). These histogram indices derived from D also exhibited relatively higher accuracy, sensitivity and specificity for discrimination between LG and HG tumours than did ADCs (Fig. 4).

Table 4 Effectiveness of quantitative DW imaging indices in differentiating low-grade from intermediate/high-grade prostate cancer
Fig. 4

Comparison of diagnostic ability for discriminating LG from HG tumour between histogram ADCs and D. The ROC analysis shows that D has significantly higher Az values than ADCs in histogram mean (a), median (b) and 10th percentile (c), indicating that the IVIM method is more reliable for stratifying tumour Gleason grade. *Difference is significant at the 0.05 level

Discussion

Our study demonstrates that stratification of the Gleason grade of PCa is feasible by multiparametric DW-IVIM MR imaging with histogram analysis metrics. Increasing Gleason grade corresponded to a marked decrease in tumour diffusivity (reflected by ADCs or D). Interestingly, HG tumours had higher kurtosis and skewness for histogram ADCs, D and f than LG tumours, indicating more heterogeneity and complexity of cellularity in tumours with a high Gleason score. Moreover, the histogram mean, median and 10th percentile of D discriminated patients with LG tumour from patients with HG tumour better than did the corresponding ADCs, indicating that the proposed IVIM method is a more precise way of stratifying the biological features of PCa. However, the difference in histogram D* and f between the two grade groups did not reach statistical significance, which indicates that pseudo-perfusion may contribute little to the diffusivity for predicting the tumour grade of PCa.

DW-MRI has been accepted as an important imaging biomarker of PCa and provides reliable indices for grading of various tumour entities [16, 18, 38]. In recent years, several studies examined IVIM in patients with PCa, but inconsistent results were presented (e.g. for f, both higher and lower values compared to normal tissue were reported) [22, 39, 40]. Moreover, the behaviour of IVIM parameters in these studies was only compared between normal tissue and biopsy findings, and the role of DW-IVIM in grading tumour aggressiveness has rarely been reported. In the present study, we employed IVIM analysis to evaluate the aggressiveness of tumour foci in two grade groups, and compared these results to conventional ADCs. The analysis demonstrated that parameter D is excellent for discriminating tumour Gleason grade, and performs better (higher Az values) than ADC for discriminating low-grade tumour foci from intermediate/high-grade tumour foci. The predominant decrease in D within HG tumour is commonly understood to be a consequence of increased tumour cellularity, which further restricts the diffusion of water molecules [41–43]. In addition, within the prostate, the DW imaging signal also arises from extracellular components, e.g. from tubular structures, luminal space and their fluid content [44, 45]. Therefore, the biexponential fitting model derived from IVIM is theoretically more favourable for describing the biological characteristics of the tumour. A statistical difference in f has been previously indicated between cancerous zones and healthy individuals by Quentin and Pang et al. [21, 22]. However, there was no significant difference in either f or D* between the two Gleason-grade groups in our study, which indicated that the extracellular components, such as fluid content or vascular perfusion, made little contribution to the DW imaging signal in the cancerous zone, and were thus little related to its aggressiveness. This was supported by a similar study [36], which demonstrated that quantitative DCE-MRI parameters did not show a significant correlation with Gleason score or VEGF expression, and limited studies evaluating the correlation between DCE-MRI parameters and angiogenesis markers of PCa have also provided contradicting results [16, 46, 47].

It is well known that increased heterogeneity and complexity of tumour cellularity, one of the important histological features in malignant prostate cancer tissues, is highly responsible for the local aggressiveness and pathological grade of the tumour [48, 49]. Within the PCa, tissue heterogeneity could produce a wide spectrum of cellular density in a localized tumour site. This means that ADCs within the whole tumour volume may vary widely between different regions of the tumour. Unfortunately, the commonly used method for DW imaging measurement (e.g. placement of regional ROIs on a representative section of the tumour) has been pointed out as a limitation of many studies, in which the overlap of single ADC measurements between different grades did not accurately reflect the true features of tumour cellular density. We explored a histogram analysis that focuses on the distribution of DW-IVIM parameters, rather than on simple summary statistics. This histogram analysis approach has proved to be a promising tool in discriminating tumour grade or monitoring the effects of chemotherapy in brain and ovarian cancer, as it is able to objectively reflect different microenvironments of diffusivity through the entire tumour volume [32, 34, 50, 51]. Interestingly, we illustrated its potential in discriminating tumour Gleason grade. Kurtosis and skewness reflect the peakedness and the asymmetry of the histogram distribution, respectively; the increase in kurtosis and skewness in HG PCa may represent greater complexity of intratumoural cellularity, characterized by nuclei of various Gleason scores, necrosis, haemorrhage or fibrosis. The low-percentile histogram D may represent the microstructural information of tumour foci with the highest cellular density. Visually, the diffusivity changes translated into a shift of the histograms toward the left and adoption of a more asymmetrical shape in tumours with HG PCa, indicating increasing nuclear density and higher risk of aggressiveness.

By using ROC curves, we found that our histogram D was able to distinguish tumours with intermediate to high Gleason grade from those with low Gleason grade with an accuracy of 72.9–91.7 %, sensitivity of 70.3–91.8 % and specificity of 63.6–90.9 %, which were relatively higher than those obtained with our histogram ADCs and with the conventional ADCs obtained by Rosenkrantz et al. [52]. The better diagnostic performance of the proposed IVIM method for discriminating tumour grade in our study was probably due to (a) an improvement in imaging measures, which were based on an entire-tumour histogram analysis and carefully matched histological–radiologic findings; and (b) the use of biexponential curve fitting metrics, which were reported to perform better in terms of model fit and repeatability [11]. In addition, the histogram mean, median and 10th percentile of D produced higher Az values than kurtosis and skewness, indicating a better diagnostic effectiveness of these indices for stratification of tumour grade. However, the differences in Az between the mean, median, and 10th and 75th percentiles of D were not statistically significant, indicating similar diagnostic ability; moreover, the difference in histogram D* and f between the two grade groups did not reach statistical significance, suggesting that pseudo-perfusion may contribute little to the diffusivity for predicting the tumour grade of PCa.

Our study also has several limitations. First, similar to the study of Peng et al. [35], we performed the histological–radiologic correlation through a systematic consensus-seeking correlative review of histological findings and MR images by a genitourinary pathologist and a radiologist. Although this was carried out carefully to reduce the potential mismatch, uncertainty still cannot be ruled out in some cases. Second, it is a retrospective study with a relatively small sample size that could have been influenced by selection biases; the number of tumours with GS ≤6 was relatively small. This may produce statistical uncertainty with regards to relatively high true negative cases (high specificity) in the study. Therefore, larger prospectively studied patient populations will be needed to refine the correlation of IVIM parameters and tumour aggressiveness.

Conclusion

Our results suggest that diffusivity D derived from IVIM can be a useful tool for discriminating low-grade tumour foci from intermediate/high-grade tumour foci in patients with PCa, whereas f and D* contribute little to predicting the tumour grade. Histogram analysis helps to reflect the variety of biologic behaviour in tumour foci.