Prostate cancer (PC) is the most common malignancy other than skin cancer and the second most lethal cancer in American men, with more than 3.3 million men currently living with PC in the United States [1]. The 5-year survival rate ranges from 29.3% in patients with metastatic PC to 100% in patients with localized disease [1, 2]. In addition, trends in PC diagnosis and mortality rates in the last 30 years have shown that a significant increase in early diagnosis is associated with a significant decrease in mortality rates [2, 3]. Nonetheless, the rate of overdiagnosis of non-clinically significant prostate cancer (nsPC) has been shown to be as much as 67%, leading to unnecessary treatment and associated morbidity [2]. The key to optimal PC management is the detection of clinically significant cancers that require prompt treatment [3]. There are several definitions of “clinically significant prostate cancer” (csPC), such as biopsy Gleason score (GS) ≥ 3 + 4; GS ≥ 4 + 3; and/or pathological volume >0.5 cc [4,5,6]. The definition of csPC is important when we seek to determine the role and accuracy of multiparametric magnetic resonance imaging (mpMRI) in its detection.

Apparent diffusion coefficient (ADC), derived from diffusion-weighted imaging (DWI), plays an important role in mpMRI, as ADC values in men suspected of having PC have shown significant inverse correlation with GS when compared with biopsy and final pathological results [7,8,9]. ADC is lower in neoplastic prostatic tissue than in non-neoplastic tissue, and it can differentiate low-grade from high-grade PC [9, 10]. Importantly, the use of high b-values between 1000 and 1500 s/mm2 in DWI is more sensitive for tumor detection, as conspicuity of PC is improved, especially for tumors adjacent to or invading the anterior fibromuscular stroma [11,12,13]. Prostate mpMRI has emerged as an excellent method to detect and localize prostate cancer and thus guide treatment selection [14, 15]. Many studies have demonstrated the major role of DWI in detection of Gleason pattern 4 disease, which has led to an interest in investigating the role of MRI in triaging patients before prostate biopsy. Two recent studies, the multi-center prospective PROMIS study and the bi-parametric IMPROD trial, have determined that MRI can reduce the biopsy rate by 27% and 24%, respectively [15, 16].

As both qualitative and quantitative imaging metrics are influenced by the underlying mpMRI acquisition parameters, a major effort to standardize mpMRI acquisition, interpretation and reporting led to the formation of the Prostate Imaging – Reporting and Data System (PI-RADS) in 2012 by the European Society of Urogenital Radiology [17, 18]. Subsequently, a broader international group developed PI-RADS version 2 (v2), which utilizes a 5-point scale to determine the likelihood of csPC. PI-RADS v2 defines csPC on pathology/histology as having GS ≥ 7 (including 3 + 4 with prominent but not predominant Gleason 4 component), and/or pathology volume ≥ 0.5 cc, and/or extraprostatic extension [16]. PI-RADS v2 is based upon qualitative interpretation of DWI/ADC, dynamic contrast-enhanced MRI (DCE-MRI) and T2-weighted imaging (T2WI). However, for peripheral zone (PZ) lesions, where 70% to 80% of PC is known to occur, DWI/ADC is the dominant sequence used by PI-RADS v2 to determine the likelihood of csPC [16,17,18,19]. In addition, the PI-RADS v2 committee has recommended the use of qualitative high b-value (>1400 s/mm2) sequences [13].

Importantly, it recently has been shown that pathological tumor volume affects PI-RADS v2 performance, which was shown to identify only 24% of tumors GS ≥ 4 + 3 with a pathological tumor volume of ≤ 0.5 cc [20]. This underscores the importance of including tumor volume in the assessment of diagnostic performance of qualitative and quantitative imaging metrics.

The purpose of our study was twofold: first, to compare the performance of qualitative assessment with PI-RADS v2 versus quantitative volumetric ADC assessment in detection of csPC in the PZ; and second, to determine the effect of multiple definitions of csPC on the diagnostic performance of ADC and PI-RADS v2.

Materials and methods

Patients

This retrospective study was HIPAA compliant and approved by the local Institutional Review Board. With the help of our institutional search engine, we identified patients who underwent endorectal coil mpMRI at 3T between January 2011 and January 2015, were treatment naïve, and had pathologically proven PC within 6 months of mpMRI. Before the radiologist reviewed the cases, the mpMRIs underwent pre-screening to exclude cases that did not have high b-value (at least 0, 1400 s/mm2) DWI.

MRI technique

All MRI examinations were performed on a GE Signa HDx 3.0 T magnet (GE Healthcare, Waukesha, WI) using an eight-channel abdominal array and endorectal coil (Medrad, Pittsburgh, PA), as previously described [21]. T1-weighted images were generated from spoiled gradient echo (SPGR) sequences with repetition time (TR)/echo time (TE)/α = 385/6.2 ms/65°, over a 16-cm2 field of view (FOV), with 384 × 192 resolution and 3 mm slice thickness. T2-weighted images were generated from a fast relaxation fast spin echo (FRFSE) sequence, TR/TE = 3500 ms/102 ms, over a 16-cm2 FOV, with 384 × 224 resolution and 3 mm slice thickness. DWI images were acquired with single-shot echo planar imaging (EPI) sequences, TR/TE = 2500 ms/65 ms, with b-values of 0, 500 and 1400 s/mm2, 128 × 96 resolution and 3 mm slice thickness. DCE-MRI was performed with a 3D SPGR sequence with full-gland coverage. Gadopentetate dimeglumine (Magnevist, Berlex Laboratories, Wayne, NJ) was injected intravenously (0.15 mmol/kg; rate 3 mL/s).
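The text does not describe how the ADC maps were generated from the DWI series. As a minimal sketch, assuming the commonly used monoexponential model S(b) = S0 · exp(−b · ADC), the ADC of a single voxel can be estimated by a log-linear least-squares fit over the acquired b-values. The function name `fit_adc` and the synthetic signal values are illustrative assumptions, not part of the study.

```python
import math


def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) for one voxel, assuming S(b) = S0 * exp(-b * ADC).

    b_values: b-values in s/mm^2; signals: positive signal intensities.
    The slope of ln(S) versus b, obtained by least squares, equals -ADC.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching (b, signal) pairs")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    return -sxy / sxx


# Synthetic voxel with a known ADC (hypothetical values, not study data).
b_values = [0.0, 500.0, 1400.0]
true_adc = 1.0e-3  # mm^2/s
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]
adc = fit_adc(b_values, signals)

# With only two b-values the fit reduces to ln(S_0 / S_1400) / 1400.
adc_two_point = fit_adc([0.0, 1400.0], [signals[0], signals[2]])
```

Real prostate tissue deviates from monoexponential decay at high b-values, so an estimate of this kind depends on which b-values are included in the fit.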

Clinical and histopathologic information collection

Age and serum prostate specific antigen (PSA) at the time of PC diagnosis were collected using our electronic medical records database. We retrospectively reviewed the histopathology reports of the identified patients and recorded the primary and secondary GS. The pathology specimens were either from prostatectomy, MRI guided biopsy (MRGBx), or transrectal ultrasound guided prostate biopsy (TRUS-Bx). All prostate specimens were processed according to the Stanford protocol [22].

Image analysis

A single radiologist (F.M.F.), with over 10 years of experience interpreting prostate MRI, blinded to GS, reviewed the de-identified mpMRIs of each patient to identify cases with PZ index lesions. Tumor (T) and normal (N) PZ tissue volumes of interest were then outlined on ADC (b 0, 1400 s/mm2) maps using 3D Slicer, an open source image processing software (www.slicer.org) which has shown high reproducibility for mpMRI-based quantitative measurements [23, 24]. MRI index lesion volume was determined from ADC maps, and ADCmean for T (ADCT) and N (ADCN) were calculated in 3D Slicer. ADCratio was defined as ADCT/ADCN. In the same sitting, an overall PI-RADS v2 assessment category (based upon T2, DCE, and DWI/ADC) was assigned to T [13]. Overall PI-RADS v2 assessment categories ≥ 4 were defined as “high” risk for csPC.
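The quantitative metrics defined above can be illustrated with a short sketch. It assumes the ADC map and the contoured tumor and normal regions are available as nested lists indexed [slice][row][column]; the measurements in the study were made in 3D Slicer, not with this code. The helper names, the toy ADC values, and the voxel dimensions are hypothetical.

```python
def roi_values(adc_map, mask):
    """Collect ADC values of all voxels where the mask is truthy."""
    return [
        value
        for adc_slice, mask_slice in zip(adc_map, mask)
        for adc_row, mask_row in zip(adc_slice, mask_slice)
        for value, inside in zip(adc_row, mask_row)
        if inside
    ]


def roi_mean(adc_map, mask):
    """Mean ADC over a volumetric (multi-slice) region of interest."""
    values = roi_values(adc_map, mask)
    if not values:
        raise ValueError("empty region of interest")
    return sum(values) / len(values)


def roi_volume_cc(mask, voxel_mm):
    """Region volume in cc from voxel count and voxel size (dx, dy, dz) in mm."""
    n_voxels = sum(1 for s in mask for row in s for inside in row if inside)
    dx, dy, dz = voxel_mm
    return n_voxels * dx * dy * dz / 1000.0  # 1 cc = 1000 mm^3


# Toy 2-slice, 2 x 2 ADC map (hypothetical values, units of 10^-6 mm^2/s).
adc_map = [[[800, 900], [1600, 1700]], [[1000, 1100], [1500, 1800]]]
tumor_mask = [[[1, 1], [0, 0]], [[1, 1], [0, 0]]]
normal_mask = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]

adc_t = roi_mean(adc_map, tumor_mask)    # ADCT
adc_n = roi_mean(adc_map, normal_mask)   # ADCN
adc_ratio = adc_t / adc_n                # ADCratio = ADCT / ADCN
# Voxel size is an assumption; only the 3 mm slice thickness is from the text.
tumor_volume_cc = roi_volume_cc(tumor_mask, (1.25, 1.25, 3.0))
```

Because the ratio divides by ADCN, any systematic difference in the normal-appearing tissue affects ADCratio even when ADCT is unchanged.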

Definition of clinically significant PC

PZ tumors were categorized as csPC based on either histopathology or tumor volume, using the following three definitions: Method A: index tumors with GS ≥ 4 + 3; Method B: index tumors with GS ≥ 3 + 4; Method C: MRI-based tumor volume ≥ 0.5 cc, calculated volumetrically on multiple slices of ADC maps, based upon the premise that pathologic-based tumor volume ≥ 0.5 cc is csPC [4,5,6].
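The three definitions can be written as a small classification rule. The sketch below is illustrative only: the function name `is_cspc` and its arguments are assumptions. The cohort list is built from the Gleason score distribution reported in the Results, with the primary/secondary split for GS 8 and GS 9 chosen arbitrarily as 4 + 4 and 4 + 5, since the text does not report it.

```python
def is_cspc(method, primary=None, secondary=None, mri_volume_cc=None):
    """Return True if an index tumor is csPC under Method 'A', 'B', or 'C'."""
    if method == "C":  # MRI-based tumor volume >= 0.5 cc
        return mri_volume_cc >= 0.5
    total = primary + secondary
    if method == "A":  # GS >= 4 + 3
        return total > 7 or (total == 7 and primary >= 4)
    if method == "B":  # GS >= 3 + 4
        return total >= 7
    raise ValueError("method must be 'A', 'B', or 'C'")


# (primary, secondary) pairs following the reported GS distribution;
# the 4 + 4 and 4 + 5 splits are assumptions made for illustration.
cohort = (
    [(3, 3)] * 18 + [(3, 4)] * 33 + [(4, 3)] * 14 + [(4, 4)] * 9 + [(4, 5)] * 11
)
n_method_a = sum(is_cspc("A", p, s) for p, s in cohort)
n_method_b = sum(is_cspc("B", p, s) for p, s in cohort)
```

Under these assumptions the counts agree with the reported 34/85 (Method A) and 67/85 (Method B). Method C cannot be reproduced this way because per-patient volumes are not given in the text.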

Statistical analysis

Continuous variables were summarized using means and ranges, and categorical variables were summarized using frequencies and percentages. Each ADC metric (ADCT, ADCN, and ADCratio) was compared between csPC and nsPC groups for each method (A, B, C) using a two-sample t test. Tumor ADC metrics (ADCT and ADCratio) were correlated with GS using the Pearson test. Fisher’s exact test was used to evaluate for associations between high PI-RADS v2 (≥ 4) and csPC, as defined by Methods A, B, and C.
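To make the association test concrete, the following is a minimal standard-library sketch of a two-sided Fisher’s exact test on a 2 × 2 table (rows: high versus low PI-RADS v2 category; columns: csPC versus nsPC). The analyses in the study were run in STATA; this is not the authors’ code, and the table counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )


# Hypothetical counts: [[high & csPC, high & nsPC], [low & csPC, low & nsPC]]
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

The exact test is a reasonable choice when some cells of the table are small, as is likely for the low PI-RADS v2 category in a cohort of this size.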

Receiver operating characteristic (ROC) curves were plotted for PI-RADS v2, ADCT, and ADCratio to evaluate their performance in detecting csPC, and performance statistics were calculated (AUC, sensitivity, specificity, PPV, NPV). The AUCs of all ROC plots were compared using a nonparametric approach, previously described by DeLong et al. [25]. This nonparametric approach was also used to compare the diagnostic performances (ROC for differentiating csPC from nsPC) of each ADC metric and overall PI-RADS v2 assessment category among the three definitions of csPC.
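The performance statistics listed above can be sketched as follows for an ADC-type metric, where lower values suggest csPC. The AUC is computed with the nonparametric Mann-Whitney formulation (the same empirical AUC on which the DeLong comparison is based); the DeLong variance and significance test are not implemented here. Function names and the example values are hypothetical.

```python
def auc_low_score_positive(pos_scores, neg_scores):
    """Empirical AUC: probability that a csPC case has a lower score than
    an nsPC case, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def diagnostic_metrics(pos_scores, neg_scores, cutoff):
    """Sensitivity, specificity, PPV, and NPV when score <= cutoff is a
    positive test. Undefined ratios are returned as None."""
    tp = sum(1 for s in pos_scores if s <= cutoff)
    fn = len(pos_scores) - tp
    fp = sum(1 for s in neg_scores if s <= cutoff)
    tn = len(neg_scores) - fp

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical tumor ADC values (10^-6 mm^2/s), not study data.
cspc_adc = [700, 850, 900, 1100]
nspc_adc = [950, 1200, 1300]
auc = auc_low_score_positive(cspc_adc, nspc_adc)
metrics = diagnostic_metrics(cspc_adc, nspc_adc, cutoff=1000)
```

For an ordinal score such as the PI-RADS v2 category, where higher values suggest csPC, the comparison directions would be reversed.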

A two-tailed p value less than 0.05 was considered significant. All statistical analyses were performed using STATA (Version 11.2 StataCorp College Station, Texas USA).

Results

Patients and tumor characteristics

A total of 645 patients were identified using our institutional medical record search engine. We excluded those who did not have prostate pathology confirmation within a 6-month window of the mpMRI (n = 305), did not have an index PZ tumor on pathology (n = 87), or did not undergo high b-value (1400 s/mm2) DWI sequences (n = 168) to arrive at a final population of 85 patients (Fig. 1). Their mean age was 61.2 years [range 43–76 years] and mean PSA was 6.52 ng/ml [range 0.01–55.88 ng/ml].

Fig. 1 Flowchart of patient enrollment, exclusion criteria, and final study population. PCa, prostate cancer; mpMRI, multiparametric magnetic resonance imaging; DWI, diffusion-weighted imaging; ecoil, endorectal coil

The pathologic diagnosis of PC was based on prostatectomy (n = 54), TRUS-Bx (n = 26), and MRGBx (n = 5) (Table 1). Those who underwent TRUS-Bx all had a repeat TRUS-Bx, the results of which were in agreement with the first TRUS-Bx pathology. The median GS was 7 [range 6–9]. The distribution of GS was as follows: GS = 6 (n = 18); GS = 7 (n = 47); GS = 8 (n = 9); and GS = 9 (n = 11). In patients with GS 7, primary Gleason grade was 3 (3 + 4) in 33 patients and 4 (4 + 3) in 14 patients. Based on Method A, 34/85 patients had csPC; based on Method B, 67/85 patients had csPC; and based on Method C, 54/85 patients had csPC. The distribution of lesions according to csPC definitions is reported in Table 2.

Table 1 Breakdown of Gleason scores based on reference pathology
Table 2 Distribution of lesions according to clinically significant prostate cancer definitions

ADC in differentiating csPC from nsPC using three different methodologies for defining csPC

ADCT was significantly lower in those with csPC compared to nsPC, regardless of the definition of csPC (p < 0.001, p < 0.001, p = 0.035 for Methods A, B, and C, respectively) (Table 3). ADCN was also significantly lower in those with csPC compared to nsPC (p < 0.03, p < 0.001, p = 0.006 for Methods A, B, and C, respectively) (Table 3). ADCratio was significantly lower in those with csPC compared to nsPC when csPC was defined with Methods A and B (p < 0.001 and p = 0.01, respectively), but not with Method C. There was a negative correlation between ADCT and GS (ρ: −0.51, p < 0.001), and between ADCratio and GS (ρ: −0.39, p < 0.001).

Table 3 Comparison of ADC parameters between significant and non significant peripheral zone prostate cancer using three different definitions of clinical significance (A: GS ≥ 4 + 3, B: GS ≥ 3 + 4, C: volume >0.5 cc)

PI-RADS v2 in differentiating csPC from nsPC using three different methodologies for defining csPC

There was a significant association between high PI-RADS v2 assessment category and a diagnosis of csPC using Method A (p = 0.04) and Method B (p < 0.01). There was no association between high PI-RADS v2 and a csPC diagnosis when csPC was defined according to Method C (p = 0.11).

Comparison of diagnostic performance of ADC and PI-RADS v2 in detecting csPC

Using the three methods of defining csPC, the ROC curves for ADC metrics and PI-RADS v2 assessment category used to differentiate csPC from nsPC were plotted. Performance characteristics (AUC, sensitivity, specificity, NPV, PPV) of each metric (ADCT, ADCratio, and PI-RADS v2) are presented in Table 4.

Table 4 Performance of ADC and PI-RADS v2 parameters in differentiating clinically significant from non-clinically significant prostate cancer

Method A (GS ≥ 4 + 3): The AUC of ADCT was 0.83 for a cutoff of 1010 mm2/s; AUC of ADCratio was 0.77 for a cutoff of 0.68; the AUC of PI-RADS v2 was 0.65. The AUC of ADCT (0.83) was higher than PI-RADS v2 (0.65) (p = 0.006). There was no difference in AUC between ADCT and ADCratio, or between ADCratio and PI-RADS v2.

Method B (GS ≥ 3 + 4): The AUC of ADCT was 0.86 for a cutoff of 1069.13 mm2/s; the AUC of ADCratio was 0.70 for a cutoff of 0.68; the AUC of PI-RADS v2 was 0.65. The AUC of ADCT (0.86) was higher than that of ADCratio (0.70) (p < 0.001) and PI-RADS v2 (0.70) (p = 0.04). There was no difference in AUC between ADCratio and PI-RADS v2.

Method C (MRI-based tumor volume >0.5 cc): The AUC of ADCT was 0.66 for a cutoff of 1069.13 mm2/s; the AUC of ADCratio was 0.70 for a cutoff of 0.68; the AUC of PI-RADS v2 was 0.65. The AUC of PI-RADS v2 (0.72) was higher than AUC of ADCratio (0.56) (p = 0.02). AUC of ADCT (0.66) was also higher than AUC of ADCratio (p = 0.04).

Effect of csPC definition on the diagnostic performance of ADC and PI-RADS v2

Performance of ADCT was better when csPC was defined pathologically (Methods A and B) than when it was defined volumetrically (Method C) (A vs C, p = 0.038; B vs C, p = 0.011). ADCratio performed better when csPC was defined according to Method A than according to Method C (p = 0.01), with no difference observed in the performance of ADCratio for Method A compared to B, or B compared to C. Performance of PI-RADS v2 did not differ when csPC was defined by Methods A, B, or C.

Discussion

This study was motivated by the introduction of PI-RADS v2, whose aim is to standardize acquisition, interpretation, and reporting of mpMRI of the prostate in order to improve the accuracy of mpMRI in the diagnosis of csPC. In this study, we demonstrated that index tumor ADCT outperforms PI-RADS v2 in detection of csPC in the PZ when we utilize a pathological GS definition of clinical significance, irrespective of whether this is defined as GS ≥ 3 + 4 or GS ≥ 4 + 3. Considering that ADC metrics have been shown to correlate with tumor GS, this finding is not unexpected.

However, if we consider csPC to be an index tumor with an MRI-based volume of ≥ 0.5 cc (Method C), ADCT did not perform better than PI-RADS v2 in detecting csPC. As a recent retrospective analysis showed, tumors measuring <5 mm diameter on pathology are much less likely to be of higher Gleason grade [26]. It is therefore understandable that smaller contoured tumors may have a higher ADC score due to volume averaging with adjacent normal tissue, leading to a lower PI-RADS v2 score. Conversely, larger lesions are more likely to be given a higher PI-RADS score [16, 27]. Vargas et al. have also evaluated PI-RADS v2 accuracy in detecting csPC defined as tumor volume >0.5 cc or GS ≥ 4 + 3 [20]. While they found PI-RADS v2 helpful in detecting lesions >0.5 cc, they also noted its limitation in assessing high GS lesions which were <0.5 cc.

It is important to note that our definition of csPC as ≥ 0.5 cc tumor volume in this study was based on volumetric MR image contouring of tumor on ADC, and not on pathologically derived tumor volume from the prostatectomy specimen. However, MRI is known to significantly underestimate lesion size compared to whole mount prostatectomy analysis, particularly so for ADC [28, 29]. It is therefore possible that volumetric contouring of tumor volume based upon ADC underestimates the actual number of tumors that would be considered csPC if the definition were based on pathological tumor volume. As such, if we underestimate the number of csPC cases based on MRI-calculated volume, it may be expected that the performance of ADC metrics and PI-RADS v2 is not as good for the volumetric definition of csPC (Method C) as for those based upon GS (Methods A and B), which is what we found.

In our study, PI-RADS v2 correctly classified 81.2% of GS ≥ 3 + 4 but only 48.24% of GS ≥ 4 + 3 (Table 4). The high sensitivity and very low specificity of PI-RADS v2 across the three methods in Table 4 suggest that this tool is good for identifying csPC in a selected population of patients with known PCa, as a negative test will be useful to rule out csPC, but it should not be used as a confirmatory test. This contrasts with a recent study where PI-RADS v2 was found to have a sensitivity and specificity of over 80% for detection of csPC [30]. The difference in performance characteristics may be explained by their use of a PI-RADS v2 score of 3 (“the presence of clinically significant cancer is equivocal”) as a cutoff point for detection of csPC, whereas we used a score of 4 (“clinically significant cancer is likely to be present”) as a cutoff.

Our results comparing ADC metrics between tumor and normal tissue are similar to those previously reported, with significantly lower mean ADCT than mean ADC in healthy-appearing PZ prostate tissue (ADCN) [11, 31, 32]. Various ADC metrics have been investigated, including ADCmean and ADCratio, for detection of csPC. ADCmean values can be calculated either on a single slice that contains the largest area of tumor, or on multiple slices to include the whole 3D tumor volume [11, 31, 33]. ADCratio is defined as the ratio of ADC values of tumor to normal tissue, which aims to minimize the impact of study-specific imaging protocols [11, 34]. We also found a significant negative correlation between ADC parameters and GS, which is in agreement with previous studies that have evaluated ADC in assessment of aggressive PC [12, 32, 35]. Interestingly, we also found mean ADCN in the PZ to be significantly lower in the normal tissue of those in the csPC group compared to those with nsPC, irrespective of the definition of csPC. To our knowledge, this finding that the background “normal” ADC values in the prostate (i.e., in areas with no visible tumor on mpMRI) are significantly different in patients with pathologically confirmed aggressive PC elsewhere in the prostate compared to those with less aggressive tumors has not been previously reported. We speculate that these findings may be due to field cancerization, a known phenomenon in the histopathology literature that is currently being evaluated in the prostate [36].

In a recent study, Boesen et al. reported a better performance of ADCratio compared to ADCT in detecting high-grade cancer (defined as GS ≥ 7 (4 + 3)) [35]. Our results suggest that ADCratio and ADCT perform equally well using a similar classification method (Method A). Another study by Woo et al. demonstrated that ADCT and ADCratio performed equally well in predicting high-grade PC, defined as GS > 6 (3 + 3) [8]. Using a similar csPC classification in our study (Method B), we found ADCT performed better than ADCratio. It is possible that these differences in performance of ADC metrics among studies reflect different methods of ADC parameter calculation (single slice versus volumetric), as well as the method of calculation and the observed ADCN values in csPC, which are taken into consideration for calculation of ADCratio. It should also be noted that we used different imaging techniques and analyses than those employed by Boesen and Woo [8, 35]. For ADC map calculation, we used b-values of 0, 500, and 1400 s/mm2 while Boesen et al. used 100 and 800 s/mm2 [35]. The non-monoexponential nature of signal decay with b-value in prostate tissues complicates comparisons between studies using different b-value ranges [37]. However, our use of higher b-values may have had an impact on tumor conspicuity, as supported by a recent study that demonstrated that the AUC for differentiating high risk PC from low risk PC increases with an increase in b-value, reaching maximum AUC at b = 1600 s/mm2 [38]. Another possible explanation for the differences is that we performed a volumetric analysis of ADC values, as the lesion contouring was performed on multiple ADC slices to cover the entire tumor, enabling us to evaluate heterogeneous tumors with improved accuracy. In the aforementioned studies, the authors outlined the tumor only on the single ADC slice where the largest area of the tumor was visible.

Our study has several limitations. First, due to the specific enrollment criteria of requiring a high b-value DWI scan and pathologically documented PZ PC, several cases were excluded. Second, in our final population of 85 patients, we used prostatectomy as a pathology reference for the majority of our cases (54/85), but not in all. In 26/85 (30.6%), we relied on the TRUS-Bx pathology result, which is known to have a limitation of under-detection and under-grading of csPC [39]. However, in all cases, a repeat TRUS-Bx did not result in up- or down-grading of GS. Many of these patients were on active surveillance and had a GS of 3 + 3, thereby enabling us to evaluate the performance of the diagnostic tools in a population of patients with known PC who did not require prostatectomy. Third, we did not calculate pathologically derived tumor volumes for our patient cohort, and as such could not use a pathological definition of csPC ≥ 0.5 cc to compare with GS definitions. Lastly, mpMRI examinations were analyzed by a single reader with more than 10 years of experience in reading prostate mpMRI, and interreader agreement was not evaluated. However, a recent study evaluated interreader agreement of PI-RADS v2 ≥ 4 for detection of clinically significant PZ prostate cancer (GS ≥ 3 + 4), and showed almost perfect agreement (k = 0.91) [40]. Given the known substantial interreader agreement for PI-RADS v2 ≥ 4 for detection of PZ prostate cancer, we deemed the use of multiple radiologists not necessary.

In conclusion, quantitative ADC metrics outperform qualitative PI-RADS v2 in detection of csPC. When csPC was defined as MRI-based tumor volume >0.5 cc, qualitative PI-RADS v2 and quantitative mean ADCT provided similar discrimination. How csPC was defined did have an effect on the diagnostic performance of the ADC metrics, which performed better when csPC was defined by GS. However, the definition of csPC did not affect PI-RADS v2 performance. These findings have implications as we continue to refine PI-RADS and should be considered when planning the next version of the guidelines (v3) to improve the detection of csPC.