Introduction

In recent years, multi-parametric magnetic resonance imaging (mpMRI) has gained increasing relevance in the detection of prostate cancer [1–4]. mpMRI combines anatomic T2-weighted imaging (T2W) with functional diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI. With the growing number of mpMRI examinations, standardized interpretation and reporting of mpMRI findings has become necessary. Therefore, in 2012, the European Society of Urogenital Radiology (ESUR) introduced guidelines for the evaluation and reporting of mpMRI, the Prostate Imaging Reporting and Data System (PI-RADS) [5]. In December 2014, an updated version of the PI-RADS scoring system was published by a committee of the ESUR prostate MR imaging expert group and the PI-RADS steering committee of the American College of Radiology [6–8]. The new PI-RADS version (v2) aims to enable internationally standardized reporting and differs from version 1 (v1) in several respects. PI-RADS v1 scores each prostate lesion on each single modality (T2W, DWI, DCE-MRI) using a five-point scale that reflects the probability of clinically significant prostate cancer, but the weight of each single modality in the overall PI-RADS score was not clearly defined. For structured reporting, several authors therefore proposed summing the single-modality scores [9–13]. PI-RADS v2 uses a simplified approach: a dominant MRI sequence, which is scored first, is defined for each of the peripheral zone (PZ) and the transition zone (TZ). DWI is dominant for evaluating the PZ, and T2W for evaluating the TZ. A further sequence contributes to the overall PI-RADS score only if the dominant sequence yields an intermediate finding, that is, a PI-RADS score of 3. In this case, the second sequence (DCE-PI-RADS 4 or 5 for the PZ and DWI-PI-RADS 5 for the TZ) can upgrade the lesion to an overall PI-RADS score of 4.
Consequently, DCE does not contribute to the TZ PI-RADS score, and T2W has no relevance in scoring the PZ. Furthermore, PI-RADS v2 scores DCE on a binary scale instead of a five-point scale: focal early enhancement is the only feature scored as pathological, and no curve analysis or other complex evaluation of contrast dynamics is required. Although the use of PI-RADS v1 has already been shown to improve both the accuracy of prostate cancer detection and inter-observer agreement, one major drawback of PI-RADS v1 is the divergent diagnostic performance of the single modalities (DWI, T2W, DCE), especially when the PZ and TZ are reported separately [14]. The limited value of DCE in evaluating the TZ and the superior performance of DWI in the PZ have already been reported, and a more differentiated weighting system has been suggested [15, 16]. PI-RADS v2 introduced a simplified weighting system with dominant techniques and separate reporting for the TZ and the PZ. The purpose of this study was to compare the diagnostic performance of PI-RADS v1 and PI-RADS v2 in the detection of prostate cancer.
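The PI-RADS v2 dominant-sequence rule described above can be sketched as a small function. This is an illustration under stated assumptions, not an official implementation: the function name and signature are hypothetical, single-sequence scores are integers 1–5, DCE is passed as a boolean (positive = focal early enhancement, following the binary v2 scoring), and any further rules in the full PI-RADS v2 document are ignored.

```python
from typing import Literal

Zone = Literal["PZ", "TZ"]


def pirads_v2_overall(zone: Zone, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Overall PI-RADS v2 category from single-sequence scores (hypothetical helper).

    PZ: DWI is dominant; a DWI score of 3 is upgraded to 4 if DCE is positive.
    TZ: T2W is dominant; a T2W score of 3 is upgraded to 4 if DWI scores 5.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be in 1..5, got {score}")
    if zone == "PZ":
        # T2W is not used in the PZ; DCE only matters for an intermediate DWI score
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # DCE is not used in the TZ; DWI only matters for an intermediate T2W score
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError(f"unknown zone: {zone!r}")
```

For example, a PZ lesion with DWI 2, T2W 4, and positive DCE (the constellation of the missed high-grade lesion described in the Results) gives `pirads_v2_overall("PZ", 4, 2, True) == 2`, because the dominant DWI score is never overruled outside the intermediate category.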

Materials and methods

Patients

The mpMRI data of 50 consecutive patients with biopsy-proven prostate cancer (PCa), originally evaluated according to PI-RADS v1, were retrospectively re-evaluated to compare PI-RADS v1 and v2. mpMRI findings were correlated with histopathological whole-mount step-section slides obtained after prostatectomy. The original exclusion criteria were contraindications to mpMRI or prostatectomy and poor quality of the histopathological slices. None of the 50 included patients had to be excluded.

Multi-parametric MRI of the prostate

The mpMRI examinations were performed on a 3-T MR scanner (Magnetom Skyra, Siemens AG, Erlangen, Germany) using an 18-channel phased-array body coil. Patients received 20 mg of butylscopolamine (Buscopan®, Boehringer Ingelheim Pharma, Ingelheim, Germany) intravenously before the examination to suppress peristalsis. The interval between prostate biopsy and mpMRI was at least 3 months to avoid artifacts due to local hemorrhage. The examinations included T2W, DWI, and DCE. T2W turbo spin-echo sequences were acquired in the axial plane (TR 4891 ms, TE 101 ms, field of view (FOV) 20.0 cm, image matrix 320 × 320, slice thickness 3 mm, turbo factor 23). Additionally, a T2-weighted 3D turbo spin-echo sequence with variable flip angle (3D SPACE sequence) was acquired in sagittal orientation. DWI used a single-shot spin-echo echo-planar sequence (TR 6800 ms, TE 67 ms, FOV 21.0 cm, slice thickness 3 mm, image matrix 160 × 132, GRAPPA parallel imaging with acceleration factor 2) with three b-values (50, 400, and 1000 s/mm²), applying diffusion gradients in three orthogonal directions for each b-value. Apparent diffusion coefficient (ADC) maps were calculated using the standard mono-exponential model. DCE was performed with a 3D volume-interpolated gradient echo sequence (TR 2.89 ms, TE 1.12 ms, FOV 38.0 × 28.5 cm, slice thickness 4 mm, image matrix 256 × 191, GRAPPA parallel imaging with acceleration factor 2, temporal resolution 7 s). Gadobutrol (Gadovist®, Bayer Schering Pharma, Germany) was administered as contrast medium at a weight-adapted standard dose (0.1 ml/kg body weight) using a power injector (3T Tennessee, Ulrich, Germany). Perfusion curves were generated with the commercially available software TISSUE4D (Siemens AG, Erlangen, Germany) on the MR scanner console. The whole mpMRI examination took 27 min per patient (range 5 min), including time for patient positioning and preparation.
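The mono-exponential model assumes S(b) = S0·exp(−b·ADC), so ln S(b) is linear in b and the ADC is the negative slope of a straight-line fit across the acquired b-values. The sketch below fits a single voxel by ordinary least squares on the log signal. It is not the scanner's implementation, which the text does not describe; it assumes one (e.g., direction-averaged) positive signal value per b-value, and the name `fit_adc` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_adc(b_values: Sequence[float], signals: Sequence[float]) -> Tuple[float, float]:
    """Fit S(b) = S0 * exp(-b * ADC) by ordinary least squares on ln S(b).

    b_values are in s/mm^2, so the returned ADC is in mm^2/s.
    Returns (S0, ADC).
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two (b, signal) pairs of equal length")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    n = len(b_values)
    log_s = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    slope = sxy / sxx                    # slope of ln S versus b equals -ADC
    intercept = mean_y - slope * mean_b  # intercept equals ln S0
    return math.exp(intercept), -slope


# Synthetic voxel: S0 = 1000, ADC = 1.2e-3 mm^2/s, sampled at the protocol's b-values
b = [50.0, 400.0, 1000.0]
s = [1000.0 * math.exp(-bi * 1.2e-3) for bi in b]
s0_fit, adc_fit = fit_adc(b, s)
```

With noise-free synthetic signals the fit recovers the input parameters; with real data, a log-linear fit weights low-signal (high-b) points differently from a nonlinear fit, which is one reason implementations can differ slightly.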

Whole-mount step-section slides

After prostatectomy, the specimen was prepared by a uropathologist with more than 10 years of experience with whole-mount step-section slides. The uropathologist and a uroradiologist jointly defined how each prostate was to be sliced. Section thickness was 4 mm, and the whole-mount step-sections were aligned with the axial T2W images. Correct alignment was ensured by registering distinct structures on both the whole-mount step-section slides and the mpMRI images (seminal vesicle insertion, bladder neck, distinct cysts, calcifications, apical urethra). Only tumors larger than 3 mm in maximum diameter were assigned a Gleason score. Low-grade PCa was defined as Gleason score ≤7a (Gleason 3 + 4), and high-grade PCa as Gleason score ≥7b (Gleason 4 + 3). Besides PCa, the following findings were reported: benign prostatic hyperplasia, prostatitis, regular prostatic tissue, fibrosis, and prostatic intraepithelial neoplasia. All whole-mount step-section slides were digitized, and the lesions were marked on the scanned images.
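The grading rules above can be written as a small classifier. This is a sketch under two assumptions: Gleason patterns are restricted to the contemporary range 3–5 (so a sum of 7 is either 3 + 4 = 7a or 4 + 3 = 7b), and the 3-mm size limit is applied as a strict "greater than". The function name is hypothetical.

```python
from typing import Optional


def classify_pca(max_diameter_mm: float, primary: int, secondary: int) -> Optional[str]:
    """Return 'low-grade', 'high-grade', or None if the tumor is too small to be graded."""
    if max_diameter_mm <= 3.0:
        return None  # only tumors > 3 mm in maximum diameter received a Gleason score
    if not (3 <= primary <= 5 and 3 <= secondary <= 5):
        raise ValueError("expected contemporary Gleason patterns 3-5")
    total = primary + secondary
    if total == 7:
        # 3 + 4 = 7a (low-grade); 4 + 3 = 7b (high-grade)
        return "high-grade" if primary == 4 else "low-grade"
    return "high-grade" if total > 7 else "low-grade"
```

Splitting score 7 on the primary pattern is what separates 7a from 7b; all other sums fall cleanly on one side of the cut-off.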

Image interpretation

The lesions to be scored had been selected on the histopathological slices by an experienced uropathologist and an experienced uroradiologist when the dataset was originally acquired. For each patient, they selected four lesions in the PZ and two in the TZ (distributed equally between both sides of the gland), resulting in a total of 300 lesions in the 50 patients [15]. The dataset was retrospectively re-evaluated by two radiologists with a similar level of expertise: more than 5 years of consensus mpMRI reporting within the same institution and more than 2 years of experience in reporting according to PI-RADS (>900 cases within the last 2 years) [15]. Reviewer 1 re-evaluated the data using PI-RADS v1, and reviewer 2 using PI-RADS v2. Each reviewer was blinded to the other reviewer's report and to the surgical and histopathological findings. Scoring according to PI-RADS v1 followed the ESUR guidelines: a single-modality score (1–5) was assigned to each lesion for each sequence (T2W, DWI, DCE), and a summation score weighting all sequences equally was applied [5]. PI-RADS v2 scoring followed the recommendations of the consensus group, using the concept of dominant techniques with DWI as the dominant sequence in the PZ and T2W as the dominant sequence in the TZ [6]. In the PZ, DCE results were scored as positive when there was early focal enhancement and as negative when there was no enhancement or diffuse enhancement. In PI-RADS v2, no DCE enhancement curves were analyzed. Finally, the PI-RADS scores were compared with the histopathological findings for each lesion.
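The equally weighted PI-RADS v1 summation can be sketched as follows. The text does not state how a summed score (3–15) is mapped to an overall category; the cutoffs below (3–4 → 1, 5–6 → 2, 7–9 → 3, 10–12 → 4, 13–15 → 5) are an assumption, chosen to be consistent with the worked example in the Results (sum 11 → overall 4). The function and constant names are hypothetical.

```python
# ASSUMED mapping from the summed score (3-15) to an overall category:
# (upper bound of the sum, overall PI-RADS category). Not stated in the text.
SUM_TO_OVERALL = ((4, 1), (6, 2), (9, 3), (12, 4), (15, 5))


def pirads_v1_sum(t2w: int, dwi: int, dce: int) -> int:
    """Equally weighted sum of the three single-modality scores (each 1-5)."""
    scores = (t2w, dwi, dce)
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each single-modality score must be in 1..5")
    return sum(scores)


def pirads_v1_overall(t2w: int, dwi: int, dce: int) -> int:
    """Overall category from the summed score using the assumed cutoffs."""
    total = pirads_v1_sum(t2w, dwi, dce)
    for upper, category in SUM_TO_OVERALL:
        if total <= upper:
            return category
    raise AssertionError("unreachable: the sum is at most 15")
```

Because every sequence contributes equally, a single low score (e.g., `pirads_v1_overall(t2w=4, dwi=2, dce=5)`, sum 11) can be outweighed by the other two, which is the opposite of the dominant-sequence logic in v2.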

Statistical analysis

Data are presented as absolute and relative frequencies; only age, PSA, and prostate volume are given as mean and standard deviation (SD). Differences in proportions were tested with Fisher's exact test, with statistical significance set at a two-sided p < 0.05. For both PI-RADS v1 and v2, receiver operating characteristic (ROC) analysis was performed to assess the ability to discriminate between lesions with and without malignant tumor, using the area under the ROC curve (AUC) and its 95 % confidence interval (CI). The analyses were performed with SPSS version 19.0.
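The analyses were run in SPSS. As an illustration only, the following from-scratch sketch shows the underlying calculations: the empirical AUC for ordinal PI-RADS scores (equivalent to the Mann–Whitney statistic, with ties counted as one half), an approximate 95 % CI from the Hanley–McNeil standard error, and a two-sided Fisher's exact test. The CI method is an assumption (SPSS may use a different one), the function names are hypothetical, and the sketch neither compares the two correlated AUCs (both versions scored the same lesions) nor accounts for the clustering of six lesions per patient.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical ROC AUC: P(score_pos > score_neg), counting ties as 1/2."""
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative lesion")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ordinal 1-5 scores produce many ties
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int,
                     z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for a single AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 ≈ 0.486. The small relative tolerance keeps tables whose probability equals the observed one from being dropped because of floating-point rounding.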

Results

Patients’ characteristics

The 50 retrospectively included patients had a mean age of 63.0 (SD 8.0) years, mean PSA of 7.3 (SD 4.3) ng/ml, and mean prostate volume of 43.5 (SD 17.8) ml. As previously described, 300 predefined lesions were assessed, of which 103 (34 %) were PCa lesions. Of these PCa lesions, 14 (13.6 %) were located in the TZ and 89 (86.4 %) in the PZ [15]. Thirty lesions were high-grade PCa (Gleason score ≥7b), all located in the PZ. Histopathological findings and location of the selected 300 lesions are shown in Table 1. There were no significant differences between PI-RADS v1 and v2 regarding PSA values, prostate size, age, time point of mpMRI, or number of pre-biopsies.

Table 1 Histopathological findings and location of the selected 300 lesions shown as absolute numbers and as percentages of the total

Overall PI-RADS score

The tumor detection rates at each score level of PI-RADS v1 and v2 are shown in Fig. 1. In ROC analysis, overall PI-RADS v1 showed high discriminative ability for tumor detection, with an AUC of 0.96 (95 % CI 0.94–0.98), significantly higher (p < 0.001) than that of PI-RADS v2, with an AUC of 0.90 (95 % CI 0.86–0.94) (see Fig. 2). Analyzed separately, PI-RADS v1 showed significantly higher discrimination of PCa in PZ lesions (p < 0.001), with an AUC of 0.97 (95 % CI 0.95–0.99), compared with 0.92 (95 % CI 0.88–0.96) for PI-RADS v2. For TZ lesions, PI-RADS v1 also showed higher discrimination, with an AUC of 0.96 (95 % CI 0.92–1.00) compared with 0.90 (95 % CI 0.83–0.97) for PI-RADS v2, but the difference was not significant (p > 0.05). Considering PI-RADS score levels 1 and 2 as probably benign and levels 4 and 5 as probably malignant, PI-RADS v2 resulted in significantly more (p < 0.001) false-negative results (14 % of PCa lesions at score levels 1 and 2) than PI-RADS v1 (3 % of PCa lesions at score levels 1 and 2) (see the mpMRI PCa example in Fig. 3) (p = 0.79).

Fig. 1

Distribution of PI-RADS v1 and v2 overall scores of all prostate cancer lesions (n = 103), in percent (v1 = PI-RADS version 1 in blue, v2 = PI-RADS version 2 in red)

Fig. 2

ROC curves of overall PI-RADS v1 (in blue) and overall PI-RADS v2 (in green)

Fig. 3

Sixty-four-year-old patient with Gleason score 7b prostate cancer in the peripheral zone (PSA 5.9 ng/ml, free PSA 6.8 %). a T2-weighted imaging shows a focal hypointense area in the left peripheral zone (arrowheads) with ill-defined margins, b diffusion-weighted imaging fails to detect a clear focal diffusion restriction in ADC images, c perfusion map shows a focal hypervascular lesion (coded red-yellow, arrow), d T1-weighted imaging shows focal contrast enhancement (arrowhead)

PI-RADS v2 yielded true-positive results (PCa at PI-RADS 4 or 5) in 80 % (82/103) of PCa lesions, and PI-RADS v1 in 82 % (84/103).
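The score bands used for the dichotomized analysis (1–2 probably benign, 3 intermediate, 4–5 probably malignant) can be expressed as a small helper. The lesion-level score lists behind Fig. 1 are not reproduced in the text, so the sketch only checks the rounding of the reported true-positive counts; the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def score_band_shares(pca_scores: Iterable[int]) -> Dict[str, float]:
    """Share of PCa lesions per overall-score band.

    Bands follow the text: 1-2 probably benign (false negative),
    3 intermediate, 4-5 probably malignant (true positive).
    """
    counts: Counter = Counter()
    for s in pca_scores:
        if not 1 <= s <= 5:
            raise ValueError(f"PI-RADS score must be in 1..5, got {s}")
        band = "benign" if s <= 2 else ("intermediate" if s == 3 else "malignant")
        counts[band] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores given")
    return {band: counts[band] / total for band in ("benign", "intermediate", "malignant")}


# Rounding check on the reported true-positive counts (PCa at PI-RADS 4 or 5, n = 103)
tp_v2 = round(100 * 82 / 103, 1)  # 79.6, reported as 80 %
tp_v1 = round(100 * 84 / 103, 1)  # 81.6, reported as 82 %
```

Applied to the full list of overall scores for the 103 PCa lesions, the "benign" share would correspond to the false-negative rate and the "malignant" share to the true-positive rate discussed above.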

Regarding tumor grade, all high-grade PCa lesions (n = 30) were assigned suspicious score levels 4 and 5 with PI-RADS v1. In comparison, 97 % of high-grade PCa lesions (n = 29) were assigned score levels 4 and 5 with PI-RADS v2. Thus, one high-grade PCa lesion (Gleason score 9) in the PZ was missed (score level 2) with PI-RADS v2. This lesion showed no significant focal diffusion restriction (DWI score 2) but was suspicious on the other sequences. According to PI-RADS v1, it was scored 2 (DWI) + 4 (T2W) + 5 (DCE) = 11, corresponding to an overall score of 4. According to PI-RADS v2, a low DWI score of 2 in the PZ overrules the other sequences, which in this case led to a benign overall score of 2.

Non-tumor lesions

In total, 197 non-tumor lesions were scored: benign prostatic hyperplasia, prostatitis, regular prostatic tissue, fibrosis, or prostatic intraepithelial neoplasia. With PI-RADS v2, a significantly higher proportion (9 %) of histologically benign lesions was scored as potentially malignant (PI-RADS 4 or 5) than with PI-RADS v1 (3 %) (p = 0.03).

Discussion

In our study, both PI-RADS v1 and PI-RADS v2 showed high diagnostic accuracy for detecting prostate cancer with mpMRI. Nevertheless, PI-RADS v2 showed slightly but significantly lower discriminative ability than PI-RADS v1. Most notably, the comparatively high rate of PCa at low PI-RADS score levels (scores 1 and 2) with PI-RADS v2 (14 % vs. 3 % with PI-RADS v1) illustrates that PI-RADS v2 carries the risk of a higher rate of false-negative results, which can lead to undertreatment of PCa lesions.

So far, only limited data on the performance of PI-RADS v2 are available. In our study, PI-RADS v2 achieved an AUC of 0.90 for the whole prostate gland (PZ 0.92, TZ 0.90), while PI-RADS v1 achieved an AUC of 0.96 (PZ 0.97, TZ 0.96). Similarly, an inter-observer study by Muller et al. [17] reported AUCs for PI-RADS v2 of 0.86 for the PZ and 0.87 for the TZ. Using PI-RADS v1, Baur et al. [18] achieved an AUC similar to ours for the PZ (0.97) and a moderately lower AUC for the TZ (0.86).

One explanation for the higher rate of false-negative results with PI-RADS v2 could be its overly simplified approach, which defines one dominant MRI technique instead of treating all techniques equally. One of the main changes in PI-RADS v2 is this simplification, especially for DCE. The value of DCE imaging in mpMRI of the prostate is controversially discussed in the literature [15, 19–21]. In our study, ROC analysis of DCE in the PZ showed an AUC of 0.92 with PI-RADS v1 and 0.86 with PI-RADS v2, implying lower discriminative ability. Our findings regarding DCE in the PZ are comparable with those of Rosenkrantz et al. [22], who showed that for PZ tumors a (semi)quantitative assessment of DCE is superior to a qualitative approach. The discriminative ability of DCE imaging for PCa detection in the TZ was low (AUC 0.60) with both PI-RADS v1 and PI-RADS v2. This limited value of DCE in the TZ is in line with the findings of Rosenkrantz et al. [16], who reported a higher sensitivity for prostate cancer with DWI but no additional benefit from DCE-MRI. Despite the limited value of DCE in the TZ, we found good overall (PZ and TZ) performance of both PI-RADS v1 and v2 for high-grade PCa (Gleason ≥4 + 3). Nevertheless, PI-RADS v2 missed one high-grade (Gleason 9) PCa lesion in the PZ: this lesion did not show focal diffusion restriction, whereas the other sequences were suspicious. Because DWI is the leading sequence for PZ lesions in PI-RADS v2, the low DWI score led to an unsuspicious result (PI-RADS 2). In contrast, PI-RADS v1 weights all single scores equally, which yielded a suspicious overall PI-RADS score of 4. This demonstrates that strictly defining one leading sequence that overrules all others can be problematic in certain cases.

Our study had several limitations. First, it was retrospective. We evaluated the discriminative ability of PI-RADS v1 and v2 in 50 patients, but larger multicenter studies are still lacking. Second, we evaluated six lesions per patient, so observations within the same patient were not completely independent. Third, the study was performed in a preselected patient population, as only patients with known prostate cancer who underwent prostatectomy were included. The high prevalence of PCa in this population may have influenced the reviewers' decisions and led to higher detection rates. Finally, we did not assess inter-observer agreement, which could be an interesting objective for future studies.

Conclusion

PI-RADS v2 uses a simplified approach but showed lower diagnostic accuracy than PI-RADS v1 in this study. This could lead to a higher rate of false-negative results, with the risk of missing tumors at low PI-RADS score levels. Therefore, based on the results of this study, PI-RADS v2 cannot be recommended unconditionally, and further improvement on the basis of PI-RADS v1 should be considered.