Introduction

Multi-parametric magnetic resonance imaging (mpMRI) of the prostate combines anatomical imaging with at least two functional imaging modalities. The combination of T2-weighted imaging (T2W), diffusion-weighted imaging (DWI) and dynamic contrast-enhanced MRI (DCE) shows high sensitivity and specificity [1] and is therefore increasingly used for the diagnosis of prostate cancer (PCa) [2]. Nevertheless, the diagnostic reliability and dissemination of mpMRI are impeded by inconsistency among radiologists in weighting and interpreting the findings of the different functional imaging modalities. It has therefore been suggested that interpretation and reporting be standardized [3]. Accordingly, in 2012, the European Society of Urogenital Radiology (ESUR) published guidelines for the evaluation and reporting of mpMRI of the prostate, including the Prostate Imaging Reporting and Data System (PI-RADS) for scoring prostate lesions [4].

Initial studies show promising results concerning the diagnostic accuracy of the PI-RADS system, with good inter-reader agreement [5] but better results for experienced readers [6]. However, most prior studies used a biopsy core-by-core analysis as reference standard to evaluate the scoring system [7–9]. Our study group likewise evaluated the performance of the PI-RADS system in a previous study, using targeted biopsies of suspicious lesions as reference standard [10]. The main limitation of that study design is that targeted biopsies are mainly taken from suspicious lesions and can therefore reveal only limited histological information about low PI-RADS score levels. The current prospective study, using whole-mount step-section slides as reference standard, allowed evaluation of the whole PI-RADS system across a wide spectrum of pathologies, including normal, inflammatory, hyperplastic, and malignant lesions.

The aim of this study was therefore to evaluate the diagnostic accuracy of the whole PI-RADS spectrum, including the low-suspicion score levels, and to compare the performance of all single modality scores in the peripheral zone (PZ) and the transitional zone (TZ).

Materials and methods

Patients

This prospective study was approved by the local ethics committee. Written informed consent was obtained from each patient. Inclusion criteria were the presence of biopsy-proven PCa, planned radical prostatectomy and the patient’s agreement to undergo mpMRI of the prostate prior to surgery. Between January 2012 and December 2013, a series of 55 consecutive patients who were due for radical prostatectomy because of biopsy-proven PCa were approached. Fifty of these men consented to be included and to undergo mpMRI the day before surgery. Exclusion criteria were contraindication for mpMRI, contraindication for prostatectomy, or poor quality of the histopathological slices. None of the 50 included patients had to be excluded.

Multi-parametric prostate MRI

MpMRI examinations were performed on a 3-T MR scanner (Magnetom Skyra, Siemens AG, Erlangen, Germany) using an 18-channel phased-array body coil. To suppress peristalsis, patients received 20 mg of butylscopolamine (Buscopan®, Boehringer Ingelheim Pharma, Ingelheim, Germany) intravenously before the examination. The interval between prostate biopsy and mpMRI was at least 3 months to avoid artifacts caused by local hemorrhage.

MpMRI included T2-weighted imaging (T2W), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced imaging (DCE). T2W turbo spin-echo sequences were acquired in axial plane (TR 4,891 ms, TE 101 ms, field of view (FOV) 20.0 cm, image matrix 320 × 320, slice thickness 3 mm, turbo factor 23). Additionally, a T2-weighted 3D turbo spin-echo sequence with variable flip angle (3D SPACE sequence) was used in sagittal orientation.

DWI used a single-shot spin-echo echo-planar sequence (TR 6,800 ms, TE 67 ms, FOV 21.0 cm, slice thickness 3 mm, image matrix 160 × 132, using a GRAPPA parallel imaging scheme with acceleration factor 2) with three b-values (50, 400, 1,000 s/mm²), applying diffusion gradients in three orthogonal directions for each b-value. Apparent diffusion coefficient (ADC) parameter maps were calculated by using the standard mono-exponential model.
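
The mono-exponential model assumes that the signal decays as S(b) = S0 · exp(−b · ADC), so the ADC of a voxel can be estimated from the slope of ln S against b. The following is a minimal sketch of such a log-linear least-squares fit for the three b-values of the protocol; the function name `fit_adc` and the synthetic voxel signals are assumptions made for illustration, and the scanner software may use a different fitting procedure.

```python
import math


def fit_adc(b_values, signals):
    """Least-squares fit of ln S(b) = ln S0 - b * ADC.

    Returns (adc, s0). b_values in s/mm², so adc is in mm²/s.
    Hypothetical helper for illustration, not the vendor implementation.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matched (b, signal) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")

    log_s = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n

    # Ordinary least-squares slope and intercept of ln S versus b
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_b

    return -slope, math.exp(intercept)


B_VALUES = [50, 400, 1000]  # s/mm², as in the DWI protocol

# Synthetic voxel (assumption): ADC = 1.0e-3 mm²/s, S0 = 1000 (arbitrary units)
signals = [1000.0 * math.exp(-b * 1.0e-3) for b in B_VALUES]

adc, s0 = fit_adc(B_VALUES, signals)
print(f"ADC = {adc:.2e} mm²/s, S0 = {s0:.0f}")
```

With noise-free synthetic input the fit recovers the ADC used to generate the signals; with measured data the three points will not lie exactly on a line and the fit returns the least-squares estimate.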

DCE was performed with a 3D volume-interpolated gradient echo sequence (TR 2.89 ms, TE 1.12 ms, FOV 38.0 × 28.5 cm, slice thickness 4 mm, image matrix 256 × 191, GRAPPA parallel imaging scheme with acceleration factor 2, temporal resolution 7 s). Gadobutrol (Gadovist, Bayer Schering Pharma, Germany) was administered as contrast medium at a weight-adapted standard dose (0.1 ml/kg body weight) using a power injector (3T Tennessee, Ulrich, Germany). Perfusion curves were generated with the commercially available software TISSUE4D (Siemens AG, Erlangen, Germany) on the MR scanner console.

The duration of the whole mpMRI examination was approximately 27 min.

Whole-mount step-section slides

Following prostatectomy, the prostate gland was prepared as whole-mount step-section slides by a uro-pathologist with more than 10 years’ experience. All whole-mount step-section slides were digitized. Slide thickness was 4 mm, and the orientation of the whole-mount step-sections was aligned to the axial T2W images. To ensure this, the uro-pathologist and one uro-radiologist jointly defined how each prostate had to be sliced according to the MRI images of each patient, and distinct structures identifiable on both the whole-mount step-section slides and the mpMRI datasets (seminal vesicle insertion, bladder neck, distinct cysts, calcifications, apical urethra) were registered. Only tumors with a maximum diameter of more than 3 mm were given a Gleason score. Low-grade PCa was defined as Gleason score ≤7a (Gleason 3 + 4), and high-grade PCa was defined as Gleason score ≥7b (Gleason 4 + 3).

PCa was reported with the maximum extension and assigned Gleason score. Apart from PCa, the following changes were reported: benign prostatic hyperplasia, prostatitis, healthy tissue, fibrosis, or prostatic intraepithelial neoplasia. Whole-mount specimens were scanned and the lesions were marked on the scanned images.

Image interpretation

Malignant and non-malignant lesions to be scored were predefined by a uro-pathologist together with an experienced uro-radiologist (Radiologist 1, with more than 5 years’ experience and specialized in prostate MR interpretation) on histopathological slices and mpMRI images. The selection included four lesions from the PZ and two from the TZ (distributed equally on both sides of the gland) in each patient. Overall, 300 lesions were selected for scoring in the cohort of 50 patients. Lesions were selected to represent a wide spectrum of malignant, tumor-mimicking, and benign pathologies as well as normal tissue within one prostate, always including the most malignant lesion.

Subsequently, the defined lesions in each patient were scored by a second radiologist (Radiologist 2, with more than 4 years’ experience in prostate MR interpretation), who was blinded to the surgical and histopathological findings and was unaware of how many lesions had been assigned to carcinomas, benign pathologies, or normal tissue. Scoring followed the recommendation of the ESUR guidelines, assigning a single modality score (1–5) to each lesion for each sequence used (T2W, DWI, DCE). Additionally, an overall PI-RADS score was given for each lesion, reflecting the probability of high-grade PCa [11]. The PI-RADS scoring results were compared to histopathological findings for each lesion.

Statistical analysis

The data are shown as absolute and relative frequencies as appropriate, and the patient characteristics as means and standard deviations (SD). To test for differences in dichotomized score values, Fisher’s exact test was used; two-sided values of p < 0.05 were considered statistically significant. Receiver operating characteristic (ROC) analysis was performed to assess how well the scores discriminate between lesions with and without a malignant tumor, measured by the area under the ROC curve (AUC) and the corresponding 95 % confidence interval (CI). The data were analyzed with SPSS 19.0.
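
For an ordinal score such as PI-RADS, the AUC equals the probability that a randomly chosen malignant lesion receives a higher score than a randomly chosen non-malignant lesion, with ties counted as one half. The sketch below computes the AUC in this pairwise form; the function name `roc_auc` and the eight example scores are hypothetical, the confidence interval is not computed, and this is not a description of the SPSS procedure used in the study.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (malignant, non-malignant) pairs in which the
    malignant lesion has the higher score; ties contribute 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one lesion")

    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical single modality scores (1-5), not study data
malignant = [5, 4, 4, 3]
non_malignant = [1, 2, 3, 2]

auc = roc_auc(malignant, non_malignant)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.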

Results

Patients’ characteristics

The characteristics of the 50 prospectively included patients were as follows: mean age 63.0 (SD 8.0) years, mean PSA 7.3 (SD 4.3) ng/ml, and mean prostate volume 43.5 (SD 17.8) ml.

Three hundred different lesions to be scored were predefined by an independent reader, including 103 PCa lesions. Of these, 14 were located in the TZ and 89 in the PZ. Thirty lesions constituted high-grade PCa (defined as Gleason score ≥7b), all located in the peripheral zone. Table 1 gives an overview of all selected lesions.

Table 1 Histopathological findings and location of the selected 300 lesions shown as a percentage of the total

Single modality scores

Irrespective of the localization of lesions, the single modality scores showed good association with PCa detection rates in ROC analysis, although to different degrees. The T2W and DWI single scores revealed significantly higher area under the curve (AUC) values, with 0.90 (95 % CI 0.87–0.94) and 0.92 (95 % CI 0.88–0.95), respectively, than DCE with an AUC of 0.85 (95 % CI 0.80–0.90).

In the PZ, T2W, DWI, and DCE all showed similarly good associations with PCa detection rates, with AUCs of 0.91 (95 % CI 0.87–0.95), 0.92 (95 % CI 0.88–0.96), and 0.92 (95 % CI 0.88–0.96), respectively.

In the TZ, the T2W and DWI single scores also revealed high AUCs of 0.90 (95 % CI 0.83–0.97) and 0.96 (95 % CI 0.91–1.00), respectively. In comparison, DCE had a significantly lower AUC of 0.60 (95 % CI 0.43–0.76). Figure 1 shows the detection rates of all three single modalities in the PZ compared to the detection rates in the TZ, demonstrating a nearly equal performance of the single modality scores in the PZ. In the TZ, the detection rates of DCE clearly lag behind the other single modality scores.

Fig. 1 Detection rate of prostate cancer (PCa) lesions in the transitional zone (TZ) (100 lesions) compared to the peripheral zone (PZ) (200 lesions), in percentages assigned to the different single modality scores for T2-weighted imaging (t2), diffusion-weighted imaging (dwi), and dynamic contrast-enhanced imaging (dce)

Overall PI-RADS score

The overall PI-RADS score for each lesion ranged from score 1 to score 5. An overall PI-RADS score of 1, 2, 3, 4, and 5 mapped to 0, 3, 29, 92, and 100 % PCa detection, respectively.

Irrespective of the localization of lesions, the summation of all three single modality scores showed a high discriminative ability for tumor detection, with an AUC of 0.97 (95 % CI 0.95–0.99) in ROC analysis, which is significantly higher than the AUC of each single modality analyzed separately. Accordingly, only 3 % of benign lesions were scored PI-RADS 4 or 5, whereas 83 % of malignant lesions received such high PI-RADS scores.

Regarding tumor aggressiveness, it was noticeable that there were no high-grade PCa lesions with overall PI-RADS scores less than 4 and only one PCa with a Gleason score of 7a marked as overall PI-RADS 3. Stated differently, 100 % of high-grade PCa lesions were found at score levels of 4 or 5. On the other hand, there was no PCa lesion with a Gleason score of 6 assigned to an overall PI-RADS score of 5 (Table 2; Fig. 2).

Table 2 Gleason score distribution of all prostate cancer lesions per overall PI-RADS score, shown in percentage of all prostate cancer lesions with the corresponding score

Fig. 2 Incidence of prostate cancer (PCa) lesions (n = 103) and high-grade PCa lesions (n = 30) in percentages of all lesions assigned to the different overall PI-RADS scores. Absolute numbers are given in brackets. No high-grade PCa lesion was assigned to an overall PI-RADS score of one to three

Non-tumor lesions

Overall, 197 non-tumor lesions were defined for scoring, comprising benign prostatic hyperplasia, prostatitis, healthy tissue, fibrosis, and prostatic intraepithelial neoplasia. The distribution of non-tumor lesions per overall PI-RADS score is shown in Table 3. The most important findings shown in this table are that normal tissue, with neither benign nor malignant changes, was found primarily at score levels 1 and 2 and never at score levels 4 and 5, whereas benign pathologies were found primarily at score levels 2 and 3.

Table 3 Distribution of tumor and non-tumor lesions per overall PI-RADS score, shown in percentage of all lesions with the corresponding score

In the PZ, the proportion of scores ≥4 was significantly lower for the non-tumor lesions than for the PCa lesions for each single modality score (p < 0.001). In the TZ, this also applied to T2W and DWI, but for DCE the difference was small and not significant (7 % of non-tumor lesions vs. 14 % of PCa lesions; p = 0.31).
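
The DCE comparison in the TZ can be retraced with a two-sided Fisher's exact test on a 2 × 2 table. The sketch below is a plain-Python implementation that sums the probabilities of all tables no more likely than the observed one. The lesion counts are an assumption: they are not stated in the text but are inferred from the reported percentages and the 14 PCa and 86 non-tumor lesions in the TZ (2 of 14 ≈ 14 %, 6 of 86 ≈ 7 %). The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * tolerance
    )


# Assumed counts, inferred from the reported percentages (not stated in the text):
# TZ PCa lesions:       2 with DCE score >= 4, 12 with DCE score < 4
# TZ non-tumor lesions: 6 with DCE score >= 4, 80 with DCE score < 4
p_value = fisher_exact_two_sided(2, 12, 6, 80)
print(f"p = {p_value:.2f}")
```

Under these assumed counts the test reproduces the reported non-significant result for DCE in the TZ.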

It was not possible to significantly differentiate the different non-tumor pathologies from each other using the overall PI-RADS score.

Discussion

The PI-RADS scoring system was published by the European Society of Urogenital Radiology (ESUR) in 2012 to standardize reporting of mpMRI of the prostate [4]. Our data show that the PI-RADS scoring system is suitable for differentiating PCa from non-tumor lesions. The higher the overall PI-RADS score, the higher the probability of PCa. Moreover, there is an association between the overall PI-RADS score and the Gleason score: high-grade PCa (defined as Gleason score ≥7b and ≥3 mm in maximum diameter) received significantly higher scores than low-grade PCa. Therefore, the PI-RADS scoring system is suitable not only for PCa detection, but also as a means of risk stratification regarding PCa aggressiveness. This is in line with the findings of Hoeks et al. [12], who applied the PI-RADS scoring system in active surveillance.

Likert-type scoring systems such as PI-RADS have often been used to evaluate prostate MRI [3]. The PI-RADS scoring system was modeled on BI-RADS, the established scoring system for breast cancer diagnostics developed by the American College of Radiology to standardize reporting of mammograms [13]. Like BI-RADS, the PI-RADS scoring system includes an intermediate category (overall PI-RADS 3) separating benign findings (overall PI-RADS 1 or 2) from clearly malignant findings (overall PI-RADS 4 or 5). For BI-RADS score 3, several studies demonstrated a less than 2 % chance of malignancy, and this category is mostly managed by follow-up [13]. Our data show that no high-grade PCa received an overall PI-RADS score lower than 4. The only PCa lesions with an overall PI-RADS score of 2 or 3 had a Gleason score of 6, except for one PCa with a PI-RADS score of 3 and a Gleason score of 7a. Around a quarter of patients with unsuspicious MRI examinations are known to have PCa on standard biopsy, but 97.7 % of these PCa lesions invisible on prostate MRI are low-grade PCa [14]. Unlike breast cancer diagnostics, prostate MRI is not intended to detect every tumor, but only significant disease [15], especially given that PCa is seen in a high proportion of men but kills only a minority [16]. Pure Gleason score 6 cancer very rarely results in death from PCa [16]; similarly, tumors <0.5 cc are not likely to be lethal [17]. As the British consensus paper states, multi-parametric MRI can usually (with >80 % sensitivity) detect the following in both the peripheral and transition zones: (a) 0.2 cc of Gleason score 7b or above and (b) 0.5 cc of Gleason score 7a or above [18]. For a spherical tumor, 0.2 cc corresponds to a diameter of about 7 mm and 0.5 cc to about 10 mm. This study embraces the aim of the PI-RADS score, which is to rate the likelihood of high-grade PCa [4] and to avoid unnecessary morbidity from treatment of disease best left alone.
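
The volume-to-diameter conversion quoted for spherical tumors follows from V = πd³/6, that is, d = (6V/π)^(1/3). A short sketch of the arithmetic, with the helper name `sphere_diameter_mm` chosen here for illustration:

```python
import math


def sphere_diameter_mm(volume_cc):
    """Diameter in mm of a sphere with the given volume in cc (1 cc = 1000 mm³)."""
    volume_mm3 = volume_cc * 1000.0
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


for volume in (0.2, 0.5):
    print(f"{volume} cc -> {sphere_diameter_mm(volume):.1f} mm")
```

The results, roughly 7.3 mm and 9.8 mm, agree with the rounded values of 7 mm and 10 mm given in the text. Real tumors are rarely spherical, so these diameters are only an orientation.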

Regarding the single modality scores, DWI has been described as the best performing imaging modality in the peripheral zone [19]. In our study, each single modality score showed comparably high reliability for PCa detection in the PZ. However, in the TZ, single modality scores differed substantially. With an AUC of 0.60 in ROC analysis, it is doubtful whether DCE has any diagnostic value for PCa detection in the TZ. In line with the present data, a degree of overlap for DCE in differentiating PCa from benign prostatic hyperplasia has been described [20], making DCE the least specific sequence (Fig. 3).

Fig. 3 A 67-year-old patient with an anteriorly located Gleason score 7b prostate cancer in the transitional zone (PSA 5.69 ng/ml), encircled on the whole-mount step-section slide (a): The carcinoma (arrows) shows low signal on T2-weighted images with ill-defined margins (b), diffusion restriction on diffusion-weighted imaging (c, d), and hyperperfusion on dynamic contrast-enhanced MRI (e: red circle on perfusion map) with a focal plateau curve (f: red curve). An area with hyperplastic nodules appears unsuspicious on T2-weighted images (b) and diffusion-weighted imaging (c, d), but shows hyperperfusion on dynamic contrast-enhanced MRI (e: green circle on perfusion map) with a focal washout curve (f: green curve), which is even more pathological (PI-RADS 5) than the perfusion of the carcinoma

Previous studies describe poorer performance of the PI-RADS scoring system in the TZ compared with the PZ [21], as was also found here. Our data suggest that the PI-RADS scoring system has to be improved regarding DCE in the TZ, possibly by using different score descriptions for the PZ and the TZ, as has been done for T2-weighted imaging.

One of the debated aspects of the PI-RADS scoring system is the lack of consensus on how the overall PI-RADS score should be generated from the single modality scores [22]. Roethke et al. [11] proposed an algorithm-based approach that combines the different modality scores into a sum score and applies fixed limits to derive the overall PI-RADS score. Several other studies also used a PI-RADS sum score, which weights all single modality scores equally [8–10, 21]. According to our data, this approach seems appropriate for scoring in the PZ, where all single modality scores show similarly good diagnostic accuracy. For scoring in the TZ, however, it seems problematic because of the doubtful results for the DCE single modality score.
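
The equally weighted sum score can be sketched as follows. The cut-offs in `ILLUSTRATIVE_UPPER_BOUNDS` are placeholders chosen for illustration and are not taken from this study; the published limits should be taken from [11]. The function names and the example scores are likewise hypothetical.

```python
# Placeholder cut-offs (assumption): inclusive upper bounds of the sum score
# for overall categories 1 to 4; anything above the last bound maps to 5.
ILLUSTRATIVE_UPPER_BOUNDS = (4, 6, 9, 12)


def pirads_sum_score(t2w, dwi, dce):
    """Equally weighted sum of the three single modality scores (each 1-5)."""
    for name, score in (("T2W", t2w), ("DWI", dwi), ("DCE", dce)):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"{name} score must be an integer from 1 to 5")
    return t2w + dwi + dce


def overall_from_sum(sum_score, upper_bounds=ILLUSTRATIVE_UPPER_BOUNDS):
    """Map a sum score (3-15) to an overall category (1-5) via fixed limits."""
    if not 3 <= sum_score <= 15:
        raise ValueError("sum score of three modalities must lie between 3 and 15")
    for category, bound in enumerate(upper_bounds, start=1):
        if sum_score <= bound:
            return category
    return 5


# Hypothetical TZ lesion: unsuspicious on T2W and DWI, but DCE scored 5
total = pirads_sum_score(t2w=2, dwi=2, dce=5)
print(total, overall_from_sum(total))
```

In this hypothetical example a single high DCE score moves an otherwise unsuspicious lesion into the intermediate category, which illustrates why equal weighting is questionable where DCE discriminates poorly.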

This study is limited by the fact that all included patients were scheduled for radical prostatectomy because of biopsy-proven PCa. This bias may explain the high PCa incidence in PI-RADS category 4 when compared to previous studies. However, without this inclusion criterion, it would not have been possible to guarantee whole-mount step-section slides as gold standard in every patient. To minimize this bias, the uro-pathologist and uro-radiologist who selected the lesions to be scored tried to represent the whole variety of pathologies: malignant lesions, tumor-mimicking benign pathologies, and normal tissue. Additionally, lesions to be scored were predefined by the uro-pathologist together with radiologist 1. Therefore, the scoring radiologist (radiologist 2) mainly focused on these predefined regions. This approach was necessary to ensure scoring of the whole spectrum of benign and malignant lesions, but it differs slightly from routine practice and may affect the results. Inter-observer agreement was not part of this study, but has already been evaluated for the PI-RADS scoring system [5, 6]. Multicenter studies with larger patient cohorts would be desirable to confirm these results.

In conclusion, the PI-RADS scoring system can be used for PCa detection and to predict PCa aggressiveness: overall PI-RADS scores of 1 and 2 primarily revealed benign changes, PI-RADS 3 included a certain proportion (29 %) of low-grade PCa, and high-grade cancer was found only at PI-RADS 4 or 5. However, according to our data, it seems necessary to revise the PI-RADS scoring system concerning DCE in the TZ.