Introduction

Despite ongoing concerns about overdiagnosis and overtreatment, prostate cancer is still the most common cancer diagnosed in Western men, with more than twice as many new cases in 2014 as its nearest contender, lung cancer [1]. However, most of the cancers diagnosed are indolent, small-volume, low-grade tumours that are unlikely to result in significant morbidity or mortality [2]. As a result, there is increasing emphasis on a diagnostic strategy geared towards detecting only “clinically significant” tumours; such tumours are often defined as those with a pathological volume ≥ 0.5 mL, although other definitions, including the presence of any cancer with a Gleason Score ≥ 4+3, have also been proposed [3, 4].

The use of multiparametric magnetic resonance imaging (mp-MRI) in the assessment of prostate cancer, incorporating anatomical (T1- and T2-weighted) and “functional” MRI sequences such as diffusion-weighted (DW) and dynamic contrast-enhanced (DCE) MRI, has grown steadily in recent years, and mp-MRI is now part of the standard clinical pathway in many institutions. However, prostate mp-MRI has also been extensively criticized for the wide variability of its reported diagnostic performance across healthcare settings, with some suggesting that its usefulness may be restricted to a small number of academic centres with substantial expertise in the technique. In 2012, the Prostate Imaging Reporting and Data System (PI-RADS) was introduced [5]. In the original PI-RADS, a suspicion score for the presence of clinically significant cancer was assigned on a 1–5 scale for each mp-MRI sequence (T2WI, DW-MRI, DCE-MRI); however, the guideline did not specify how to integrate these scores into a single level of suspicion for each lesion on mp-MRI. The updated PI-RADSv2 [6], published in 2015, provides detailed recommendations on integrating mp-MRI scores according to prostate zonal anatomy (Table 1). It also introduced a simplified DCE-MRI interpretation scheme (dichotomous positive or negative instead of the 1–5 scale used in PI-RADSv1) (Table 2) and a pathologic definition of clinically significant prostate cancer to be used as the reference for comparison with mp-MRI. While the PI-RADS efforts to standardize prostate mp-MRI reporting are highly laudable, it is acknowledged that the guidelines are still a work in progress and that some of the recommendations are based on expert opinion and require evidence-based support and refinement.
The purpose of this study was to evaluate the recommendations for mp-MRI interpretation introduced in the recently updated PI-RADSv2 and investigate the impact of pathologic tumour volume on prostate cancer (PCa) detectability on mp-MRI.

Table 1 Integration of MRI scores from T2-weighted images (T2W), diffusion-weighted images (DWI), and dynamic contrast-enhanced (DCE) images from the prostatic peripheral zone (PZ) and transition zone (TZ) recommended by PI-RADS v2 [6]
Table 2 Summary of MRI features on T2-weighted images (T2WI), diffusion-weighted images (DWI), and dynamic contrast-enhanced (DCE) images evaluated in the PI-RADS v2 scoring system [6]
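The zone-specific integration rules summarized in Table 1 can be expressed as a small decision function. The Python sketch below assumes the 2015 PI-RADS v2 rules as we understand them: in the PZ, the DW-MRI score is dominant and a positive DCE-MRI upgrades a DW-MRI score of 3 to 4; in the TZ, the T2WI score is dominant and a DW-MRI score of 5 upgrades a T2WI score of 3 to 4. The function name and signature are hypothetical, and the sketch does not handle non-diagnostic sequences or the later PI-RADS v2.1 modifications.

```python
from typing import Optional


def integrate_pirads_v2(zone: str, t2w: int, dwi: int,
                        dce_positive: Optional[bool] = None) -> int:
    """Overall PI-RADS v2 category from per-sequence readings (hypothetical helper).

    zone: "PZ" (peripheral zone) or "TZ" (transition zone)
    t2w, dwi: sequence scores on the 1-5 scale
    dce_positive: dichotomous DCE-MRI result, or None if DCE-MRI is not used
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {score}")
    if zone == "PZ":
        # Peripheral zone: DW-MRI is the dominant sequence.
        overall = dwi
        # A positive DCE-MRI upgrades an equivocal DW-MRI score of 3 to 4.
        if dwi == 3 and dce_positive:
            overall = 4
    elif zone == "TZ":
        # Transition zone: T2WI is the dominant sequence; DCE-MRI plays no role.
        overall = t2w
        # A DW-MRI score of 5 upgrades an equivocal T2WI score of 3 to 4.
        if t2w == 3 and dwi == 5:
            overall = 4
    else:
        raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")
    return overall


# Fig. 1 example: PZ tumour, DW-MRI score 5, T2WI score 2, DCE-MRI negative
print(integrate_pirads_v2("PZ", t2w=2, dwi=5, dce_positive=False))  # 5
```

Calling the function with `dce_positive=None` mimics a T2WI+DW-MRI-only reading, in which a PZ DW-MRI score of 3 is never upgraded.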

Methods

Patients

Our institutional review board approved this retrospective study and waived the requirement for informed consent. Inclusion criteria were: (i) radical prostatectomy for biopsy-proven prostate cancer performed at our institution, (ii) whole-mount step-section pathological maps for tumour localization, and (iii) multiparametric endorectal 3-Tesla MRI performed at our institution ≤180 days before surgery. A total of 187 patients fulfilled the inclusion criteria, of whom eight were excluded because of prior treatment (hormonal therapy or radiation therapy). In the whole-mount pathology specimens of the remaining 179 patients, 195 tumours ≥0.5 mL (any Gleason grade) and 41 tumours <0.5 mL (with a predominant Gleason grade ≥4) were identified. Seventeen tumours were excluded from analysis of DW-MRI because of artefacts or distortions, and 14 tumours were excluded from analysis of DCE-MRI because the tumour was not fully covered (n=6) or because of deviations from the MRI protocol, e.g., with respect to the timing of contrast injection or temporal resolution (n=8). The final cohort included 206 tumours in 150 patients (169 tumours ≥0.5 mL of any Gleason grade and 37 tumours <0.5 mL with a predominant Gleason grade ≥4). One hundred and twenty patients had tumours ≥0.5 mL only, 18 patients had tumours both ≥0.5 mL and <0.5 mL, and 12 patients had tumours <0.5 mL only.

All patients were included in a prior report [7] evaluating the associations between quantitative Haralick texture features derived from MRI and pathologic findings.

Multiparametric MRI acquisition

All MRI examinations were performed on 3-Tesla MRI systems (GE Healthcare, WI, USA) using a multichannel phased-array coil and an endorectal coil. The MRI acquisition protocol included T2-weighted imaging (T2WI) in axial, coronal and sagittal planes (fast spin-echo sequences; TR/TE 2500–7700/83.3–143.5 ms; section thickness, 3–4 mm; intersection gap, 0–1 mm; field of view, 14–24 cm; matrix, 288 × 288 to 448 × 224), diffusion-weighted MRI with b-values of 0 and 1000 s/mm² (single-shot spin-echo EPI sequence; TR/TE 3500–5600/70.3–105.6 ms; slice thickness, 3 mm with no interslice gap; field of view, 14–24 cm; matrix, 128 × 128), and a T1-weighted dynamic contrast-enhanced sequence (TR/TE 3.6–4.9/1.3–1.7 ms; slice thickness, 5 mm with no interslice gap; field of view, 24 × 24 cm; matrix, 256 × 128–160; mean temporal resolution, 10 s). Parametric ADC maps were generated from the diffusion-weighted images on a voxel-wise basis using a monoexponential model. For DCE-MRI, gadopentetate dimeglumine (Magnevist; Berlex Laboratories, Montville, NJ, USA) was administered (0.1 mmol per kilogram of body weight at a rate of 2 mL/s) using an automatic injector (Medrad, Indianola, PA, USA).
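With only two b-values (0 and 1000 s/mm²), the monoexponential model S(b) = S0·exp(−b·ADC) can be solved in closed form for each voxel: ADC = ln(S0/S(b))/b. The Python sketch below illustrates this calculation on co-registered NumPy arrays; the function name, the small signal floor, and the clipping of negative values are our assumptions, and the vendor software used in the study may handle noise and background voxels differently.

```python
import numpy as np


def adc_map(s_b0, s_b, b=1000.0, floor=1e-6):
    """Voxel-wise monoexponential ADC (mm^2/s) from images at b=0 and b (s/mm^2).

    S(b) = S0 * exp(-b * ADC)  =>  ADC = ln(S0 / S(b)) / b
    """
    # A small positive floor keeps the logarithm defined in background voxels.
    s0 = np.clip(np.asarray(s_b0, dtype=float), floor, None)
    sb = np.clip(np.asarray(s_b, dtype=float), floor, None)
    adc = np.log(s0 / sb) / b
    # S(b) > S0 (e.g. from noise) would give a negative ADC; set such voxels to 0.
    return np.clip(adc, 0.0, None)


# Synthetic voxels with known ADCs of 0.0015 and 0.0008 mm^2/s
s0 = np.array([1000.0, 800.0])
sb = s0 * np.exp(-1000.0 * np.array([0.0015, 0.0008]))
print(adc_map(s0, sb) * 1e6)  # ADC in units of 10^-6 mm^2/s
```

Multiplying by 10⁶ gives the commonly reported unit of ×10⁻⁶ mm²/s; for instance, a voxel whose signal falls from 1000 to about 223 between b = 0 and b = 1000 s/mm² has an ADC of about 1500 × 10⁻⁶ mm²/s.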

Histopathological preparation and correlation to imaging

Prostatectomy specimens were fixed in formalin and sliced at intervals of 3–5 mm. Haematoxylin and eosin staining was performed after paraffin embedding. Tumours were outlined on each slice using a marking pen, and every slide was then digitized using a digital photo scanner (300 dpi resolution, Seiko Epson Corporation, Nagano, Japan). For each tumour focus, Gleason grades and volume were determined.
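The text does not state how tumour volume was calculated. One common approach, assumed here purely for illustration, is planimetry: the outlined tumour area on each digitized slide is multiplied by the slice interval and summed over slides. At 300 dpi, each pixel measures 25.4/300 ≈ 0.085 mm on a side. The function below is a hypothetical sketch of this calculation and ignores corrections for tissue shrinkage during fixation or for partial slices.

```python
MM_PER_INCH = 25.4


def tumour_volume_ml(pixel_counts, slice_interval_mm, dpi=300):
    """Planimetric tumour volume estimate in mL (hypothetical helper).

    pixel_counts: pixels inside the tumour outline on each digitized slide
    slice_interval_mm: one interval for all slides, or one value per slide
    dpi: scanner resolution in dots per inch (300 in this study)
    """
    # Area represented by one square pixel, in mm^2.
    pixel_area_mm2 = (MM_PER_INCH / dpi) ** 2
    if isinstance(slice_interval_mm, (int, float)):
        intervals = [float(slice_interval_mm)] * len(pixel_counts)
    else:
        intervals = [float(t) for t in slice_interval_mm]
    if len(intervals) != len(pixel_counts):
        raise ValueError("need one slice interval per slide")
    # Sum of (outlined area x slice interval) over all slides, in mm^3.
    volume_mm3 = sum(n * pixel_area_mm2 * t for n, t in zip(pixel_counts, intervals))
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3
```

Under these assumptions, a tumour outlined over a total of 125 mm² on slides cut at 4-mm intervals would have a volume of 0.5 mL, the size threshold used in this study.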

For analysis, each tumour was first identified using the whole-mount pathology as guidance and analyzed in consensus by two readers, one of them a fellowship-trained genitourinary radiologist with 8 years of experience in prostate MRI interpretation (HAV) and the other a genitourinary radiology research fellow (AMH). Both readers were blinded to all clinical information. For each tumour focus, PI-RADS v2 scores for T2-weighted, diffusion-weighted and dynamic contrast-enhanced MRI were recorded [6]. All assessments were made on a commercial PACS workstation (Centricity PACS, GE Healthcare, WI, USA).

Statistical methods

Continuous variables were summarized using medians and ranges, and categorical variables using frequencies and percentages. PI-RADS v2 scores on each mp-MRI sequence were tabulated and integrated as recommended by the guideline [6].

The proportions of lesions identified on pathology that were considered positive by PI-RADS v2 (score of 4 or 5) were calculated, together with modified Clopper-Pearson confidence intervals adjusted for multiple lesions per patient. In a subanalysis, the proportion of positive cases among the 42 lesions ≥0.5 mL with GS ≥4+3 was compared with that among the 37 lesions <0.5 mL with GS ≥4+3 using the Rao-Scott chi-square test.
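For orientation, the basic (unadjusted) Clopper-Pearson interval is obtained by inverting the binomial distribution: the lower limit is the proportion p for which P(X ≥ x) = α/2, and the upper limit is the p for which P(X ≤ x) = α/2. The standard-library Python sketch below finds both limits by bisection. It deliberately omits the clustering adjustment used in this study (in survey software this modification is typically based on an effective sample size derived from the design effect), so its limits will not in general reproduce the confidence intervals reported in the Results.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, tol: float = 1e-10) -> float:
    """Bisection for g(p) = target on [0, 1], assuming g increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Unadjusted exact (Clopper-Pearson) interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    # Lower limit: P(X >= x | p) = alpha/2; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


low, high = clopper_pearson(118, 125)
print(f"118/125 = {118 / 125:.1%}, unadjusted exact 95% CI {low:.1%} to {high:.1%}")
```

For x = 0 the upper limit reduces to 1 − (α/2)^(1/n), which offers a quick check of the bisection.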

P-values less than 0.05 were considered statistically significant. Analyses were performed using SAS 9.4 (SAS Institute, Cary, NC, USA).

Results

Patient characteristics

The final cohort included 206 tumours identified on whole-mount pathology of 150 patients. One hundred and sixty-nine tumours were ≥ 0.5 mL (125 in the peripheral zone [PZ] and 44 in the transition zone [TZ]), and 37 tumours were < 0.5 mL, but had a predominant Gleason grade ≥4 (27 in the PZ and 10 in the TZ). The median time between MRI and surgery was 15 days (range: 1–112 days). The pathologic characteristics of the tumours ≥ 0.5 mL are summarized in Table 3.

Table 3 Summary of histopathologic characteristics of prostate tumour foci ≥ 0.5 mL

Evaluation of the PI-RADS v2 mp-MRI integration scheme for tumours ≥0.5 mL on pathology

Peripheral zone (PZ)

The PI-RADS v2 integration scheme for the scores derived from T2WI, DW-MRI and DCE-MRI resulted in the correct classification of 118 of 125 (94 %, 95 %CI: 90–99 %) PZ tumours (Fig. 1). DCE-MRI was analyzed in all 125 tumours and provided additional value for the detection of 4/125 tumours, all of which had a score of 3 on DW-MRI and were positive on DCE-MRI. Without the DCE-MRI findings, 114/125 (91 %, 95 %CI: 86–97 %) PZ tumours would have been correctly classified. A summary of these results is presented in Table 4.

Fig. 1

A 67-year-old man with a dominant right peripheral zone prostate tumour > 0.5 mL identified on whole-mount step-section pathology (green area in a): PI-RADS v2 score of 5 on the ADC map (b), score of 2 on the T2-weighted image (c), and negative on DCE-MRI (d). An integrated PI-RADS v2 score of 5 was assigned.

Table 4 Summary of the PI-RADSv2 integrated interpretation scheme for pathologically confirmed prostate tumours according to findings on T2-weighted images (T2WI), diffusion-weighted images (DWI), and dynamic contrast-enhanced (DCE) images

Transition zone (TZ)

The PI-RADS v2 integration scheme for the scores derived from T2WI, DW-MRI and DCE-MRI resulted in the correct classification of 42 of 44 (95 %, 95 %CI: 89–100 %) TZ tumours. The two tumours not detected on T2WI or DW-MRI were also negative on DCE-MRI. One of these tumours had a score of 2 on T2WI; the other had a score of 3 on T2WI but a score of ≤4 on DW-MRI, which falls short of the DW-MRI score of 5 required for upgrading in the TZ. A summary of these results is presented in Table 4.

Impact of pathologic tumour volume on prostate cancer detection on mp-MRI

When considering tumours of any Gleason Score that were ≥0.5 mL in pathologic volume, most tumours had a PI-RADS v2 score of 4 or 5 on T2WI (128/169, 76 %; 95 %CI: 69–83 %) and DW-MRI (151/169, 89 %; 95 %CI: 85–94 %), and were positive on DCE-MRI (121/169, 72 %; 95 %CI: 64–79 %). In 163/169 tumours ≥0.5 mL identified on pathology (96 %, 95 %CI: 94–99 %), at least one mp-MRI sequence demonstrated a positive finding (PI-RADS score of 4 or 5). By contrast, most of the Gleason Score ≥4+3 tumours that were <0.5 mL on pathology were not identified (PI-RADS score of 1 or 2 on T2WI and DW-MRI, or negative on DCE-MRI) on T2WI (28/37), DW-MRI (27/37), and DCE-MRI (34/37). The PI-RADSv2 integrated scores correctly identified 7/27 (26 %; 95 %CI: 10–42 %) PZ and 2/10 (20 %; 95 %CI: 0–52 %) TZ tumours with a GS ≥4+3 but <0.5 mL. DCE-MRI did not aid the detection of any of the 27 PZ tumours <0.5 mL. The proportions of PI-RADS v2 positive and negative cases differed significantly between GS ≥4+3 tumours <0.5 mL and GS ≥4+3 tumours ≥0.5 mL (P<0.0001 for T2WI, DW-MRI, and DCE-MRI) (Table 5).
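The per-sequence and “at least one sequence positive” tallies above rest on simple rules: a T2WI or DW-MRI score of 4 or 5 counts as positive, and DCE-MRI is read dichotomously. A minimal Python sketch of this tabulation follows; the `LesionScores` record, the `summarize_positivity` function, and the example data are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class LesionScores:
    """Per-tumour PI-RADS v2 readings (hypothetical record layout)."""
    t2w: int            # T2WI score, 1-5
    dwi: int            # DW-MRI score, 1-5
    dce_positive: bool  # dichotomous DCE-MRI result


def score_positive(score: int) -> bool:
    # A sequence score of 4 or 5 is treated as a positive finding.
    return score >= 4


def summarize_positivity(lesions):
    """Return {sequence: (n_positive, n_total)} for T2WI, DWI, DCE and 'any'."""
    n = len(lesions)
    counts = {
        "T2WI": sum(score_positive(les.t2w) for les in lesions),
        "DWI": sum(score_positive(les.dwi) for les in lesions),
        "DCE": sum(les.dce_positive for les in lesions),
        # 'any': at least one sequence shows a positive finding
        "any": sum(
            score_positive(les.t2w) or score_positive(les.dwi) or les.dce_positive
            for les in lesions
        ),
    }
    return {name: (k, n) for name, k in counts.items()}


example = [
    LesionScores(t2w=5, dwi=5, dce_positive=True),
    LesionScores(t2w=2, dwi=4, dce_positive=False),
    LesionScores(t2w=3, dwi=3, dce_positive=True),
    LesionScores(t2w=2, dwi=2, dce_positive=False),
]
print(summarize_positivity(example))
# {'T2WI': (1, 4), 'DWI': (2, 4), 'DCE': (2, 4), 'any': (3, 4)}
```

Dividing each count by its total gives percentages of the kind reported above; the confidence intervals would additionally require the clustered-data adjustment described in the Statistical methods.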

Table 5 Comparison of PI-RADS v2 scores on T2-weighted images (T2WI), diffusion-weighted images (DWI), and dynamic contrast-enhanced (DCE) images between Gleason Score ≥4+3 tumours <0.5 mL and ≥0.5 mL

Discussion

In this study we assessed the PI-RADS v2 recommendations for integrating the scores assigned to the same prostate lesion on different mp-MRI sequences and found that applying PI-RADSv2 resulted in the correct classification of 118/125 PZ tumours (94 %) and 42/44 TZ tumours (95 %) ≥0.5 mL on pathology. Variable diagnostic accuracy has been reported with PI-RADS v1, possibly, at least in part, because the original version did not specify how best to integrate scores or interpret discrepancies between scores for the same lesion on different mp-MRI sequences. A meta-analysis of 14 studies found wide variability in the interpretation and integration of PI-RADS v1 scores [8]. Six of the 14 studies reported a single PI-RADS score per patient on a 1–5 scale (three without providing details of how the single score was derived), using a score of 3 or 4 as the cut-off for tumour positivity; the other eight studies summed the scores of all three sequences and used cut-offs between 8 and 10 to define positivity [8]. The clarifications regarding score integration in PI-RADS v2 may therefore also contribute to improved diagnostic accuracy compared with PI-RADS v1. Two studies have compared PI-RADS v1 head-to-head with more loosely defined suspicion scoring systems such as the Likert scale, and neither found improved diagnostic performance with PI-RADS v1 [9, 10]; both reported that the Likert scale performed better than PI-RADS v1 in certain anatomical areas and for more experienced radiologists [9, 10]. These results are not entirely unexpected, as the PI-RADS and Likert scales are in essence very similar; the benefit of PI-RADS may lie not in improving the performance of experienced radiologists, but rather in expanding the use of prostate mp-MRI by providing a scaffold on which less experienced radiologists can base their interpretations.

Although not the primary aim of our study, we also found that DCE-MRI contributed little information beyond that provided by T2WI and DW-MRI. DCE-MRI provided incremental value for the classification of only 4/125 PZ tumours ≥0.5 mL (3 %) and none of the 27 PZ tumours <0.5 mL with GS ≥4+3; therefore, the minimal added benefit derived from DCE-MRI might be outweighed by the drawbacks associated with its use (e.g. risk of side effects from gadolinium-based contrast agents, additional scan time, and additional costs).

An important caveat to our findings is the impact of pathologic tumour volume on the performance of PI-RADS v2. Most Gleason Score ≥4+3 tumours with volumes <0.5 mL on pathology were not detectable on T2WI (28/37), DW-MRI (27/37), and DCE-MRI (34/37), and PI-RADS v2 identified only 9/37 (24 %) of these tumours. This is not surprising, as lesion detectability on MRI has been shown to be volume-dependent [11], but it is nonetheless important to consider the limited ability of mp-MRI to detect small lesions, particularly since PI-RADSv2 states that “a major objective of a prostate MRI exam is to identify and localize abnormalities that correspond to clinically significant prostate cancer, and mpMRI is able to detect intermediate to high grade cancers with volumes <5 mm…” In fact, the definition of clinically significant cancer in PI-RADSv2 includes Gleason 7 tumours smaller than 5 mm.

Our study had several limitations. It was a retrospective analysis subject to the inherent limitations of such a design; however, considering that PI-RADSv2 was only published in April 2015, it will probably take some time before prospective data for its evaluation are collected and published. Only patients undergoing prostatectomy were included; however, there is no reason to assume that the findings would not apply to patients managed with other approaches, and the use of whole-mount step-section pathology maps allowed for detailed imaging-to-pathology correlation. The readers were aware of the presence and location of the tumours on pathology; in fact, all tumours were first identified on the pathology maps and then cognitively co-registered to the MRI using anatomical landmarks for reference. Therefore, this was not a diagnostic accuracy study. Our findings probably approach the “best achievable” performance, and lower accuracy would be expected when mp-MRI is evaluated blinded to the pathology data and when other factors, such as reader experience, come into play. This design was chosen in order to provide an objective evaluation of the MR abnormalities at the sites of known tumours, but we acknowledge that it did not allow assessment of inter-reader agreement or of false-positive abnormalities on MRI. Although important, such assessments were beyond the scope of our study.

Conclusion

In summary, we found that the integrated scores suggested by PI-RADS v2 resulted in the correct classification of 94–95 % of tumours ≥0.5 mL but were of limited use for the assessment of GS ≥4+3 tumours <0.5 mL, and that DCE-MRI offered limited added value over T2WI and DW-MRI.