Introduction

Prostate cancer is the second most frequently diagnosed cancer in men worldwide (13.6 % of all cancers diagnosed in men) and the third most lethal cancer in men in the developed world [1]. However, the detection rate of prostate cancer has been found to be only around 25 % when detection is based on elevated prostate-specific antigen (PSA), suspicious PSA kinetics and digital rectal examination [2, 3]. Advances in multiparametric magnetic resonance imaging (mp-MRI), which combines anatomical and functional data, have shown considerable advantages in the detection and characterisation of prostate cancer [4, 5]. Several studies have demonstrated that functional imaging techniques, such as diffusion-weighted imaging (DWI), dynamic contrast-enhanced MRI (DCE-MRI) and MR spectroscopic imaging (MRSI), clearly improve the accuracy of MRI for the detection and localisation of prostate cancer [6]. Various MRI protocols have been proposed [7–10].

Recently, the European Society of Urogenital Radiology (ESUR) published prostate MRI guidelines to standardise the evaluation and reporting of prostate MRI [11]. One relevant part of these guidelines is a unified scoring system named PI-RADS (Prostate Imaging Reporting and Data System), comparable to the Breast Imaging Reporting and Data System (BI-RADS) [12, 13]. Although this guideline is the first attempt to standardise prostate MRI, little or no evidence has been available on its accuracy and inter-reader agreement.

Thus, the purpose of this study was to assess the inter-reader agreement of the ESUR scoring system using histology obtained from MRI-guided biopsy as the reference standard.

Materials and methods

Patients

The study was approved by the local ethics committee. Between August 2011 and April 2012, 67 consecutive patients (mean age 66.8 ± 7.5 years, mean prostate volume 57 ± 26.1 ml, mean PSA 10 ± 7.6 ng/ml) with elevated PSA levels (above 4 ng/ml) and at least one negative trans-rectal ultrasound (TRUS)-guided biopsy were included in this study. Each patient underwent mp-MRI of the prostate. In a second session, MRI-guided biopsy of all described lesions, including both suspicious and unsuspicious findings (n = 164 lesions in total), was performed.

MRI protocol

Mp-MRI of the prostate was performed at 3 T (Magnetom Trio; Siemens Healthcare, Forchheim, Germany) with a six-channel phased-array body coil. To suppress bowel peristalsis, all patients received 20 mg butylscopolamine (Buscopan; Boehringer, Ingelheim, Germany) intravenously and intramuscularly. An interval of at least 6 weeks was maintained between mp-MRI and the preceding TRUS biopsy. Mp-MRI of the prostate included T2-weighted imaging (T2WI), T1-weighted imaging (T1WI), DWI and DCE-MRI. T2-weighted turbo spin echo sequences were acquired in three standard orthogonal planes (axial, sagittal and coronal). In addition, axial T1-weighted turbo spin echo images, a single-shot spin echo echo-planar sequence with five b values (0, 250, 500, 750, 1,000 s/mm2) and five averages for DWI, and a volume-interpolated gradient echo sequence for DCE-MRI were acquired (Table 1). The imaging protocol was adapted to the ESUR guidelines.
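
The ADC maps used for reading (Fig. 1) are derived from the five b values listed above. The text does not state how the scanner software computes them, so the following is only a minimal sketch under a common assumption: a mono-exponential model S(b) = S0·exp(−b·ADC), fitted per voxel by log-linear least squares over all b values. The function name fit_adc and the use of all five b values (including b = 0) are illustrative choices, not the vendor's implementation.

```python
import math

# b values (s/mm^2) of the DWI sequence described above
B_VALUES = [0, 250, 500, 750, 1000]


def fit_adc(b_values, signals):
    """Estimate the ADC (mm^2/s) of one voxel.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC), so that
    ln S(b) = ln S0 - b * ADC is linear in b; fits it by ordinary least squares.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching b values and signals")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    log_signals = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(log_signals) / n
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, log_signals))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    slope = sxy / sxx
    return -slope  # the slope of ln S versus b equals -ADC
```

For a synthetic voxel whose signal decays as 1000·exp(−b·0.8 × 10⁻³), fit_adc(B_VALUES, signals) returns approximately 0.8 × 10⁻³ mm²/s. Some pipelines exclude b = 0 to reduce perfusion effects; that choice is not specified in the protocol above.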

Table 1 MRI protocol for T2-weighted imaging (T2WI), T1-weighted imaging (T1WI), diffusion-weighted imaging (DWI) and dynamic contrast enhanced imaging (DCE-MRI)

Scoring system

The ESUR guidelines recommend a standardised scoring system for evaluating and reporting prostate MRI, similar to the BI-RADS classification used by breast radiologists for X-ray mammography, breast ultrasound and breast MRI [12, 13]. The ESUR guidelines endorse dividing the prostate gland into 27 regions (minimum 16 regions). All lesions are rated on a score from 1 to 5 on each of the three MRI sequences (T2WI, DWI, DCE-MRI). When evaluating T2-weighted data sets, the reader must take into account whether the lesion lies in the peripheral zone or the transition zone (Table 2) [11].

Table 2 Evaluation of each MRI sequence (T2WI, DWI, DCE-MRI) according to the PI-RADS (Prostate Imaging Reporting and Data System) score [11]

Scoring

Lesions (n = 164) were retrospectively evaluated on each MRI sequence (T2WI, DWI, DCE-MRI) by three blinded readers (D.B., M.Q. and L.S., with 4, 3 and 2 years of experience in reading prostate MRI, respectively). Scoring followed the ESUR guidelines (PI-RADS). In addition, each lesion was assigned an overall score (3–15 points), the sum of its three sequence scores. All readers evaluated each lesion independently and were blinded to the patients’ clinical data and to the histology of the corresponding MRI-guided biopsy. All lesions were marked with a circle on the PACS workstation before the study evaluation began (Fig. 1). Lesions were documented using a 27-region localisation scheme [4].
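
The overall score combines the three sequence scores, each from 1 to 5, into a single value from 3 to 15. A minimal sketch of that bookkeeping follows; the function name overall_pirads_score and the dictionary keys are hypothetical, and the validation simply enforces the 1 to 5 range stated above.

```python
# Sequences scored per lesion under the ESUR PI-RADS scheme described above
SEQUENCES = ("T2WI", "DWI", "DCE-MRI")


def overall_pirads_score(scores):
    """Sum the per-sequence PI-RADS scores (each an integer 1-5) into an overall score (3-15).

    `scores` is a hypothetical mapping such as {"T2WI": 4, "DWI": 5, "DCE-MRI": 4}.
    """
    missing = [seq for seq in SEQUENCES if seq not in scores]
    if missing:
        raise ValueError(f"missing sequence scores: {missing}")
    total = 0
    for seq in SEQUENCES:
        value = scores[seq]
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{seq} score must be an integer from 1 to 5, got {value!r}")
        total += value
    return total
```

For example, a lesion scored 4 on T2WI, 5 on DWI and 4 on DCE-MRI receives an overall score of 13, which lies above both cut-offs evaluated in this study (≥9 and ≥10).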

Fig. 1

Example of a prostate MRI evaluation. a Axial T2WI with a suspicious lesion in the right peripheral zone (marked with a circle); b coronal T2WI; c, d corresponding apparent diffusion coefficient (ADC) map showing reduced signal, and diffusion-weighted image (DWI) at a high b value (1,000 s/mm2); e, f corresponding dynamic contrast-enhanced (DCE)-MRI with a steep initial slope of contrast medium uptake followed by rapid washout (type 3 curve). Histology of this lesion revealed a tumour with a Gleason score of 4 + 3 = 7.

In-bore MRI-guided biopsy

The MRI-guided biopsies were performed on the same 3-T system (Magnetom Trio; Siemens Healthcare, Forchheim, Germany). Patients were placed in the prone position, and a needle guide attached to a portable biopsy device (DynaTRIM; Invivo, Gainesville, FL, USA) was introduced rectally. T2-weighted axial and sagittal images were acquired with body coils. Image data were transferred to a DynaCAD workstation (Invivo) for biopsy planning. Two cores were taken from each lesion with an MRI-compatible, 18-gauge, fully automatic biopsy gun (Invivo).

Statistics

Data were tested for normal distribution using the Kolmogorov–Smirnov test. Normally distributed parameters were compared using the independent-samples t-test; non-normally distributed data were compared using the Mann–Whitney U test. All data are expressed as mean ± SD. Statistical analysis was performed using IBM SPSS Statistics 19 for Windows (SPSS, Chicago, IL, USA). Statistical significance was defined as P < 0.05. Inter-reader agreement was calculated using Cohen’s kappa statistics and was defined as excellent (κ = 0.81–1.00), good (κ = 0.61–0.80), moderate (κ = 0.41–0.60), fair (κ = 0.21–0.40) or poor (κ ≤ 0.20) [14]. Sensitivity, specificity, positive predictive value and negative predictive value were calculated for the recommended cut-off score of ≥10 and, additionally, for a cut-off score of ≥9, using MRI-guided biopsy as the reference standard.
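
Cohen’s kappa is defined for two readers, and the text does not state how it was pooled across three. One common option, sketched below as an assumption rather than the analysis actually run in SPSS, is to average the three pairwise Cohen’s kappas (sometimes called Light’s kappa); Fleiss’ kappa is a different, equally common alternative. The sketch uses unweighted kappa, although weighted kappa is often preferred for ordinal scores such as PI-RADS; the paper does not say which was used. The function names are hypothetical, and the agreement bands mirror the categories defined above.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two readers scoring the same lesions."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from each reader's marginal score frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both readers used one identical category; kappa undefined, 1.0 by convention here
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(readers):
    """Average Cohen's kappa over all reader pairs (one way to pool three readers)."""
    pairs = list(combinations(readers, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def agreement_label(kappa):
    """Map kappa to the agreement categories used in this study [14]."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

With these bands, the reported values of κ = 0.55 (T2WI) and κ = 0.64 (DWI) are labelled moderate and good, respectively.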

Results

Patients

MRI-guided biopsy confirmed prostate cancer in 56 lesions in 28 patients (42 %). Seventeen lesions had a Gleason score of 6, 35 lesions a Gleason score of 7, 1 lesion a Gleason score of 8 and 3 lesions a Gleason score of 9. The mean age of patients with verified prostate cancer was 69.6 ± 8.4 years, compared with 65.1 ± 6.5 years in patients without cancer (P < 0.05). The mean prostate volume was 42.1 ± 11.5 ml in patients with prostate cancer and 67.6 ± 27.2 ml in patients without histologically verified prostate cancer (P < 0.01). PSA values were 11.2 ± 10.3 ng/ml in patients with and 8.7 ± 4.8 ng/ml in patients without verified prostate cancer (P = 0.183).

PI-RADS

The mean PI-RADS score of all lesions (n = 164) across all three readers was 3.5 ± 1 for T2WI, 3.9 ± 0.9 for DWI and 2.7 ± 1.3 for DCE-MRI. Tumour lesions had a mean score of 4.2 ± 0.8 (T2WI), 4.5 ± 0.7 (DWI) and 3.5 ± 1.4 (DCE-MRI). Benign lesions had a mean score of 3.0 ± 0.8 (T2WI), 3.5 ± 0.7 (DWI) and 2.4 ± 1.1 (DCE-MRI). The mean overall PI-RADS score was 12.3 ± 2.1 for tumour lesions and 9.0 ± 1.6 for benign lesions (Table 3). With MRI-guided biopsy as the reference standard and the recommended cut-off value of 10 points, mp-MRI detected prostate cancer with a sensitivity of 85.7 %, a specificity of 67.6 %, a positive predictive value of 57.8 % and a negative predictive value of 90.1 %. A cut-off value of 9 points yielded a sensitivity of 92.9 %, a specificity of 41.7 %, a positive predictive value of 45.2 % and a negative predictive value of 91.8 % (Table 4).
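
The accuracy figures follow from dichotomising the overall score at a cut-off and cross-tabulating against biopsy histology. The paper does not report the underlying 2 × 2 table, so the counts in the usage note are an assumption: they are back-calculated to match the reported percentages if the analysis was per lesion with 56 malignant and 108 benign lesions, and they are not study data. Function names are hypothetical.

```python
def counts_at_cutoff(overall_scores, is_cancer, cutoff):
    """Cross-tabulate overall scores (positive if score >= cutoff) against histology."""
    tp = fp = tn = fn = 0
    for score, cancer in zip(overall_scores, is_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_accuracy(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table of lesion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Under the assumed per-lesion counts, diagnostic_accuracy(48, 35, 73, 8) reproduces the cut-off 10 values (85.7 %, 67.6 %, 57.8 %, 90.1 %), and diagnostic_accuracy(52, 63, 45, 4) reproduces the cut-off 9 values (92.9 %, 41.7 %, 45.2 %, 91.8 %). Lowering the cut-off converts 31 false negatives and true negatives into positives in this reconstruction, which is why sensitivity rises while specificity and PPV fall.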

Table 3 Mean PI-RADS score ± SD shown for each MRI sequence with either cancer or benign lesions
Table 4 Accuracy of the PI-RADS score

Inter-reader agreement

Inter-reader agreement among all three readers was κ = 0.55 for T2WI, κ = 0.64 for DWI and κ = 0.65 for DCE-MRI. For malignant lesions, kappa values were κ = 0.66 for T2WI, κ = 0.80 for DWI and κ = 0.63 for DCE-MRI. For benign lesions, kappa values were κ = 0.46 for T2WI, κ = 0.52 for DWI and κ = 0.67 for DCE-MRI (Table 5).

Table 5 Inter-reader agreement of the PI-RADS score using kappa statistics evaluated by three blinded readers

Discussion

Because currently recommended diagnostic tools such as digital rectal examination, PSA and TRUS biopsy detect clinically relevant prostate cancer unsatisfactorily, mp-MRI can significantly improve prostate cancer diagnostics, especially in patients with a prior negative TRUS-guided biopsy [15–21]. The recently published ESUR recommendation on mp-MRI of the prostate standardises all aspects of mp-MRI, including implementation, evaluation and documentation. This ESUR guideline includes a scoring system (PI-RADS) to evaluate prostate lesions on high-resolution T2-weighted images and at least two functional MR sequences [11]. Our study investigated the inter-reader agreement of the PI-RADS score. The results show that the ESUR score, applied by different radiologists, yields moderate to good inter-reader agreement and a cancer detection rate of 42 % in our population of patients with elevated PSA and a previously negative TRUS-guided biopsy.

Our mp-MRI protocol does not include spectroscopy (MRSI), which the ESUR guideline defines as “optional”. MRSI has been reported to be a valid additional tool for detecting prostate cancer, but it prolongs the examination. In addition, spectroscopy has not been reliably implemented at 3 T, and the endorectal coil used at 1.5 T reduces patient comfort [22, 23].

Studies published before the release of the ESUR score demonstrated high sensitivities and negative predictive values for the detection of prostate cancer using different scoring systems for lesion characterisation based on high-resolution T2WI, DWI and DCE-MRI [24–26]. The PI-RADS score qualitatively evaluates lesions on T2-weighted images according to their signal intensity, separately for the peripheral zone (PZ) and the transition zone (TZ), with low signal intensity as a characteristic of malignancy. Well-defined lesions receive a score of 2, whereas a score of 3 represents an intermediate, heterogeneous appearance [11]. However, inter-reader agreement for the T2-weighted images was only moderate, probably because characterising an area on T2-weighted images alone is variable and subjective [27]. DWI showed better agreement between readers, most likely because its score combines more than one parameter, namely a reduced apparent diffusion coefficient (ADC) and a hyperintense signal at high b values (score 4 or 5). For DCE-MRI, the scores are clearly defined by enhancement curves, and inter-reader agreement was therefore better than for T2-weighted images [9, 28].
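
Because the DCE-MRI score rests on the enhancement curve type (type 3 washout in Fig. 1), a short sketch shows how such a curve can be classified. The text does not define the thresholds used, so the ±10 % change between the early post-contrast peak and the last time point, a convention often used in breast MRI, and the size of the early window are assumptions; the function name is hypothetical. Curve type is only one input to the ESUR DCE score.

```python
def classify_enhancement_curve(signal, early_points, threshold=0.10):
    """Classify a DCE-MRI time-intensity curve as type 1, 2 or 3.

    signal[0] is the pre-contrast baseline and is excluded; signal[1:early_points + 1]
    is the (assumed) early post-contrast window. The last time point is compared with
    the early peak: a rise above +threshold gives type 1 (persistent), a fall below
    -threshold gives type 3 (washout), anything in between gives type 2 (plateau).
    """
    if not 1 <= early_points < len(signal) - 1:
        raise ValueError("early window must leave at least one later time point")
    early_peak = max(signal[1:early_points + 1])
    if early_peak <= 0:
        raise ValueError("early peak signal must be positive")
    late_change = (signal[-1] - early_peak) / early_peak
    if late_change > threshold:
        return 1  # signal keeps rising after the early phase
    if late_change < -threshold:
        return 3  # signal drops after the early peak (washout)
    return 2  # signal stays within +/- threshold of the early peak (plateau)
```

For example, a curve of [100, 250, 240, 200, 180] with a two-point early window falls 28 % from its early peak and is classified as type 3, the pattern shown in Fig. 1e, f.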

With respect to histology, inter-reader agreement was higher for malignant than for benign lesions. This reflects the difficulty of characterising benign lesions, whose appearance varies widely. Recent studies show that tumours in the TZ are detected significantly less often than tumours in the PZ [29]. These findings suggest that the three MRI sequences should be weighted differently, with DCE-MRI playing only a minor role for the TZ [30]. Nonetheless, the overall inter-reader agreement was moderate to good. An image atlas similar to the BI-RADS classification, together with growing experience with the PI-RADS scoring system, could further improve inter-reader agreement.

Regarding the threshold, the ESUR guideline does not yet specify a fixed cut-off. Another study published a cut-off value of 9, whereas our data show better specificity and positive predictive value, with only slightly lower sensitivity and negative predictive value, at a cut-off value of 10. However, the accuracy data are based on in-bore MRI-guided biopsy, so false-negative results may be present. Missing MRI–biopsy correlation might also be a limitation. These could only be excluded by histology from radical prostatectomy or by long-term follow-up, and the lack of such a reference standard must be considered a limitation of this study. Nevertheless, the primary aim was to assess the inter-reader agreement of PI-RADS, because a uniform, standardised score is indispensable for evaluating mp-MRI of the prostate.

In conclusion, the PI-RADS score of the ESUR guideline shows moderate to good inter-reader agreement. Inter-reader agreement may be increased by a PI-RADS atlas with sample images, similar to the BI-RADS publications, and by growing experience with the PI-RADS score. Further studies are needed to determine whether the MRI sequences should be weighted within the scoring system. With a standardised scoring system, evaluation of prostate mp-MRI achieves high sensitivity and negative predictive value at a cut-off value of 10.