Introduction

A central goal of prostate cancer (PCa) management is to identify clinically significant tumors, which are defined as those with a Gleason score of 7 or more (either 4 + 3 or 3 + 4 with a prominent Gleason 4 component) and/or volume greater than 0.5 cm3 and/or extraprostatic extension [1,2,3]. Prostate multiparametric magnetic resonance imaging (mp-MRI) is useful to detect clinically significant PCa (CSPCa), can estimate tumor size and location, and can improve biopsy accuracy [1, 2, 4]. Reports have shown that mp-MRI has a high negative predictive value (NPV) for CSPCa, mainly when associated with the Prostate Imaging Reporting and Data System version 2 (PI-RADSv2), which could be used to rule out significant disease [1, 2, 4].

Diffusion-weighted imaging (DWI) is an important part of the MRI evaluation of prostate tumors [5,6,7]. PCa presents water diffusion restriction and hypointensity on the apparent diffusion coefficient (ADC) map. DWI-ADC correlates well with the Gleason score, prostate-specific antigen (PSA) level, molecular markers, and clinical outcomes [6,7,8,9,10,11,12]. In the most recent version of the PI-RADSv2, DWI-ADC plays a central role in the evaluation of tumors located in the prostate peripheral zone, where 70–80% of PCa arise [3, 13]. For peripheral lesions, the PI-RADSv2 score is based on a qualitative visual assessment of the lesion’s characteristics on the ADC map and on DWI with high b values (≥ 1400 s/mm2) [14]. However, this qualitative assessment is subjective, susceptible to inter-reader variability, and associated with high discordance rates for category assignment. Agreement can be as low as 51% between highly experienced radiologists and 53% for moderately experienced readers [15, 16]. Also, differentiating PI-RADSv2 category 3 from category 4 might be difficult, mainly in the peripheral zone [16,17,18,19,20,21].

In these new PI-RADS guidelines, a quantitative approach based on ADC values is acknowledged and a threshold of 750–900 µm2/s is suggested for the differentiation of benign and malignant lesions [14]. Recently, good performance has been reported for quantitative ADC to predict PI-RADSv2 categories 4 or 5, using ADC value thresholds of < 1061 µm2/s or ADC ratio thresholds of < 0.65 and < 0.77 [22, 23]. However, to our knowledge, no studies have analyzed the addition of quantitative ADC to the standard qualitative PI-RADSv2 scoring system. The aim of this study was to evaluate the impact of adding the quantitative ADC value and ADC ratio on the diagnostic performance of the PI-RADSv2 scoring system to predict CSPCa.

Materials and methods

Study subjects

With the approval of our institutional review board, this retrospective study considered 392 consecutive patients referred from the Urology Service of our institution to undergo 3-T prostate mp-MRI between March 2013 and March 2016. Patients were eligible if histopathology samples had been obtained from transrectal ultrasound-guided biopsy (TRUSGB), prostatectomy, or transurethral resection of the prostate (TURP). Exclusion criteria were: (a) histopathology study performed elsewhere; (b) previous clinical history of therapy for pelvic neoplasm; (c) hip prosthesis producing ferromagnetic artifacts; (d) mp-MRI not performed according to the standard protocol. Patients who had previously undergone TURP due to benign prostatic diseases were not excluded. In total, 91 patients met these criteria and were included in the study.

MRI and image assessment

The mp-MRI scans were performed on a 3-T scanner (Signa HDxt, General Electric, Milwaukee, WI, USA) with an eight-channel surface coil. The protocol adopted for the prostate mp-MRI scans followed the recommendations of the PI-RADSv2, described elsewhere [14]. Additionally, diffusion sequences with high b values (b = 2000 s/mm2) were included.

Images were processed (GE Medical Systems S.C.S., model number 5394794-2, AW VolumeShare 5, software AW4.6) and analyzed by two radiologists (9 and 2 years of experience with MRI) who were blinded to the clinical data and received training in the new PI-RADSv2 scoring guidelines.

Each lesion was assigned one of the following PI-RADSv2 scores: categories 1 and 2 (respectively, very low and low probability of CSPCa), category 3 (unpredictable behavior of tumor), and categories 4 and 5 (respectively, high and very high probability of CSPCa). In case of reader disagreement, consensus on PI-RADSv2 score was reached. Such consensus was used for subsequent statistical assessments.

The ADC values were measured using a region of interest (ROI) of approximately 14 mm2 placed in the suspected lesion, avoiding borders, and using a field of view (FOV) adjusted to prostate imaging. For larger lesions, the ADC values were obtained in the zones of lowest signal intensity. Contralateral ADC value was also measured using a similar ROI with the same dimensions, at the same slice, and placed on an area of normal prostate parenchyma in all mp-MRI sequences. ADC ratio was calculated by the division of the tumor ADC value by the contralateral ADC value.
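The ratio described above amounts to a single division. As a minimal sketch (the function name and the example ROI values are hypothetical and are not taken from the study data), it could be written as:

```python
def adc_ratio(lesion_adc: float, contralateral_adc: float) -> float:
    """Return the lesion ADC divided by the contralateral normal-tissue ADC."""
    if contralateral_adc <= 0:
        raise ValueError("contralateral ADC must be positive")
    return lesion_adc / contralateral_adc


# Hypothetical mean ROI values in µm2/s (illustrative only).
ratio = adc_ratio(lesion_adc=700.0, contralateral_adc=1250.0)
print(f"ADC ratio = {ratio:.2f}")
```

Because both measurements come from the same slice and the same acquisition, the ratio is dimensionless and does not depend on the unit in which ADC is reported.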

Reference standard

Histopathology analyses of the prostate samples were performed by an experienced urologic pathologist according to the criteria by the International Society of Urological Pathology (ISUP) [24]. Following the National Comprehensive Cancer Network guidelines, non-targeted TRUSGB were performed by an experienced radiologist, collecting 12 or more prostate cores, and the prostatectomies and TURPs were performed by the Urology team [25]. Using the histopathological data as the gold standard, the presence of CSPCa was defined as a Gleason score ≥ 7 (4 + 3 or 3 + 4) in > 2 positive cores and/or > 50% core involvement in the biopsy samples or Gleason score ≥ 7 (4 + 3 or 3 + 4) in prostatectomy and TURP samples [26, 27].

Statistical assessment

Quantitative variables were expressed as mean and standard deviation, whereas qualitative variables were expressed as absolute and relative frequencies. The weighted Kappa coefficient with a 95% confidence interval (95% CI) was used to assess inter- and intra-reader agreement when PI-RADSv2 scores were divided into three groups (PI-RADSv2 categories 1–2, category 3, and categories 4–5). The association between the PI-RADSv2 scores and the presence of CSPCa was evaluated using Pearson’s Chi-Square test. The Mann–Whitney U test was used to assess the association between ADC value or ADC ratio and the presence of CSPCa.
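For readers less familiar with the weighted Kappa coefficient, the sketch below shows one way to compute it for two readers rating the three collapsed groups. It assumes linear disagreement weights, which the text does not specify, and the ratings are hypothetical; it is not the SPSS routine used in the study.

```python
from collections import Counter


def weighted_kappa(reader1, reader2, n_categories, power=1):
    """Cohen's weighted kappa for ordinal labels coded 0..n_categories-1.

    power=1 gives linear weights and power=2 quadratic weights
    (the weighting scheme is an assumption, not stated in the text).
    """
    n = len(reader1)
    if n == 0 or n != len(reader2):
        raise ValueError("both readers must rate the same non-empty set of lesions")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    observed = Counter(zip(reader1, reader2))
    marg1, marg2 = Counter(reader1), Counter(reader2)
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (abs(i - j) / (n_categories - 1)) ** power
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (marg1[i] / n) * (marg2[j] / n)
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical ratings: 0 = categories 1-2, 1 = category 3, 2 = categories 4-5.
reader_a = [0, 0, 1, 2, 2, 2, 1, 0, 2, 2]
reader_b = [0, 1, 1, 2, 2, 1, 1, 0, 2, 2]
print(f"weighted kappa = {weighted_kappa(reader_a, reader_b, 3):.2f}")
```

With weights, a disagreement between adjacent groups (for example category 3 versus categories 4–5) is penalized less than a disagreement between the extreme groups, which is why a weighted coefficient suits an ordinal scale such as PI-RADSv2.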

For the diagnostic performance of the PI-RADSv2 score categories and the ADC variables, we calculated the sensitivity, specificity, positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), and diagnostic odds ratio (DOR). Receiver operating characteristic (ROC) curves were also constructed and the area under the curve (AUC) was obtained. We also analyzed the diagnostic performances of PI-RADSv2 scores and ADC data separately for the samples obtained from prostatectomy and from TRUSGB.
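The indices listed above all derive from the four cells of a 2 × 2 table that cross-tabulates the test result against histopathology. The following sketch uses the standard definitions; the function name and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard accuracy indices from the cells of a 2x2 table.

    tp/fn: lesions with CSPCa that test positive/negative.
    fp/tn: lesions without CSPCa that test positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)  # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity  # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": plr,
        "nlr": nlr,
        "dor": plr / nlr,  # diagnostic odds ratio
    }


# Hypothetical counts (illustrative only, not the study data).
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

This sketch does not handle empty cells: a table with no false positives or no false negatives would raise a division error, and in practice a continuity correction is often applied in that situation.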

A multivariate Poisson regression analysis was performed to control for confounding factors. Relative risk (RR) with 95% CI was used as the measure of effect. A p value of ≤ 0.05 was considered statistically significant. All statistical analyses were performed using SPSS version 21.0.

Results

The subjects’ baseline data and inclusion are summarized in Table 1 and Fig. 1, respectively. Mean age was 64.4 ± 8.2 years. Most histopathological specimens were obtained with TRUSGB (n = 44, 48.4%) or prostatectomy (n = 39, 42.9%). The interval time between the MRI scan and the prostate sample acquisition was considerably longer for TURP (median 8.1 months), compared to TRUSGB (median 1.6 months) and prostatectomy (median 2.3 months).

Table 1 Baseline characteristics
Fig. 1 Patient flowchart. mp-MRI multiparametric magnetic resonance imaging, PI-RADSv2 Prostate Imaging Reporting and Data System version 2.0

Most patients did not present CSPCa (n = 53, 58.2%) and had a Gleason score < 7 (n = 58, 63.7%). CSPCa was found in 41.8% of cases (n = 38). Disagreements between the histopathology analyses from TRUSGB and prostatectomy samples were found in three cases, and the radical prostatectomy results were used for statistical tests in these cases.

There was a moderate overall agreement between the readers, whether the PI-RADS score was divided into three subgroups (73.6%, weighted kappa 0.56) or into two subgroups (82.4%, weighted kappa 0.55). Intra-reader overall agreement varied between 77.8% (reader 2, non-weighted kappa 0.59) and 88.9% (reader 1, non-weighted kappa 0.85).

In the first PI-RADS scoring analysis, both readers achieved good sensitivities (reader 1, 100%; reader 2, 94.7%) and low specificities (reader 1, 37.7%; reader 2, 49.1%) for the detection of CSPCa (Table 2). After consensus, PI-RADSv2 score 3–5 presented a sensitivity of 97.4% (95% CI 86.5–99.5) and specificity of 50.9% (37.9–63.9), while PI-RADSv2 score 4–5 yielded a sensitivity of 94.7% (82.7–98.5) and a specificity of 67.9% (54.5–78.9) (Table 2). Overall AUC of PI-RADSv2 score to detect CSPCa was 0.84 (0.76–0.92) (Fig. 2). Separately per categories, AUC was 0.74 (0.67–0.81) for PI-RADS 3–5 and 0.81 (0.74–0.88) for PI-RADS score 4–5 (Fig. 2). PPV, NPV, PLR, and NLR are described in Table 3.

Table 2 Correlation between PI-RADSv2 score and clinically significant prostate cancer
Fig. 2 ROC curves of PI-RADSv2 scores and ADC data to diagnose clinically significant prostate cancer

Table 3 Diagnostic performances of PI-RADSv2 score and quantitative ADC data to predict clinically significant prostate cancer

Median ADC value of lesions presenting CSPCa was 750 µm2/s [interquartile range (IQR) 640–790], whereas lesions without CSPCa had a median ADC value of 880 µm2/s (IQR 790–990) (p < 0.001). Median ADC ratio for the presence or absence of CSPCa was 0.56 (IQR 0.49–0.65) and 0.68 (IQR 0.61–0.78), respectively (p < 0.001). Figure 3 summarizes in box-and-whiskers plots the differences in ADC data for significant prostate lesions. The ROC curve analyses revealed an ADC value < 750 µm2/s and an ADC ratio < 0.62 as the most accurate thresholds for differentiation of CSPCa, with AUC of 0.81 (0.71–0.91) and 0.76 (0.64–0.87), respectively (Fig. 2). The PPV, NPV, PLR, and NLR for these variables are described in Table 3.

Fig. 3 Box-and-whiskers plots demonstrate significant differences of median ADC value and median ADC ratio between clinically significant prostate cancer and non-clinically significant prostate cancer

When using PI-RADSv2 score 3–5 or 4–5 plus an ADC < 750 µm2/s or an ADC ratio < 0.62, there was an increase in the specificity and PLR to predict CSPCa compared to the performance of these variables alone (Table 3). Combining PI-RADS score 4–5 and ADC value < 750 µm2/s increased specificity to 88.7% (77.4–94.7) and PLR to 6.21 (2.84–13.6), while using both PI-RADS score 4–5 and ADC ratio < 0.62 yielded a specificity of 88.5% (77.0–94.6) and PLR of 5.62 (2.55–12.4). However, such combinations resulted in a lower sensitivity and a less favorable (higher) NLR. Sensitivity was 70.3% (54.2–82.5) and NLR was 0.34 (0.20–0.56) for PI-RADS score 4–5 and ADC value < 750 µm2/s, whereas sensitivity was 64.9% (48.8–78.2) and NLR was 0.40 (0.25–0.62) for PI-RADS score 4–5 and ADC ratio < 0.62 (Table 3). Changes in the AUC were not significant, with an AUC of 0.79 (0.71–0.88) for PI-RADS score 4–5 plus ADC value < 750 µm2/s, and 0.76 (0.67–0.85) for PI-RADS score 4–5 plus ADC ratio < 0.62.
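The trade-off reported above follows from the combined criterion being an AND rule: a lesion counts as positive only if it is PI-RADS 4–5 and also falls below the ADC cut-off, so the set of positive calls can only shrink. The sketch below illustrates this with a small hypothetical set of lesions (the tuples, names, and resulting figures are invented for illustration and do not reproduce Table 3).

```python
ADC_CUTOFF = 750.0  # µm2/s, the threshold reported in the Results

# Hypothetical lesions: (PI-RADSv2 category, ADC value in µm2/s, CSPCa present)
lesions = [
    (5, 620.0, True), (4, 700.0, True), (4, 810.0, True), (3, 740.0, True),
    (4, 900.0, False), (4, 730.0, False), (2, 950.0, False),
    (3, 880.0, False), (1, 1100.0, False), (5, 860.0, False),
]


def pirads_positive(category, adc):
    """PI-RADSv2 category 4-5 alone."""
    return category >= 4


def combined_positive(category, adc):
    """PI-RADSv2 category 4-5 AND ADC value below the cut-off."""
    return category >= 4 and adc < ADC_CUTOFF


def sensitivity_specificity(rule, data):
    """Return (sensitivity, specificity) of a rule against the reference standard."""
    tp = sum(1 for c, a, d in data if d and rule(c, a))
    fn = sum(1 for c, a, d in data if d and not rule(c, a))
    tn = sum(1 for c, a, d in data if not d and not rule(c, a))
    fp = sum(1 for c, a, d in data if not d and rule(c, a))
    return tp / (tp + fn), tn / (tn + fp)


for label, rule in [("PI-RADS 4-5", pirads_positive),
                    ("PI-RADS 4-5 + ADC", combined_positive)]:
    sens, spec = sensitivity_specificity(rule, lesions)
    print(f"{label}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data set the added ADC criterion removes two of the three false positives but also one true positive, which mirrors the direction of the changes in specificity and sensitivity described in the text.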

In the secondary analysis separating the samples obtained from prostatectomy and TRUSGB, the addition of quantitative ADC parameters to PI-RADSv2 scores similarly resulted in higher specificities and PLRs at the expense of sensitivity and NLR (Tables 4, 5). The addition of ADC value < 750 µm2/s and ADC ratio < 0.62 resulted in an increase of 35.3% and 34.6%, respectively, in the specificity of PI-RADS score 4–5 obtained from prostatectomy (Table 4) and of 10.7% and 10.7%, respectively, in the specificity of PI-RADS score 4–5 obtained from TRUSGB (Table 5).

Table 4 Diagnostic performances of PI-RADSv2 score and quantitative ADC data to predict clinically significant prostate cancer: samples obtained from prostatectomy (n = 39)
Table 5 Diagnostic performances of PI-RADSv2 score and quantitative ADC data to predict clinically significant prostate cancer: samples obtained from TRUSGB (n = 44)

In the multivariate analysis after adjusting for confounding factors, CSPCa was associated with a PI-RADS score 4–5 [relative risk (RR) = 7.34; 95% CI 1.10–48.9; p = 0.039], an ADC value of < 750 µm2/s (RR = 1.94; 1.08–3.50; p = 0.027), and an ADC ratio of < 0.62 (RR = 1.70; 1.03–2.81; p = 0.038). PI-RADSv2 score 3 was not associated with CSPCa (RR = 1.39; 0.11–18.3; p = 0.804) (Table 6).

Table 6 Multivariate analysis of factors independently associated with clinically significant prostate cancer

Discussion

In this study, we tested the effects of additional ADC-derived criteria on the performance of the PI-RADSv2 scoring system to detect CSPCa. When using a PI-RADS score ≥ 3 plus an ADC < 750 µm2/s or an ADC ratio < 0.62, there was an increase in the specificity and PLR to predict CSPCa compared to the performance of these variables alone. On the other hand, such combinations resulted in a lower sensitivity and a less favorable (higher) NLR. These results suggest that quantitative ADC data might not be beneficial as routine criteria in mp-MR imaging to detect clinically significant lesions, because of the reduced sensitivity. Instead, when prostate lesions present a PI-RADSv2 score ≥ 3, additional quantitative ADC criteria can be helpful to increase the PI-RADS score specificity. Such a strategy could help to reduce false-positive results, such as benign lesions with mildly restricted diffusion at DWI, e.g., prostatitis [3, 14].

Likelihood ratios are clinically more meaningful than sensitivity and specificity to characterize clinical diagnostic tests, as they indicate how much a test result changes the probability of having, or not having, a disease. PI-RADSv2 score categories 4–5 presented a PLR of 2.95 (1.98–4.40), which increased to 6.21 (2.84–13.6) when combined with ADC < 750 µm2/s or to 5.62 (2.55–12.4) when combined with ADC ratio < 0.62.
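To make the clinical meaning of these PLRs concrete, a likelihood ratio converts a pre-test probability into a post-test probability through the odds form of Bayes' theorem. The sketch below applies that standard conversion to the PLRs quoted above; using the CSPCa frequency of this cohort (41.8%) as the pre-test probability is an illustrative assumption, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumption: the cohort frequency of CSPCa is used as the pre-test probability.
prevalence = 0.418

reported_plr = {
    "PI-RADSv2 4-5": 2.95,
    "PI-RADSv2 4-5 + ADC value < 750": 6.21,
    "PI-RADSv2 4-5 + ADC ratio < 0.62": 5.62,
}
for label, plr in reported_plr.items():
    probability = post_test_probability(prevalence, plr)
    print(f"{label}: post-test probability {probability:.2f}")
```

Under this assumption, a positive result moves the probability of CSPCa from about 0.42 to roughly 0.68 for PI-RADSv2 4–5 alone and to roughly 0.80 to 0.82 when an ADC criterion is added. In a population with a different pre-test probability the post-test values would differ.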

Using a PI-RADSv2 score > 3, sensitivity was 94.7% and NPV was 94.7%, findings comparable to previous reports. A study using a PI-RADSv2 score ≥ 3 as the cut-off point found a sensitivity of 85% and an NPV of 84%, with higher specificity and PPV than in the present study [4]. Ahmed and the PROMIS study group correlated prostate mp-MRI with a template prostate mapping biopsy and found a sensitivity of 87% and an NPV of 72%. They used a Likert-like scale different from the PI-RADSv2, and a Gleason score ≥ 3 + 4 or core length involvement ≥ 4 mm was considered clinically significant. Mainly due to the number of patients (n = 576) and its prospective design, the findings of the PROMIS study group are relevant to reinforce the use of prostate mp-MRI as a screening method for clinically significant cancer [28].

One of the main drawbacks of the current version of PI-RADS scoring system is the high inter-reader variability inherent to qualitative evaluations. Studies have reported agreement rates as low as 51% even between highly experienced radiologists [15, 16]. We found moderate agreement between the readers in analyses using either the division of the PI-RADS score in three subgroups (categories 1–2 vs. 3 vs. 4–5: 73.6%, weighted kappa, 0.56) or in two subgroups (categories 1–3 vs. 4–5: 82.4%, weighted kappa 0.55). These findings are consistent with those from previous studies, varying between 0.47 and 0.593 [15, 16, 29, 30]. Regarding the intra-reader agreement, the difference between coefficients (a weighted kappa of 0.85 for the senior reader 1 and 0.59 for the reader 2) possibly reflects the length of experience of each reader.

Quantitative measurements could help to reduce such high inter-reader variability of the current version of PI-RADS system. Our study highlights the correlation between ADC value or ADC ratio and the presence of clinically significant cancer. Henderson et al. [31] also showed an association between low ADC values and clinically significant cancer in patients undergoing active surveillance, suggesting that a re-biopsy should be avoided in patients with ADC values out of the spectrum of aggressive cancer. Gaur et al. reported that ADC values correlated inversely with both PI-RADSv2 categories and histopathologic categorization [22].

Although a quantitative approach was not included in the current PI-RADSv2 scoring system, the guidelines acknowledge that a threshold of 750–900 µm2/s can be used in the differentiation of benign and malignant lesions [14]. ADC values < 1061 µm2/s and ≤ 800 µm2/s were reported by Gaur et al. and by Jordan et al., respectively, as accurate thresholds to improve PI-RADSv2 diagnostic accuracy [22, 32]. In our study, ADC < 750 µm2/s was the most accurate threshold to detect CSPCa. However, ADC values are also subject to variability, and these threshold differences can be due to several technical factors, such as differences in manufacturers, MRI field strengths, and DWI acquisition parameters. An alternative to decrease variability in ADC value measurement and improve reproducibility is the use of the ADC ratio [33,34,35]. As the ADC values of both PCa and benign lesions will change proportionally across different MR protocols or vendors, adoption of the ADC ratio could minimize variances and allow the use of a universal cut-off. The optimal ADC ratio < 0.62 found in our study is comparable to the ADC ratio threshold of < 0.65 reported by Gaur et al., despite the different thresholds of ADC values (respectively, 750 µm2/s vs. 1061 µm2/s).

Our study has several limitations. First, we included histology samples obtained from either TRUSGB, TURP or prostatectomy. Although this strategy allowed us to include more patients, biopsies were not targeted; hence, their use as a reference standard for the histopathology and MRI correlation might be limited. For this reason, we have also included a secondary analysis separating the samples obtained from prostatectomy and TRUSGB, and the effects of the addition of ADC variables on PI-RADSv2 scores were similar. Second, we could not differentiate lesions from the peripheral or transitional zones. ADC value and ADC ratio vary according to these zones [14], which might have limited our results. Further studies should test our results separating the lesions according to their prostate zone location. Third, the study was retrospective, and the sample size was small. Larger prospective studies should test our results. Fourth, there was a 7-year difference of experience between the readers. Despite such difference, they achieved substantial results in inter- and intra-reader agreement in PI-RADSv2, as well as in the correlation with histopathology findings. Also, the interval between the MRI scan and sample acquisition was considerably longer for TURP due to routine scheduling limitations, which could have influenced the results.

In summary, our results suggest that quantitative ADC may not be beneficial to be used routinely in mp-MR imaging as criteria to detect clinically significant lesions due to the reduced sensitivity. Instead, when prostate lesions present a PI-RADSv2 score ≥ 3, additional quantitative ADC criteria can be helpful to increase the PI-RADS score specificity and PLR.