Introduction

In the United States, as in many developed regions around the world, the mortality to incidence ratio of prostate cancer is ~15%.1 Radical prostatectomy and whole-gland irradiation are considered the gold standards for the cure of localized prostate cancer. However, the widespread use of these treatments exposes a large majority of men who have non-lethal prostate cancer to unnecessary treatment-related morbidity.2 Concerns regarding overtreatment have led to a greater interest in patient management with active surveillance (AS) rather than radical intervention.

Contemporary AS protocols have focused on identifying men with indolent cancers that are thought not to affect lifespan. In the Hopkins cohort, one of the two longest running AS series, the Epstein criteria for very low-risk prostate cancer aimed to predict a pathologically insignificant cancer of Gleason 3+3 or prognostic grade group (PGG) 1 <0.2 ml in size.3 In the Toronto cohort, more inclusive criteria allowing men with low-volume Gleason 3+4 were used.4 In these series, the metastatic cancer rates were 0.4% and 2.8%, and the cancer mortality rates 0.15% and 1.5%, respectively, likely reflecting the differences in inclusion criteria.5, 6 Given the long natural history of prostate cancer, it is important to balance the role of AS with life expectancy. Strict criteria may be helpful in selecting younger men with a longer horizon of follow-up, while more inclusive criteria may be acceptable in older men or those with comorbidities and a shorter horizon of follow-up.

The major limitation of patient selection for AS is undergrading and understaging of prostate cancer due to undersampling with standard transrectal ultrasound (TRUS) biopsy techniques, causing aggressive cancer to be managed initially as an indolent disease.7 Only half of the men on AS remain on it at 10 years, and those who eventually received treatment did so mainly because of cancer progression.5 In addition, up to half the men who discontinued AS for treatment with curative intent had subsequent biochemical recurrence.8 A recent 10-year analysis of the Prostate Testing for Cancer and Treatment (ProtecT) trial showed that men randomized to AS were more likely to develop cancer progression and metastatic disease than those randomized to radical intervention.9

Multiparametric magnetic resonance imaging (mpMRI) is thought to better grade and stage prostate cancer, preferentially detecting high grade rather than indolent tumors.10, 11 However, mpMRI comes at a significant additional cost with an, as yet, unclear marginal utility. Herein, we aim to evaluate the incremental value of mpMRI over traditional clinical tools in selecting patients for AS using different criteria.

Materials and methods

While our study predated the recommendation of the Prostate Cancer Radiological Estimation of Change in Sequential Evaluation (PRECISE) guidelines, we used PRECISE to guide reporting in this paper.12

Patients

We obtained Institutional Review Board approval with waiver of consent to review all men undergoing radical prostatectomy for clinically localized prostate cancer and mpMRI at our tertiary academic institution from 2011 to 2014. All men with complete clinical data for biopsy Gleason score, biopsy total/positive cores, serum PSA levels, clinical stage and prostate volume were included for analysis. Men with prior alternative treatments for prostate cancer such as ablative, radiation or hormonal therapies were excluded on the basis that mpMRI interpretation would be affected.

mpMRI technique and interpretation

The mpMRI technique used was similar to that in our previous publications.13 In brief, imaging was performed on one of two 3.0 Tesla MR scanners (General Electric HDx, GE Healthcare, Waukesha, WI, USA; Siemens Skyra, Siemens Healthcare, Erlangen, Germany) using a single-channel Medrad eCoil endorectal coil (Medrad, Indianola, PA, USA) as well as multichannel surface coils. Imaging sequences included thin-section (3 mm section thickness) fast spin echo T2-weighted images in the coronal, axial and sagittal planes. Diffusion weighted images were obtained using multiple b-values (b=0.800), and apparent diffusion coefficient (ADC) maps were calculated. Dynamic contrast-enhanced MR sequences were obtained after administration of a weight-based dose of extracellular MR contrast agent (Magnevist, Bayer Pharma, Leverkusen, Germany) with 4–5 s temporal resolution for 5–6 min.

mpMRI was read by one dedicated radiologist reader with a special interest in prostate mpMRI, and the tumor volumes, mpMRI stage and presence of extracapsular extension were determined. Version 2 of the Prostate Imaging Reporting and Data System (PIRADS) was used to identify suspicious lesions on mpMRI.14 Each lesion with a PIRADS score of 3 or greater was further scored for the likely cancer grade using an ADC value of 1000 × 10⁻⁶ mm² per second as a cutoff.15, 16, 17 First, the lesion region of interest (ROI) was identified and the mean ADC was determined. Where the mean ADC was greater than 1000 × 10⁻⁶ mm² per second, the lesion was assigned PGG 1 (Gleason 3+3) by mpMRI. Where the mean ADC was less than 1000 × 10⁻⁶ mm² per second, it was considered likely that the lesion contained Gleason grade 4–5 cancer (PGG 2–5). A sub-ROI was then created in the most diffusion-restricted portion of the lesion. When this sub-ROI was <50% of the entire lesion ROI, an mpMRI PGG score of 2 (Gleason 3+4) was assigned. When the sub-ROI was >50%, an mpMRI PGG score of 3 (Gleason 4+3) or greater was assigned. A lower ADC cutoff of 800 × 10⁻⁶ mm² per second was used for transition and central zone tumors in view of the generally lower ADC values associated with tumors in these locations.
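The grading rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the radiologist's actual workflow: the function name, argument names, zone labels and return labels are hypothetical, and the handling of values exactly at a cutoff (mean ADC equal to the cutoff, or a sub-ROI of exactly 50%) is an assumption, since the text does not specify it.

```python
from typing import Optional

# Mean-ADC cutoffs in units of 10^-6 mm^2 per second, as stated in the text.
ADC_CUTOFF_PERIPHERAL = 1000
ADC_CUTOFF_TRANSITION_CENTRAL = 800


def assign_mpmri_pgg(pirads: int, mean_adc: float, sub_roi_fraction: float,
                     zone: str = "peripheral") -> Optional[str]:
    """Return a hypothetical mpMRI grade label, or None if the lesion is not graded.

    mean_adc is the mean ADC of the lesion ROI (10^-6 mm^2/s);
    sub_roi_fraction is the most diffusion-restricted sub-ROI as a
    fraction (0 to 1) of the whole lesion ROI.
    """
    if pirads < 3:
        return None  # only lesions with PIRADS score >= 3 were graded

    cutoff = (ADC_CUTOFF_TRANSITION_CENTRAL
              if zone in ("transition", "central")
              else ADC_CUTOFF_PERIPHERAL)

    # Assumption: a mean ADC exactly at the cutoff is treated as PGG 1.
    if mean_adc >= cutoff:
        return "PGG 1"
    # Assumption: a sub-ROI of exactly 50% is treated as PGG 3 or greater.
    if sub_roi_fraction < 0.5:
        return "PGG 2"
    return "PGG >=3"
```

For example, under these assumptions a peripheral zone lesion with a mean ADC of 900 and a 30% sub-ROI would be labeled PGG 2, whereas the same values in a transition zone lesion would be labeled PGG 1 because of the lower 800 cutoff.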

Radical prostatectomy and pathological examination

Radical prostatectomy was performed using open or robotic approaches. The method of pathological examination of the prostate is detailed in a previous Duke Prostate Center Database publication.18 In brief, following removal, each prostate was weighed and inked with a different color on each side. The specimen was then fixed in formaldehyde and refrigerated overnight at 4 °C. The apex and bladder neck margin were shaved and sectioned radially to assess margins parallel to the urethra. The remaining prostate tissue was sectioned in cuts of 3–4 mm perpendicular to the surface plane of the rectum and then placed on up to 40 blocks for microscopic evaluation. The relevant details available in the final pathological report included primary, secondary and tertiary Gleason grades, proportion of gland involved by tumor, tumor location, stage, and presence and location of extracapsular extension.

Clinical, mpMRI and pathological definitions for AS

We planned to evaluate the incremental value of mpMRI in augmenting three different criteria for AS. The clinical criteria used were (1) the Epstein criteria, (2) National Comprehensive Cancer Network low-risk and (3) extended criteria including up to Gleason 3+4 (PGG 2), PSA ⩽15 ng/ml and clinical stage ⩽T2b.
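As a concrete illustration, the three thresholds named for the extended criteria (3) can be written as a simple check. This is a minimal sketch under stated assumptions: the function and variable names are hypothetical, the stage ordering is the conventional clinical T-stage sequence, and any biopsy volume limits that accompany the criteria (summarized in Table 1) are not modeled here.

```python
# Clinical T stages in ascending order, used to compare against the T2b ceiling.
T_STAGE_ORDER = ["T1a", "T1b", "T1c", "T2a", "T2b", "T2c", "T3a", "T3b", "T4"]


def meets_extended_criteria(biopsy_pgg: int, psa_ng_ml: float,
                            clinical_stage: str) -> bool:
    """Check the thresholds stated for criteria (3): PGG <= 2, PSA <= 15 ng/ml, stage <= T2b."""
    stage_ok = T_STAGE_ORDER.index(clinical_stage) <= T_STAGE_ORDER.index("T2b")
    return biopsy_pgg <= 2 and psa_ng_ml <= 15 and stage_ok
```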

There is currently no true pathological definition for suitability for AS. The intent of the Epstein criteria was to identify men with insignificant prostate cancer, defined as a Gleason 3+3 cancer <0.2 ml, and this was adopted as the pathological outcome correlate for criteria (1).3 The Epstein criteria, though initially formulated based on six-core biopsy, have been validated in extended biopsy cohorts.19 The pathological definition for criteria (2) represents the National Comprehensive Cancer Network low-risk definition for prostate cancer, for which AS is a recognized treatment option.20 The pathological definition for criteria (3) was adopted based on the initial report by Choo et al.4 for the Toronto AS series, which included men with up to T2b Gleason 3+4 (PGG 2) cancer. In the pathological analysis for pT2b, contralateral lesions were allowed as long as they were insignificant (<0.2 ml in size).

The respective mpMRI criteria were formulated to represent the intentions of the clinical criteria in predicting pathological outcome and are summarized in Table 1. For lesion size determination in criterion (1), lesion sizes were aggregated in cases where more than one lesion was present.

Table 1 Definitions of active surveillance

Outcomes and statistical analysis

The main outcome measure was the prediction of pathological grade, volume and stage combinations for each respective AS criterion as summarized in Table 1. Statistical analysis was performed using Stata 14.0 (College Station, TX, USA). Basic demographic variables were summarized with counts, frequencies and standard measures of central tendency. Performance characteristics (sensitivity, specificity, positive and negative likelihood ratio (LR), and area under the receiver operating characteristic curve (AUC)) of clinical criteria and mpMRI for predicting suitability for AS on final pathology were separately determined. Subsequently, clinical criteria and mpMRI criteria were combined such that patients would be determined suitable for AS only if they fulfilled both criteria. The incremental value of mpMRI was then determined by comparing the combined criteria with the clinical criteria in predicting suitability for AS on final pathology, in terms of the receiver operating characteristic curves and of sensitivity and specificity (McNemar test/exact binomial sign test).
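The combination rule and the performance characteristics listed above can be sketched as follows. This is not the Stata analysis used in the study; it is an illustrative Python sketch with hypothetical names, assuming each patient carries a yes/no call for each criterion and a yes/no reference result at final pathology, and that both suitable and unsuitable patients are present. The AUC line assumes a dichotomous test, for which the area under the two-segment ROC curve equals the mean of sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass
class Performance:
    sensitivity: float
    specificity: float
    lr_positive: float
    lr_negative: float
    auc: float


def combine(clinical: list[bool], mpmri: list[bool]) -> list[bool]:
    """A patient is called suitable for AS only if both criteria are fulfilled."""
    return [c and m for c, m in zip(clinical, mpmri)]


def performance(predicted: list[bool], reference: list[bool]) -> Performance:
    """Compare yes/no predictions against the final-pathology reference."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)

    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Guard against division by zero at perfect or zero specificity.
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    # For a single dichotomous test, AUC = (sensitivity + specificity) / 2.
    return Performance(sens, spec, lr_pos, lr_neg, (sens + spec) / 2)
```

Because the combined call is a logical AND, it can only remove patients from the "suitable" group, so specificity can rise and sensitivity can fall but never the reverse.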

Results

In total, we included 208 men with a mean age of 61.9 years (s.d. 6.8 years) who had undergone radical prostatectomy after mpMRI of the prostate. The majority of the mpMRI studies were performed for staging prior to prostatectomy, with a minority performed prior to biopsy (11.5%). The median time from mpMRI to radical prostatectomy was 0.69 months (IQR: 0.43–2.50). A summary of the clinical, mpMRI and pathological data can be found in Table 2.

Table 2 Demographics

Only one man fulfilled criteria (1) (Epstein) at pathology, and he was neither identified using clinical criteria (six men identified) nor mpMRI (eight men identified). Using criteria (2) (National Comprehensive Cancer Network low-risk), five men qualified at final pathology, while clinical criteria identified 54 men with a sensitivity of 80%, specificity 75% and AUC 0.78. Combined clinical-mpMRI criteria identified five men with a sensitivity of 80%, specificity 99.5% and AUC 0.90. The improvements in specificity and AUC were both statistically significant (Table 3). The addition of mpMRI increased the positive LR from 3.25 (95% confidence interval (CI) 1.97–5.35) to 162 (95% CI 21.9–1204).
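The positive LRs reported for criteria (2) can be checked with a short calculation. The 2 × 2 counts below are not taken from Table 3; they are reconstructed (an inference) from the figures in the text: 208 men, 5 qualifying at pathology, 80% sensitivity (4 true positives), and 54 versus 5 men identified. The function name is hypothetical.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (positive LR, negative LR) from 2 x 2 counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens / (1 - spec), (1 - sens) / spec


# Reconstructed counts (assumption): 5 suitable and 203 unsuitable men at pathology.
clinical = dict(tp=4, fp=50, fn=1, tn=153)  # 54 men identified by clinical criteria
combined = dict(tp=4, fp=1, fn=1, tn=202)   # 5 men identified by combined criteria

lr_pos_clinical, _ = likelihood_ratios(**clinical)
lr_pos_combined, _ = likelihood_ratios(**combined)
print(round(lr_pos_clinical, 2), round(lr_pos_combined, 1))
```

Under these reconstructed counts the positive LR is about 3.25 for clinical criteria alone and about 162 for the combined criteria, in line with the reported values. The change comes almost entirely from the drop in false positives from 50 to 1.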

Table 3 Criteria (2)—NCCN low-risk

Using criteria (3) (extended), 19 men qualified at pathology, while clinical criteria identified 114 with a sensitivity of 74%, specificity 47% and AUC 0.60. Addition of mpMRI information restricted the qualifying men to 11 with a sensitivity of 26%, specificity 97% and AUC 0.62. Here, statistically significant improvements in specificity and degradation in sensitivity were seen, while the overall AUC was not changed (Table 4). The positive LR improved from 1.39 (95% CI 1.03–1.88) to 8.29 (2.79–24.6), while the negative LR deteriorated from 0.60 (95% CI 0.26–1.2) to 0.76 (95% CI 0.58–0.99).
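One way to see why the AUC barely moved is the following worked relation. It assumes the AUC was computed on the dichotomous suitable/not suitable call, which the text does not state explicitly; for such a test the ROC curve has a single operating point and the area reduces to the mean of sensitivity and specificity.

```latex
\begin{align*}
\mathrm{AUC} &= \frac{\text{sensitivity} + \text{specificity}}{2} \\
\mathrm{AUC}_{\text{clinical}} &\approx \frac{0.74 + 0.47}{2} \approx 0.605 \\
\mathrm{AUC}_{\text{combined}} &\approx \frac{0.26 + 0.97}{2} \approx 0.615
\end{align*}
```

These values are close to the reported 0.60 and 0.62 (the small differences reflect rounding of the percentages). The gain of about 50 percentage points in specificity is almost exactly offset by the loss of about 48 percentage points in sensitivity.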

Table 4 Criteria (3)—low-volume 3+4 (PGG 2)

Of men who were determined to have PGG 1 cancer at either TRUS biopsy or mpMRI, 50% eventually upgraded at final pathology. However, when determined to have PGG 1 cancer using a combination of TRUS biopsy and mpMRI, only 20% eventually upgraded. On the other hand, of men who were determined to have PGG 2 cancer at TRUS biopsy or mpMRI, 21.8% and 34.0% upgraded respectively, but of men who were determined to be PGG 2 using a combination of both modalities, the upgrading rate was minimally impacted at 19.4% (Table 5).

Table 5 Gleason score/PGG determined by TRUS Biopsy, mpMRI and combined

Discussion

Within our data set, clinical criteria had a high sensitivity (identifying suitable patients) and moderate specificity (identifying unsuitable patients) in identifying men for AS. With the added information from mpMRI, specificity was significantly improved at the expense of sensitivity. The positive LR tells us the degree to which a positive test updates the prior odds of being suitable for AS, with the test having a moderate impact at 5–10 and a high impact at >10.21 The positive LR increased from 3.25 to 162 and from 1.39 to 8.29 with the addition of mpMRI to clinical information using AS criteria (2) and (3), respectively. The negative LR represents the degree to which a negative test updates the prior odds of not being suitable for AS, with the test having a moderate impact at 0.1–0.2 and a high impact at <0.1. Using this as a measure, the addition of mpMRI to clinical criteria generally resulted in the test deteriorating from moderate to low impact. These observations show that if mpMRI, in addition to clinical criteria, determines that a man is suitable for AS, the probability that he is so at pathology is high. On the other hand, if mpMRI, in addition to clinical criteria, determines a man to be not suitable for AS, the probability that he is not suitable at pathology is low. In the case of criterion (3) (extended), this probability is poorer than if clinical criteria alone were used.
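The way an LR updates prior odds can be made concrete with a small sketch. The function name is hypothetical. The illustration uses figures from the Results for criteria (2), taking the proportion of men qualifying at pathology in this surgical cohort (5 of 208) as the pre-test probability; this is an assumption made for illustration and would differ in a screen-detected population.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, multiply by the LR, and convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pre-test probability assumed equal to the cohort proportion for criteria (2).
prior = 5 / 208
after_clinical = post_test_probability(prior, 3.25)  # clinical criteria alone
after_combined = post_test_probability(prior, 162)   # clinical plus mpMRI criteria
print(round(after_clinical, 3), round(after_combined, 3))
```

With these inputs, a positive result on clinical criteria alone raises the probability of suitability from about 2% to about 7%, while a positive result on the combined criteria raises it to about 80%.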

mpMRI is thought to preferentially detect high-grade cancer because tumors with greater cell density restrict diffusion and those that are more vascular show contrast enhancement. Da Rosa et al.,22 in 72 men undergoing MR-guided fusion biopsy, found that mpMRI had a 100% negative predictive value for Gleason ⩾7 tumors. Similarly, in our series, mpMRI was excellent at identifying men not suitable for AS. However, this would have been at the expense of many men being subjected to treatment even though they might be suitable for AS. ADC values have been closely correlated with low or high Gleason grades.23 In men on AS, baseline ADC values have also been found to predict time to radical treatment and time to adverse histology.24 Using ADC values, our dedicated radiologist attempted to grade all lesions seen on prostate mpMRI into PGG 1, 2, and 3 or higher. This was done in addition to the PIRADS v2 grading system, which only aims to estimate the probability of clinically significant cancer. On the basis of data from other mpMRI studies, we adopted an ADC cutoff of 1000 × 10⁻⁶ mm² per second to distinguish PGG 1 from PGG 2–5.15, 16, 17 Further, we used a 50% ROI distribution of ADC <1000 × 10⁻⁶ mm² per second as a cutoff to predict PGG 2 versus 3 or more. Ultimately, mpMRI correctly identified only 20% of the PGG 1 cancers that were present at final pathology, overgrading the others as PGG 2 or more (Supplementary Table 1). This overgrading is likely to be the cause of the poorer sensitivity and negative LR when mpMRI information was included.

Porpiglia et al.25 carried out a retrospective study of 126 men suitable for AS using the Prostate Cancer Research International: AS (PRIAS) criteria undergoing radical prostatectomy. Using a logistic regression approach, they found that mpMRI increased model AUC by 0.07 and 0.05 for a base clinical model using Epstein and PRIAS criteria, respectively. We eschewed such an approach because our intent was to determine the added value of mpMRI in a clinical setting, where clinicians use set criteria for decision-making rather than a regression algorithm.

Our study has several limitations in light of which it should be interpreted. First, apart from Epstein’s intent of identifying a PGG 1 focus <0.2 ml as insignificant cancer, there is no evidence-based definition of what is considered suitable for AS at final pathology. We derived the pathological definitions for AS criteria (2) and (3) based on clinical criteria. We were also not able to exhaustively test all AS criteria that are in use. However, we believe that the criteria that we have chosen represent a reasonable range of criteria that can be used for men with an appropriate life expectancy. Second, our dedicated radiologist attempted to differentiate PGG 1, 2, and 3 or more on mpMRI. Our use of an ADC cutoff of 1000 × 10⁻⁶ mm² per second, and additionally the use of a 50% distribution of restricted diffusion within the lesion ROI to differentiate PGG 2 from 3 or more, is not a standardized practice. We are aware that advanced ADC metrics such as 10th and 25th percentiles, kurtosis, skewness and entropy may lend better predictive power for cancer grading than mean ADC alone.26, 27 We adopted our current approach because we wanted to use the best available metrics applicable to a real-life non-academic setting, where there is often no access to the specialized software necessary to generate advanced ADC metrics, to improve the generalizability of our findings. Finally, our cohort is a highly enriched population of men undergoing radical prostatectomy who were not selected a priori as candidates for AS. Furthermore, not all men undergoing radical prostatectomy at our institution underwent mpMRI prior to surgery. As such, while the availability of final pathology as a reference test is considered the gold standard for evaluation of a diagnostic test, caution should be exercised in generalizing our findings to the larger population of men with screen-detected cancers.

We have shown that using mpMRI to complement clinical criteria for AS improves selection performance mainly in specificity and positive LR, at the cost of sensitivity and negative LR. As fewer men selected for AS with mpMRI complementation are unsuitable for surveillance at pathology, physicians and patients may have greater confidence in embarking on surveillance, patients may experience less anxiety while on surveillance, and the probability of cancer progression or metastasis may be lower. These downstream effects of better classification remain to be elucidated.

Conclusion

Addition of mpMRI significantly improved selection of men for AS using National Comprehensive Cancer Network low-risk criteria, but only improved specificity at the expense of sensitivity when using an extended criterion including low-volume PGG 2. In the latter group, it may improve confidence in the suitability of men selected for AS, but result in overtreatment of other men.