Introduction

The Gleason score is the most commonly used pathological grading system in prostate carcinoma. It has been more than 40 years since its inception [1], yet it remains one of the most powerful prognostic factors in prostate carcinoma [2, 3]. It is central to stratifying patients into risk groups and in determining the management in patients with prostate carcinoma [4, 5].

Transrectal ultrasound-guided (TRUS) biopsies are commonly used to determine the Gleason score prior to definitive management [6]. TRUS biopsies were shown to be acceptably accurate [7] in predicting the Gleason score on prostatectomy specimens and therefore have become part of the routine work-up of men with suspected prostate cancer [4, 5].

There are several shortcomings to TRUS biopsy, however. Firstly, while it is acceptably accurate, it should be recognised that there may still be discrepancies with the prostatectomy Gleason score in 25–30 % of cases [7]. Secondly, while TRUS biopsy is generally accepted to have a good safety profile, it is still an invasive procedure, with significant complications reported in up to 6 % of patients [8, 9]. Thirdly, 15–31 % of patients may need repeated biopsies simply because the initial TRUS biopsy missed the regions that contained carcinoma [10]. Patients undergoing active surveillance also need repeated biopsies [4].

Novel imaging techniques have been proposed for either augmenting or supplanting TRUS biopsies for prognostication in patients with prostate carcinoma. These imaging techniques include diffusion-weighted (DW) MRI [11–13], magnetic resonance spectroscopy (MRS) [14], dynamic contrast enhanced (DCE) MRI [15], 11C-choline (CHOL) PET [16, 17], and 18F-fluorodeoxyglucose (FDG) PET [18]. This study investigates three of these novel imaging techniques: DW MRI, CHOL PET, and FDG PET. The biological mechanisms of how these imaging techniques differentiate between aggressive pathological conditions and less aggressive ones have been extensively investigated.

DW MRI obtains images that are dependent upon the random movement of water molecules (Brownian motion) in the imaged tissues [19]. This reflects the diffusion of water in the interstitial space, providing information on the biophysical properties of the tissue, including tissue architecture and cell density. The diffusion properties of the tissue examined can be quantified by calculating the apparent diffusion coefficient (ADC), which has been found to correlate significantly with cell density [19, 20]. The ADC declines as the glandular architecture is replaced more and more by tightly packed cancer cells, which is the case with higher Gleason score cancers.

CHOL PET obtains images that are dependent upon the accumulation of CHOL in cells [21]. CHOL is a radiotracer based on choline, an essential component of the cell membrane. Choline is taken up into cells by the choline transport system and then phosphorylated by choline kinase to phosphorylcholine. Prostate cancer cells have a high uptake of choline into cells, possibly owing to a higher proliferation rate. CHOL PET standardised uptake values (SUV) have been found to correlate significantly with several immunohistochemical markers of malignancy and aggressiveness, including choline kinase α expression [22] and MIB-1/Ki-67 labelling index [16].

FDG PET allows the assessment of the metabolic state of malignant lesions by imaging the accumulation of FDG into tumour cells [23]. FDG is a glucose derivative where the hydroxyl function in position 2 is replaced by a radioactive fluorine isotope. FDG is taken up by glucose transporters into the cell and phosphorylated via hexokinase. Because of the missing hydroxyl function, further metabolism is not possible, and because of the negative charge, the phosphorylated FDG cannot cross the cell membrane, leading to trapping in the cell. FDG PET SUV has been found to be correlated with glucose transporter expression, which is correlated with the Gleason score [18].

Because imaging is non-invasive, it has important advantages over TRUS biopsy. While many prior studies have shown novel imaging techniques to have good correlations with the Gleason score, few have shown that imaging is as good as the current standard of TRUS biopsy. Our study aims to investigate the accuracy of DW MRI, CHOL PET, and FDG PET, and to compare them with the current standard of TRUS biopsy.

Materials and methods

Study design and patient population

This is an analysis of data from a prospective single-institution study of 30 patients recruited between September 2008 and March 2011 with histopathologically proven adenocarcinoma of the prostate with intermediate to very high-risk disease [4]. All patients provided written informed consent, and the study was approved by the institutional ethics committee. All patients underwent standard diagnostic and staging investigations before recruitment, including serum prostate specific antigen (PSA), TRUS biopsy, whole-body skeletal scintigraphy, and CT. None of the patients had any treatment for prostate cancer before recruitment into this study. Eligibility criteria included expected survival longer than 3 months, Karnofsky performance status ≥70, no known nodal or distant metastases, and no contraindications to PET or MRI. All patients underwent pelvic MRI, pelvic CHOL PET/CT, and whole-body FDG PET/CT at least 2 weeks after the TRUS biopsy. DW MRI was not mandated in the original trial protocol, but was also performed with the standard MRI sequences in 24 patients. The patients then underwent radical prostatectomy within 4 weeks of MRI and PET. Patients who did not have DW MRI or who had significant imaging artefact that compromised their interpretation were excluded from the final analysis.

The pathologist’s interpretation of the prostatectomy specimen was used as the reference standard. Five index tests were evaluated: TRUS biopsy, DW MRI, CHOL PET, FDG PET, and combined DW MRI/CHOL PET.

Imaging acquisition protocols

Synthesis of CHOL and FDG was performed on-site, using an in-house cyclotron (Ion Beam Applications SA, Belgium) and radiochemistry laboratories [24]. CHOL and FDG PET/CT imaging protocols have previously been described [21, 25]. CHOL and FDG PET images had fields of view (FOV) of 576 mm × 576 mm and voxel sizes of 4 mm × 4 mm × 4 mm.

Magnetic resonance imaging was performed on a 3-Tesla Magnetom Verio system (Siemens, Erlangen, Germany) with an external 32-channel body-array coil. Sequences that were analysed in this study include multiplanar 2D T2W turbo spin-echo (TSE), 3D T2W sampling perfection with application-optimised contrasts using different flip angle evolutions (SPACE), and DW sequences. The following parameters were used in the TSE sequences: repetition time (TR) 4,200 ms, echo time (TE) 89 ms, FOV 170 mm × 170 mm, and voxel size 1.9 mm × 1.9 mm × 4.4 mm. The following parameters were used in the SPACE sequence: TR 1,700 ms, TE 102 ms, FOV 200 mm × 200 mm, and voxel size 1.3 mm × 1.3 mm × 1 mm. The following parameters were used in the DW MRI sequence: TR 4,500 ms, TE 97 ms, FOV 233 mm × 233 mm, voxel size 1.2 mm × 1.2 mm × 4.4 mm, and b values of 0, 100, and 800 s/mm². ADCs were calculated for all slices according to the following equation:

$$ \mathrm{ADC}=\frac{ \ln \left({S}_0/{S}_1\right)}{\left({b}_1-{b}_0\right)} $$

where S₁ is the signal intensity of a voxel after application of a diffusion gradient and S₀ is the echo magnitude without diffusion gradients applied (b = 0 s/mm²). Diffusion sensitivity is determined by the difference between b₁ and b₀.
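As an illustration of the equation above, the following minimal Python sketch computes the ADC for a single voxel. The function name `adc` and the example signal values are hypothetical, and pairing b = 0 with b = 800 s/mm² is an assumption made for the example, since the text does not state how the three acquired b values were combined.

```python
import math


def adc(s0: float, s1: float, b0: float, b1: float) -> float:
    """Apparent diffusion coefficient (mm^2/s) for one voxel.

    s0: signal intensity at the lower b value b0 (s/mm^2)
    s1: signal intensity at the higher b value b1 (s/mm^2)
    """
    if s0 <= 0 or s1 <= 0:
        raise ValueError("signal intensities must be positive")
    if b1 == b0:
        raise ValueError("b values must differ")
    return math.log(s0 / s1) / (b1 - b0)


# Hypothetical voxel: signal falls from 1000 to 400 between b = 0 and b = 800 s/mm^2.
value = adc(1000.0, 400.0, 0.0, 800.0)
print(f"ADC = {value * 1e6:.0f} x 10^-6 mm^2/s")
```

A stronger signal drop between the two b values gives a larger ADC, so densely cellular tissue, in which water diffusion is restricted, yields lower values.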

Pathological specimen preparation and interpretation

The prostate was step-sectioned into 4-mm transverse slices, perpendicular to its posterior surface, as previously described. An experienced genitourinary pathologist (DC, 19 years’ experience) reviewed the specimen and assigned a Gleason score as per the International Society of Urological Pathology (ISUP) consensus criteria [2]. This prostatectomy-defined Gleason score was used as the reference standard. The pathologist outlined each tumour focus on the histological sections and then imaged them directly on a flat-bed device.

The TRUS biopsy reports were reviewed. Where tertiary patterns were reported, the overall Gleason score was taken as the sum of the primary pattern and the highest grade pattern as per the ISUP consensus criteria [2].

Image co-registration

All images were co-registered and analysed using Mim Maestro (MIM Software Inc., Cleveland, OH, USA). The SPACE MRI sequence was used as the primary data set. The prostatectomy specimen digital images were divided up, stacked, and reorientated to match the MRI slices (Fig. 1). The DW MRI, CHOL PET, and FDG PET acquisitions were manually co-registered with the MRI using rigid body transformation as secondary data sets (Fig. 1).

Fig. 1

The pathologist provided transverse sections of the prostatectomy specimens (a), which were co-registered with T2W MRI (b), forming fused images (c). Axial slices are also shown of co-registered DW MRI (d), CHOL PET (e), and FDG PET (f), zooming in on the prostate. A region of interest contour (yellow) was defined on the DW MRI and then the mean ADC was taken from this contour. A peripheral zone contour (green) was defined on the T2W MRI and then the SUVmax was taken from the co-registered CHOL PET and FDG PET within this contour

Extraction of quantitative information from the scans

For each patient, one representative region of interest was defined for each imaging technique, and voxel intensity data were extracted from it.

The DW MRI region of interest was defined as the visible peripheral zone tumour, from which the mean ADC (mm²/s) was taken. This contour was defined by two experienced genitourinary radiologists (CYH and SE, 15 and 12 years’ experience, respectively) who independently reviewed the MRIs. The radiologists were aware that all patients had biopsy-proven prostate cancer, but were blinded to all other clinical information. They reviewed all of the MRI sequences and then, using only DW MRI, contoured the regions inside the peripheral zones that they thought were likely to be involved by tumour. The two radiologists’ contours were combined by Boolean addition to form a single set of contours and then compared with the pathological specimens (Fig. 1). Contours that did not overlap with tumour on the prostatectomy specimens were removed. Where no discrete lesion was visible on the DW MRI, the region of interest was defined as the prostate peripheral zone.

The CHOL PET and FDG PET regions of interest were defined as the peripheral zone, from which the maximum SUV (SUVmax) was taken for each acquisition. A radiation oncologist contoured the prostate peripheral zone onto the primary data set for each patient and then had these contours reviewed by two genitourinary radiologists.

An “uninvolved peripheral zone” contour was created by Boolean subtraction of the DW MRI region of interest contour with 0.5-mm expansion from the peripheral zone contour described above. The mean ADC value was taken from this contour.
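The Boolean addition of the two radiologists’ contours and the Boolean subtraction used to derive the uninvolved peripheral zone can be sketched with sets of voxel indices. This is only an illustration of the set logic, not the contouring software's implementation: the function names and voxel coordinates are hypothetical, and the 0.5-mm expansion of the region of interest is deliberately omitted.

```python
# Each contour is modelled as a set of (row, column, slice) voxel indices.
Voxel = tuple[int, int, int]


def combine_contours(a: set[Voxel], b: set[Voxel]) -> set[Voxel]:
    """Boolean addition (union) of two readers' contours."""
    return a | b


def uninvolved_zone(peripheral_zone: set[Voxel], roi: set[Voxel]) -> set[Voxel]:
    """Boolean subtraction of the region of interest from the peripheral zone.

    Assumption: the 0.5-mm expansion of the ROI is not modelled here.
    """
    return peripheral_zone - roi


# Hypothetical contours from two readers on a single slice.
reader_1 = {(0, 0, 0), (0, 1, 0)}
reader_2 = {(0, 1, 0), (1, 1, 0)}
roi = combine_contours(reader_1, reader_2)

# Hypothetical peripheral zone contour containing the ROI.
peripheral_zone = {(0, 0, 0), (0, 1, 0), (1, 1, 0), (2, 2, 0), (2, 3, 0)}
uninvolved = uninvolved_zone(peripheral_zone, roi)
print(sorted(uninvolved))
```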

Statistical analyses

Gleason scores were categorised into three groups: Gleason risk (GR) 1 corresponding to Gleason score 6, GR 2 corresponding to Gleason score 7, and GR 3 corresponding to Gleason scores 8–10. The prostatectomy specimen GR was taken as the reference standard against which the other tests were compared.
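The grouping described above can be written as a small mapping. The function name `gleason_risk` is hypothetical, and rejecting scores outside 6–10 is an assumption, since the text defines no group for them.

```python
def gleason_risk(gleason_score: int) -> int:
    """Map a Gleason score to the Gleason risk (GR) group used in this study."""
    if gleason_score == 6:
        return 1
    if gleason_score == 7:
        return 2
    if 8 <= gleason_score <= 10:
        return 3
    # Assumption: scores outside 6-10 are treated as invalid input.
    raise ValueError(f"no GR group defined for Gleason score {gleason_score}")


groups = [gleason_risk(score) for score in (6, 7, 8, 9, 10)]
print(groups)  # [1, 2, 3, 3, 3]
```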

Five index tests were evaluated to determine their ability to predict the prostatectomy specimen GR: (1) TRUS biopsy GR, (2) mean ADC score from DW MRI, (3) SUVmax from CHOL PET, (4) SUVmax from FDG PET, and (5) combined DW MRI and CHOL PET (calculated by dividing the mean ADC score [from DW MRI] by the SUVmax [from CHOL PET]).
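The combined index in item (5) is a simple ratio, sketched below. The function name `combined_index` and the example values are hypothetical. Because the ADC falls and the CHOL PET SUVmax tends to rise with increasing grade, the ratio would be expected to fall as the GR increases.

```python
def combined_index(mean_adc: float, suv_max: float) -> float:
    """Combined DW MRI/CHOL PET index: mean ADC divided by CHOL PET SUVmax."""
    if suv_max <= 0:
        raise ValueError("SUVmax must be positive")
    return mean_adc / suv_max


# Hypothetical patients; mean ADC is given in units of 10^-6 mm^2/s.
lower_grade = combined_index(mean_adc=1300.0, suv_max=4.0)
higher_grade = combined_index(mean_adc=800.0, suv_max=8.0)
print(lower_grade, higher_grade)  # 325.0 100.0
```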

Spearman’s ρ was used to determine the correlations between the predictive tests described above and the prostatectomy specimen GR. The strengths of these correlations were labelled using conventional statistical criteria: 0–0.19 was regarded as very weak, 0.2–0.39 as weak, 0.40–0.59 as moderate, 0.6–0.79 as strong, and 0.8–1 as very strong correlation [26]. Predictive tests that were found to have significant correlations were separated into GR groups based on arbitrarily defined cut-off values. The accuracy of the predictive tests for classifying the patients into the correct GR groups was evaluated. The accuracy of classification was compared between the different tests using two-tailed McNemar’s tests. Scatter plots were used to illustrate the relationships between the predictive tests and the prostatectomy GR.
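The paired comparison of classification accuracy can be sketched with the exact (binomial) form of McNemar’s test. Whether the exact or the chi-squared form was used is not stated, so the exact form is an assumption here, as are the function name `mcnemar_exact` and the discordant-pair counts in the example.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-tailed exact McNemar P value from the discordant pairs.

    b: patients classified correctly by test A but incorrectly by test B
    c: patients classified incorrectly by test A but correctly by test B
    Patients on whom both tests agree do not contribute to the statistic.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of k or fewer discordant pairs in one direction under
    # the null hypothesis, where each direction is equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: 2 patients favour test A and 4 favour test B.
p_value = mcnemar_exact(2, 4)
print(f"P = {p_value:.4f}")  # P = 0.6875
```

With only a handful of discordant pairs, even a difference of two correctly classified patients is far from significance, which illustrates why a small cohort has little power for this comparison.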

The mean DW MRI ADC values of the regions of interest were compared with the uninvolved peripheral zone using a paired, two-tailed t test. Statistical tests were performed using SPSS Statistics 17.0 (IBM, Armonk, NY, USA).
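The paired t statistic used for this comparison can be sketched with the Python standard library. The function name `paired_t` and the ADC values are hypothetical; the sketch returns only the statistic and its degrees of freedom, and the two-tailed P value would be read from the t distribution, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical mean ADCs (x 10^-6 mm^2/s) for four patients.
roi_adc = [900.0, 1100.0, 1000.0, 1200.0]
uninvolved_adc = [1400.0, 1500.0, 1600.0, 1500.0]
t_stat, dof = paired_t(roi_adc, uninvolved_adc)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```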

Results

Twenty-one patients were included in the final analyses: of the 30 recruited, 6 were excluded because they did not have DW MRI and 3 because of significant susceptibility artefacts (caused by prosthetic hips) on DW MRI. Two patients did not have identifiable lesions on DW MRI; therefore their entire peripheral zones were used as the regions of interest. In all 21 patients, the index lesions on the prostatectomy specimens were in the peripheral zone. The patient characteristics are listed in Table 1.

Table 1 Patient characteristics

There was a significant, strongly positive correlation between TRUS biopsy GR and prostatectomy GR (ρ = 0.617, P = 0.003; Table 2). There was a significant, strongly negative correlation between the DW MRI mean ADC and prostatectomy GR (ρ = –0.601, P = 0.004). There was a non-significant, weak positive correlation between CHOL PET SUVmax and prostatectomy GR (ρ = 0.348, P = 0.122). There was a non-significant, weak positive correlation between FDG PET SUVmax and prostatectomy GR (ρ = 0.204, P = 0.375). The combination of DW MRI and CHOL PET had a significant strongly negative correlation with prostatectomy GR (ρ = –0.623, P = 0.003).
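The strength labels applied to these coefficients follow the bands given in the statistical methods. A minimal sketch of that labelling is shown below; the function name `correlation_strength` is hypothetical, and treating the bands as half-open intervals (so that values such as 0.195 fall into a band) is an assumption.

```python
def correlation_strength(rho: float) -> str:
    """Label the strength of a correlation coefficient by its absolute value."""
    r = abs(rho)
    if r > 1:
        raise ValueError("correlation coefficient must lie between -1 and 1")
    if r < 0.2:
        return "very weak"
    if r < 0.4:
        return "weak"
    if r < 0.6:
        return "moderate"
    if r < 0.8:
        return "strong"
    return "very strong"


# Coefficients reported above for each index test.
reported = {
    "TRUS biopsy": 0.617,
    "DW MRI": -0.601,
    "CHOL PET": 0.348,
    "FDG PET": 0.204,
    "DW MRI/CHOL PET": -0.623,
}
labels = {test: correlation_strength(rho) for test, rho in reported.items()}
print(labels)
```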

Table 2 Agreement between TRUS GR and prostatectomy GR

The cut-off values for classifying GR groups for each type of predictive test are shown in Table 3. Using these values, TRUS biopsy correctly classified 14 out of 21 patients (67 %), DW MRI correctly classified 14 out of 21 patients (67 %), and the combination of DW MRI and CHOL PET correctly classified 16 out of 21 patients (76 %). The differences between the three different combinations of these classification rates were not statistically significant (P = 0.69, 0.73, and 1.0). Scatter plots showing the relationships between the predictive tests and the prostatectomy GR are shown in Fig. 2.

Table 3 Cut-off values for classifying the predictive tests
Fig. 2

Scatter plots of the predictive test values (from a DW MRI, b CHOL PET, c FDG PET, and d combined DW MRI/CHOL PET) versus the Gleason risk (GR). Where there is significant correlation or a trend towards correlation, a fitted regression line has been drawn. Horizontal lines have been drawn (corresponding to the cut-off values described in Table 3) for classifying the patients into GR groups

The mean DW MRI ADCs were significantly lower in the regions of interest ([1,049 ± 315] × 10⁻⁶ mm²/s) than in the uninvolved peripheral zones ([1,498 ± 213] × 10⁻⁶ mm²/s; P < 0.001; Fig. 3).

Fig. 3

Box plot of the mean ADC in the region of interest (ROI) and the uninvolved peripheral zone (PZ)

Discussion

The main findings in this study are that DW MRI and combined DW MRI/CHOL PET correlate significantly with the prostatectomy GR, and have high rates of correct classification of the prostatectomy GR.

Many previous studies have shown that DW MRI has a significant correlation with Gleason score. Bittencourt et al. performed a study of 24 patients, comparing DW MRI ADC values with prostatectomy Gleason score [11]. They found a significant negative correlation between the mean ADC of suspicious lesions and Gleason score (Pearson’s r = –0.63, P < 0.01). Yamamura et al. performed a study of 50 patients, comparing DW MRI ADC with TRUS biopsy Gleason score [12]. There was a significant negative correlation between the ADC value and the Gleason score on TRUS biopsy (Spearman’s ρ = –0.405, P = 0.001). Turkbey et al. performed a study of 48 patients, comparing DW MRI ADC with TRUS biopsy Gleason score and D’Amico clinical risk score [13]. They found a significant negative correlation between the mean ADC and both the TRUS biopsy Gleason score (Spearman’s ρ = –0.60, P = 0.003) and the D’Amico clinical risk score (Spearman’s ρ = –0.69, P < 0.0001).

Several studies have also investigated relationships between CHOL PET SUV and Gleason score. Piert et al. performed a study of 14 patients, comparing tumour to background CHOL PET SUV ratios with prostatectomy Gleason scores [16]. They found significantly higher tumour to background SUV ratios in high Gleason score lesions (Gleason ≥4 + 3) versus lower Gleason score lesions (Gleason ≤3 + 4). Park et al. compared DW MRI, CHOL PET, and the combination of the two techniques with prostatectomy Gleason scores in a study of 17 patients [17]. They found significant differences between Gleason ≥3 + 4 cancers versus Gleason ≤3 + 3 cancers using DW MRI, CHOL PET, or the combination of the two techniques.

This study adds to the literature in several important ways. The prostatectomy specimen was used to define the GR in this study, as opposed to just using the TRUS biopsy as was used in some of the previous studies. This is important because the TRUS biopsy is an imperfect test, with discrepancies with the prostatectomy GR in 25–30 % of patients [7]. The accuracy of TRUS biopsy was also evaluated in this study, which provides an important comparison with routine clinical practice. Most clinicians accept the accuracy of TRUS biopsy and in fact rely on it to make initial treatment decisions. Our study confirms the results from larger studies [7] of the accuracy of TRUS biopsy and importantly shows that imaging can produce results that are at least comparable to the TRUS biopsy.

Furthermore, this study suggests that multi-technique imaging with both DW MRI and CHOL PET may be better than either imaging technique alone for predicting the Gleason score (although this study was not powered to detect a statistically significant difference). This is a subject that has not as yet been extensively explored [17]; therefore this study may provide an impetus for future research.

This study does have a number of limitations. First, all of the patients in this study had index lesions in the peripheral zone; therefore the analyses were limited to classifying peripheral zone lesions. Primary transition zone carcinomas have been shown to have different DW MRI characteristics from peripheral zone carcinomas [27]; however, this was not investigated in this study because none of the patients in this cohort had primary transition zone index lesions. Second, the cut-off points for classifying patients into GR groups were chosen to maximise agreement, and therefore have an element of bias. These cut-off points need to be validated in a prospective study before conclusions can be drawn about their true accuracy. Third, most of the patients had GR 2 (Gleason score 7) disease on their prostatectomy specimens. The uneven distribution of patients makes it difficult to achieve statistical significance in a small study like this. Fourth, all of the patients in this study had acinar adenocarcinomas. Non-acinar carcinomas account for 5–10 % of carcinomas that originate in the prostate, and identifying these histologies may have important implications for the prognosis and management of these patients [28]. It is unclear whether or not these histologies can be identified using imaging. Fifth, the b values and other MRI parameters used in this study would likely make the ADC values reported in this study non-generalisable to other centres using different MRI parameters. Similarly, the PET imaging parameters may also make the SUV values reported here non-generalisable to other centres. Lastly, both CHOL and FDG PET may be limited by poor spatial resolution and the partial volume effect [29]. This means that smaller, high-grade lesions may be falsely classified as lower grade owing to the intrinsic limitations of PET. This factor needs to be taken into account when interpreting these results.

The main implication of this study is that certain types of imaging may be able to augment TRUS biopsy in the prognostication of prostate carcinoma. DW MRI may be useful for this purpose, and the combination of DW MRI and CHOL PET may be better. This would be useful in many situations. In patients with low-risk prostate cancer on active surveillance, imaging may provide a non-invasive means of detecting Gleason score progression, and thus may reduce or obviate the need for repeated biopsies. Patients who are selected for radical radiotherapy may be more appropriately selected for different durations of androgen deprivation therapy on the basis of more accurate determinations of their risk groups. If a future study shows that imaging can reliably distinguish between cancerous and benign prostate glands, imaging may be able to replace TRUS biopsies in many situations.

In conclusion, DW MRI and combined DW MRI/CHOL PET have significant correlations with the prostatectomy GR and high rates of correct classification of the prostatectomy GR. These correlations and classification rates are comparable to those of TRUS biopsy.