Introduction

Since its first description in the 1960s, the Gleason grading system has been accepted as a grading standard and as the most accurate histopathological factor for the prognosis of prostate cancer (PC) patients [1]. Grading of prostate biopsy (PB) specimens is an important factor for counseling and decision making in men diagnosed with PC. The choice of active surveillance, nerve-sparing radical prostatectomy (RP), pelvic lymphadenectomy or androgen deprivation therapy in addition to external beam radiation is based on preoperative risk parameters such as PB Gleason score (GS) and pre-treatment prostate-specific antigen (PSA) values [2–4]. Thus, incorrect grading of PB specimens can result in inappropriate management of patients. Undergrading can lead to treatment delays or undertreatment (positive surgical margins, no lymph node resection), requiring secondary therapeutic options and compromising quality of life and survival.

Several studies have shown that discordance between PB GS and RP GS occurs in almost 50 % of all cases [5]. Histopathological Gleason grade evaluation can significantly affect the accuracy of tumor classification [6]. Different levels of experience and skills of pathologists have been reported to be associated with GS discordance between PB and RP specimens [7–10]. However, the clinical significance of incorrect PB GS grading remains to be elucidated. In this study, we aimed to evaluate the accuracy of PB GS grading depending on the diagnosing pathologist (community vs. uro-pathologist) and its prognostic impact on the oncological outcome in a contemporary RP series.

Materials and methods

Patient selection and data collection

All men who underwent robotic-assisted RP (RARP) between May 2005 and December 2013 in our tertiary-care academic center were retrospectively identified. Patients diagnosed with PC after ultrasound-guided transrectal PB performed in our center or externally by community urologists were eligible for this study. Men with PC diagnosed in specimens of transurethral resection of the prostate (TURP) or after magnetic resonance imaging (MRI)-guided biopsy, and men who received neoadjuvant androgen deprivation treatment, were excluded from further analysis. Electronic hospital charts were reviewed to collect peri- and postoperative data. Additionally, data were retrieved from referring urologists or patients’ general practitioners if follow-up was not performed in our center.

Patients were divided into two groups, depending on whether their PB specimen had been evaluated in our institution by a minimum of two pathologists, at least one of whom was an expert in urologic pathology (uro-pathologist [UP] group), or by pathologists in the community (community pathologist [CP] group). PB GS, core numbers and numbers of positive cores were retrieved from the respective pathology reports. PB slides of patients who were referred to our center by community urologists were not reviewed before RARP at our institution.

Surgical procedure and pathological analysis

RARP was performed by five experienced prostate surgeons using the three- and four-arm daVinci® Surgical System (Intuitive Surgical Inc., Sunnyvale, CA). Bilateral extended pelvic lymph node dissection (EPLND) was performed as described earlier in patients with either a PSA level of ≥10 ng/ml or a preoperative GS of ≥7 [11]. A nerve-sparing procedure was usually performed bilaterally in patients with cT1/2 PC and PB GS ≤7 and unilaterally in selected patients with GS 8 and small tumor volume identified on PB on the contralateral side.

All RARP specimens were processed at our institution. Comprehensive pathologic analysis was performed using standardized whole-mount sections [12].

Statistical analyses

SPSS version 22.0 (IBM Corp., Armonk, NY, USA) was used for statistical analyses. All two-sided p values ≤ 0.05 were considered statistically significant. The CP and the UP group were compared using Fisher’s exact test for categorical variables and the Mann–Whitney U test for continuous variables.

Each PB GS was compared individually with the GS of the respective RP specimen. An RP GS higher or lower than the PB GS was defined as GS upgrade or downgrade, respectively. Concordance between PB GS and GS of the surgical specimen was calculated using Cohen’s kappa (correction for agreement expected by chance) [13]. Additionally, stratification according to the recent ISUP 2014 prognostic grade groups (PGG) revision was performed [1, 14, 15].
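The comparison described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study; the function names and the eight paired scores are hypothetical and serve only to show how unweighted Cohen’s kappa, the up-/downgrade definition and the ISUP 2014 grade group mapping work.

```python
from collections import Counter


def cohens_kappa(biopsy, prostatectomy):
    """Unweighted Cohen's kappa for two paired lists of categorical grades."""
    if len(biopsy) != len(prostatectomy) or not biopsy:
        raise ValueError("need two non-empty lists of equal length")
    n = len(biopsy)
    # Observed agreement: share of patients with identical grades.
    p_o = sum(b == r for b, r in zip(biopsy, prostatectomy)) / n
    # Chance agreement: product of the marginal counts, summed over grades.
    b_counts, r_counts = Counter(biopsy), Counter(prostatectomy)
    p_e = sum(b_counts[g] * r_counts[g] for g in b_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both lists contain one identical grade throughout
    return (p_o - p_e) / (1 - p_e)


def grade_change(pb_gs, rp_gs):
    """Classify one patient as upgrade, downgrade or concordant."""
    if rp_gs > pb_gs:
        return "upgrade"
    if rp_gs < pb_gs:
        return "downgrade"
    return "concordant"


def isup_grade_group(primary, secondary):
    """Map primary and secondary Gleason patterns to ISUP 2014 grade group 1-5."""
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns range from 1 to 5")
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # 3+4 -> group 2; 4+3 (and any other split of 7 in this sketch) -> group 3
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5


# Hypothetical paired scores, NOT data from the study.
pb = [6, 6, 6, 7, 7, 8, 6, 7]
rp = [6, 7, 6, 7, 8, 8, 7, 7]
kappa = cohens_kappa(pb, rp)
changes = Counter(grade_change(b, r) for b, r in zip(pb, rp))
```

For these invented pairs, five of eight grades agree, chance agreement is 22/64, and kappa is 3/7 (about 0.43).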

Subgroup analyses evaluating clinically significant up- and downgrading were performed for the following three treatment groups, defined according to our institute’s guidelines: GS 5–6 = potential candidates for active surveillance [16], GS 7 = eligible for nerve-sparing (NS) [2] and GS 8–10 = high-risk tumors, for which RP and EPLND without NS are indicated [2, 16].
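As a rough illustration of this subgroup logic, the following Python sketch assigns each GS to one of the three treatment groups and flags a change as clinically significant when biopsy and prostatectomy scores fall into different groups. Reading "clinically significant" as a change of treatment group is an assumption made for this example, and the function names and integer ranks are hypothetical.

```python
# Treatment groups as listed in the text. The integer rank is an
# illustrative device and is not defined in the source.
GROUPS = {
    0: "GS 5-6: potential candidate for active surveillance",
    1: "GS 7: eligible for nerve-sparing",
    2: "GS 8-10: RP and EPLND without nerve-sparing",
}


def treatment_group(gs):
    """Return the rank (0, 1 or 2) of the treatment group for a Gleason score."""
    if gs in (5, 6):
        return 0
    if gs == 7:
        return 1
    if 8 <= gs <= 10:
        return 2
    raise ValueError("Gleason score outside the 5-10 range covered here")


def significant_change(pb_gs, rp_gs):
    """Assumption: a change is clinically significant if the group changes."""
    pb_group = treatment_group(pb_gs)
    rp_group = treatment_group(rp_gs)
    if rp_group > pb_group:
        return "clinically significant upgrade"
    if rp_group < pb_group:
        return "clinically significant downgrade"
    return "same treatment group"
```

Under this reading, a change from GS 8 to GS 9 is a GS upgrade but stays within the high-risk group, whereas a change from GS 6 to GS 7 moves the patient out of the active surveillance group.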

Logistic regression models adjusting for preoperative parameters were built to identify predictive factors for GS upgrades.

Biochemical recurrence (BCR) was defined as a PSA value ≥0.1 ng/ml with subsequent confirmation after reaching a postoperative PSA nadir ≤0.1 ng/ml. The predictive impact of pre- and postoperative parameters on positive surgical margin (PSM) and BCR rates was assessed by stepwise logistic regression and a Cox regression model, respectively.
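The BCR definition can be sketched as a small Python function operating on a chronological list of postoperative PSA values. Several details are assumptions made for illustration only: the nadir is taken as reached at the first value at or below the threshold, "subsequent confirmation" is read as two consecutive later values at or above it, and the function name is hypothetical. Values of exactly 0.1 ng/ml satisfy both conditions of the stated definition, so their handling here is likewise an assumption.

```python
def has_bcr(psa_values, threshold=0.1):
    """Return True if a postoperative PSA series (ng/ml, chronological) shows BCR.

    Assumptions of this sketch: the nadir counts as reached at the first
    value <= threshold; recurrence requires two consecutive later values
    >= threshold (one rise plus one confirming measurement).
    """
    nadir_reached = False
    pending = False  # True if the previous post-nadir value was >= threshold
    for psa in psa_values:
        if not nadir_reached:
            nadir_reached = psa <= threshold
            continue
        if psa >= threshold:
            if pending:
                return True  # rise confirmed by a second elevated value
            pending = True
        else:
            pending = False  # an unconfirmed rise is discarded
    return False
```

A series that never falls to the threshold returns False in this sketch, because PSA persistence is a separate outcome from recurrence after a nadir.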

To assess differences in prognostic accuracy of the CP versus UP PB grading, Kaplan–Meier analyses of BCR-free survival (BCRFS) were performed and estimates were compared between the CP and UP group using the log-rank test. Subgroups were formed according to the D’Amico criteria [17].
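For readers unfamiliar with the method, the Kaplan–Meier (product-limit) estimate of BCRFS can be sketched in plain Python as follows. This is not the SPSS routine used for the analyses, it omits the log-rank comparison, and the function name and the six follow-up records are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times  -- follow-up in months for each patient
    events -- True if BCR occurred at that time, False if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            recurrences += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if recurrences:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up records, NOT data from the study.
months = [6, 12, 12, 24, 36, 48]
bcr = [True, False, True, True, False, False]
curve = kaplan_meier(months, bcr)
```

With these invented records, the estimated BCRFS drops to 5/6 at 6 months, 2/3 at 12 months and 4/9 at 24 months.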

Results

Of 826 patients undergoing RARP, 40 were excluded due to PC diagnosis by TURP (n = 22) or MRI-guided biopsy (n = 11) or due to neoadjuvant androgen deprivation therapy (n = 7) resulting in 786 patients eligible for the final analysis. Pre-, intra- and postoperative data of these patients are summarized in Table 1.

Table 1 Patient characteristics

Patients in the CP group had higher preoperative PSA levels (mean 10.1 vs. 8.1 ng/ml), more often had palpable tumors (31 vs. 22 %) and had fewer cores taken during PB (mean 9.7 vs. 10.7). The distribution of PB GS was not significantly different between the two groups.

Final pathology revealed significantly higher rates of extraprostatic extension, more GS 8–10 tumors and more PSMs in patients initially diagnosed by a CP. Even when stratified for pathological tumor stage and final GS, persistently higher rates of PSMs could be observed in the CP group. The PSM rate for Gleason 8–10 tumors was 49.5 % in the CP group and 15.2 % in the UP group (p < 0.001).

Tables 2 and 3 show the number and percentage of upgrades and downgrades between biopsy and RARP specimen. The overall GS concordance measured by kappa was fair (0.273) in the CP group and moderate (0.411) in the UP group. Significantly higher rates of overall and clinically significant upgrades were found when biopsies were graded by a CP.

Table 2 Gleason score up- and downgrades stratified by origin of pathology report
Table 3 Radical prostatectomy grades stratified by biopsy Gleason scores and tumor stage

Table 4 displays the results of the uni- and multivariable logistic regression analyses. A higher preoperative PSA level, a lower number of biopsy cores as well as grading by a CP predicted GS upgrade in the univariable analysis (Table 4a). In the multivariable analysis, a higher PSA level and grading by a CP remained significant predictors of GS upgrade. The second model assessed the risk of PSMs (Table 4b). In the multivariable analysis, a RP GS of 7, pT3 tumor stage and CP grading remained independent predictive factors for PSMs.

Table 4 Uni- and multivariable logistic regression models to predict (a) Gleason score upgrade from biopsy to radical prostatectomy and (b) positive surgical margin status

Median follow-up time was 36 (range 1–101) months. A postoperative PSA nadir of <0.1 ng/ml was not reached by 50 (10.3 %) men in the group of patients graded by a CP and by 17 (7.5 %) men in the group of patients graded by a UP (p = 0.025). This difference remained significant in a subgroup analysis stratified for risk groups (data not shown). Table 5 displays the results of the Cox regression analysis. In the univariable model, all variables but nerve-sparing were significantly associated with a lower BCRFS rate. In the multivariable model, grading by a CP remained an independent predictor of BCR with a hazard ratio (HR) of 1.65 (p = 0.028).

Table 5 Uni- and multivariable Cox regression analysis of predictors for biochemical recurrence-free survival

The Kaplan–Meier analyses of BCRFS comparing the CP and UP group are shown in Fig. 1a–d. For low-risk patients (n = 177), a higher BCR rate was detected in the CP group (Fig. 1b). Comparison of these estimates with survival analyses based on final pathology showed that the UP PB GS 6 curve resembled the RP GS 6 curve more closely than the CP PB GS 6 curve did (Fig. 1b).

Fig. 1 Kaplan–Meier analyses for biochemical recurrence-free survival. (a) All patients stratified for origin of pathology report. (b) Patients with a PSA of <10 ng/ml, clinical T1 stage and PB GS of 4–6 stratified for origin of pathology report and compared to concordant and higher RP GS. (c) Patients with a PSA of <20 ng/ml, clinical T1–2 stage and PB GS of 7 stratified for origin of pathology report and compared to concordant and higher RP GS. (d) Patients with a PSA of <20 ng/ml and a RP GS of 8–10 (final pathology) stratified for origin of pathology report

In the intermediate-risk group (n = 319), comparison of PB GS 7 BCRFS with estimates based on final pathology showed that the UP PB GS 7 curve resembled the RP GS 7 curve more closely than the CP PB GS 7 curve did (Fig. 1c).

A significantly lower BCRFS rate could be observed for patients with a RP GS of 8–10 and preoperative PSA level of <20 ng/ml (n = 92) when biopsies had been graded by a CP (Fig. 1d).

Discussion

The present study is the first to evaluate the association between pathology report origin and concordance between PB and RP GS, as well as its impact on oncological outcome in a contemporary series of patients treated by RARP for PC. We were able to show that PB GS undergrading was significantly more frequent if PB grading had been performed by a CP compared to a UP. In addition, PB GS grading by a CP was an independent predictor of worse oncological outcome. Men in the CP group more often had PSM, postoperative PSA persistence and BCR compared to the UP group.

In 1992, DF Gleason himself raised the problem of non-dedicated pathologists tending not to recognize small amounts of higher tumor grade [18]. This might explain why, in the present study, more than half (54.5 %) of the tumors with a Gleason 8–10 on final pathology were not recognized as such by the CP compared to 37 % missed by the UP. Steinberg and colleagues backed Gleason’s statement in 1997 by reporting a higher rate of GS upgrade between the biopsy and the prostatectomy specimen when the biopsy material was analyzed in non-academic settings (37 vs. 28 %) [7]. In 2011, Kuroiwa et al. [10] reported a 16 % higher rate of undergrading by CP. Notably, these studies included data from the pre-ISUP 2005 era and did not assess the prognostic impact of the pathologist on oncological outcome.

An improved concordance rate is not only a theoretical advantage, but is of clinical relevance. Undergrading of PB samples leads to an underestimation of the actual disease burden and can have considerable consequences for therapeutic decision making after diagnosis of PC. Active surveillance, brachytherapy and new therapeutic approaches such as focal therapy are currently considered inappropriate treatment options for most patients with intermediate- or high-risk PC [16, 19]. Furthermore, the extent of surgical resection (LND, nerve-sparing) as well as the indication for concomitant androgen deprivation with radiotherapy is based on preoperative risk stratification [2–4] and can have a significant impact on patients’ quality of life. Additional treatments may become necessary or opportunities for cure may be missed in patients with misclassified tumors. Our subgroup analysis for clinically significant undergrading revealed higher rates of misclassification of patients when biopsies were graded by a CP compared to a UP based on inclusion criteria for active surveillance (64.2 vs. 48.7 %) and recommendations for nerve-sparing (12.6 vs. 6.7 %).

Accordingly, based on our hypothesis that an inaccurate PB GS compromises an adequate surgical approach and therefore the oncological outcome, we were able to show that grading by a CP is an independent predictor of PSMs. A surgeon underestimating the tumor might select unsuitable patients for nerve-sparing and perform a more extensive preservation of surrounding structures. The impact of GS underestimation on PSM rates has been described before; in a retrospective analysis, Corcoran et al. [20] reported significantly higher PSM rates in upgraded tumors than in corresponding concordant tumors. The PSM rate in undergraded Gleason 3 + 4 tumors was significantly higher than in accurately diagnosed counterparts. In the present investigation, nearly half of the patients with a RP GS of 8–10 had PSMs when PBs were graded by a CP, compared to only 15 % of patients graded by a UP. This observation gains importance considering that more than half of the patients with a RP GS 8–10 had been assigned a lower GS preoperatively by a CP. Consequently, the observed rate of nerve-sparing was also significantly higher in this group compared to the group of patients whose PB cores had been graded by a UP. This further indicates the important role of the surgeon’s decisions in the high-risk situation for a favorable surgical resection of the tumor and the possible consequences of undergrading for the oncological outcome (Fig. 1d).

The assessment of BCRFS was performed in this study for two reasons. The first was to evaluate the impact of preoperative parameters, including the grading pathologist, on a more objective criterion of midterm oncological outcome. The second was to address an inherent bias: our pathology department graded both specimens (PB and RP), which might have led to a better concordance between the UP PB GS and RP GS. Therefore, we investigated the prognostic accuracy of the PB and RP GS for BCRFS and determined which of the PB GS (CP or UP) was a more precise reflection of tumor behavior.

Kaplan–Meier analysis indicated RP GS 5–6 as an excellent predictor of tumor aggressiveness with a 5-year estimated BCRFS of 96 %. This is in line with large reported series and the expected behavior of these tumors [21]. With an estimated 5-year BCRFS of 90 % in the group of low-risk tumors (based on PB GS), grading by a UP achieved a higher predictive accuracy than grading by a CP (5-year BCRFS of 83 %).

A similar constellation was observed in the intermediate-risk group. While the better outcome of patients with a PB GS 7 graded by a UP is probably due to the higher rate of downgrading at RP, the Kaplan–Meier curve of CP-graded patients with PB GS 7 estimates a BCRFS rate more similar to patients with a RP GS 8–10 than RP GS 7, hinting at the larger proportion of actual higher GSs in this group.

This study has limitations. It was not possible to distinguish whether the grading in 2005 and 2006 had been performed according to the classic system or according to the ISUP 2005 consensus. It can be reasonably assumed that the modernized Gleason grading scheme was more rapidly adopted by dedicated UPs than by CPs. However, the high number of patients and a period of almost 9 years strengthen the hypothesis of a persistent and significant difference in the accuracy of Gleason grading between the two groups. Furthermore, there were statistically significant differences in the preoperative parameters of the two groups. One potential explanation of these differences is that community urologists refer patients with more advanced tumors to academic centers for surgery. To overcome this problem, we performed multivariable logistic and Cox regression analyses and subclassification of patients into risk groups. Finally, a retrospective evaluation of the accuracy of the biopsy technique is not feasible. We analyzed the number of biopsy cores, which was slightly lower for patients diagnosed by a CP. We further assume that a median of 10 cores in the CP and 12 cores in the UP-graded group exceeded the critical cutoff of 6 cores for a significantly increased risk of GS upgrade [6] and therefore had limited impact on the accuracy of the grading in this study. We did not perform a re-evaluation of CP-graded Gleason scores by a UP because of legal issues (necessity of signed declarations of agreement by every single patient) and resource issues (collection of slides from multiple CP archives and re-evaluation of 487 biopsy sets by an experienced UP). We believe that these results reflect real-life data and raise awareness of the actual oncological impact of grading by non-dedicated pathologists on patient outcome in daily practice.

Conclusion

Dedication of pathologists affects not only the rate of discordance but also the oncological outcome of patients treated for localized prostate cancer. We strongly recommend that prostate biopsy specimens be reviewed by a dedicated uro-pathologist to improve the concordance between prostate biopsy and radical prostatectomy Gleason score, which in turn allows the most appropriate treatment option to be chosen and eventually results in a better oncological outcome.