Introduction

Since its introduction 50 years ago, the Gleason grading system has been widely adopted as one of the major prognostic factors in prostate cancer (PCa).1 Its use in several prognostic models, usually as a three-tier grading system (Gleason score ⩽6, 7 or 8–10), helps stratify patients by risk and guide therapeutic options.2 Several revisions of the initial grading system have been proposed to improve its prognostic performance and its interpretation by physicians and patients.3, 4, 5 Based on a 2014 consensus conference, the International Society of Urological Pathology (ISUP) now recommends a new prognostic grading system ranging from 1 to 5: Grade Group 1=Gleason score ⩽6, Grade Group 2=Gleason score 3+4, Grade Group 3=Gleason score 4+3, Grade Group 4=Gleason score 8 and Grade Group 5=Gleason scores 9 and 10.5 The World Health Organization has since proposed using these new groupings in conjunction with the Gleason score to provide a better tool for physicians and patients.6 Editors of the major journals in uro-oncology have recently called on all contributors to use it.7
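To make the mapping concrete, the sketch below translates a primary and secondary Gleason pattern pair into the ISUP Grade Group listed above, alongside the conventional three-tier grouping. It is a minimal illustration, not software used in this study; the function names `grade_group` and `three_tier` are hypothetical, and Gleason 7 combinations other than 3+4 and 4+3 are treated as invalid input by assumption.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map a primary + secondary Gleason pattern pair to an ISUP Grade Group (1-5)."""
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns must lie between 1 and 5")
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # Gleason 7 is split according to the primary (most prevalent) pattern.
        if (primary, secondary) == (3, 4):
            return 2
        if (primary, secondary) == (4, 3):
            return 3
        raise ValueError(f"unexpected Gleason 7 combination {primary}+{secondary}")
    if score == 8:
        return 4  # 4+4, 3+5 and 5+3 all fall into Grade Group 4
    return 5  # Gleason score 9 or 10


def three_tier(primary: int, secondary: int) -> int:
    """Conventional grouping: 1 = score <=6, 2 = score 7, 3 = score 8-10."""
    score = primary + secondary
    if score <= 6:
        return 1
    return 2 if score == 7 else 3
```

The two functions differ only in how they treat Gleason 7: the three-tier system pools 3+4 and 4+3, whereas the Grade Groups separate them.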

To date, the prognostic performance of this grading system has been assessed in several single-center studies8, 9, 10, 11 and three large multicentric cohorts of patients treated with radical prostatectomy (RP) and/or radiation therapy.12, 13, 14, 15 Epstein et al.12 first proposed this new grading system based on data from US academic centers, with final cohorts of 20 845 men undergoing RP and 5501 men treated with radiation therapy, and demonstrated the independent prognostic value of this classification for predicting biochemical recurrence (BCR) in both cohorts. More recently, the prognostic performance of the new Grade Groups was validated in a nationwide population-based cohort from the National Prostate Cancer Register of Sweden, including men undergoing RP and radiation therapy.15 However, no external validation of its prognostic value has yet been reported in a multicentric study of European patients from different countries, whose outcomes may differ.

To provide further evidence for the clinical relevance of these new Grade Groups in European patients, we assessed their prognostic performance in a large multicentric cohort of European men treated with RP. In addition, we compared their prognostic value with that of the three-tier system with regard to predictive accuracy.

Materials and methods

Patient selection and data collection

Data from 30 711 patients with PCa treated with RP at seven European academic institutions between 2005 and 2014 were reviewed after institutional review board approval at each center. No patient had distant metastatic disease at the time of RP. Patients who received preoperative androgen deprivation therapy or chemotherapy, who had a preoperative PSA >50 ng ml−1, or who had missing data on follow-up, preoperative PSA, biopsy or RP Gleason score, surgical margin status or pathological stage were excluded from the analysis. Overall, 27 122 patients were included in the analysis. No patient received immediate postoperative radiotherapy, chemotherapy or androgen deprivation therapy.
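The exclusion criteria above can be expressed as a simple eligibility filter. The sketch below is illustrative only: the `Patient` record, its field names and the `is_eligible` function are hypothetical and do not describe the study database; a missing value is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_PREOP_PSA = 50.0  # ng/ml; patients above this value were excluded


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    neoadjuvant_adt_or_chemo: bool
    preop_psa: Optional[float]
    biopsy_gleason: Optional[Tuple[int, int]]  # (primary, secondary) patterns
    rp_gleason: Optional[Tuple[int, int]]
    positive_margin: Optional[bool]
    pathological_stage: Optional[str]
    follow_up_months: Optional[float]


def is_eligible(p: Patient) -> bool:
    """Return True if the record passes the exclusion criteria described in the text."""
    if p.neoadjuvant_adt_or_chemo:
        return False  # preoperative ADT or chemotherapy
    required = (
        p.preop_psa,
        p.biopsy_gleason,
        p.rp_gleason,
        p.positive_margin,
        p.pathological_stage,
        p.follow_up_months,
    )
    if any(value is None for value in required):
        return False  # missing data on any required variable
    return p.preop_psa <= MAX_PREOP_PSA  # PSA >50 ng/ml excluded


cohort = [
    Patient(False, 6.4, (3, 4), (3, 4), False, "pT2", 30.0),
    Patient(True, 9.1, (4, 3), (4, 3), True, "pT3a", 24.0),
    Patient(False, 72.0, (4, 4), (4, 5), True, "pT3b", 12.0),
    Patient(False, 5.2, (3, 3), None, False, "pT2", 40.0),
]
analysed = [p for p in cohort if is_eligible(p)]
```

Note that a negative margin (`False`) is a valid recorded value and is kept; only `None` counts as missing.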

Gleason scoring and pathological evaluation

Gleason score was assessed on all biopsy and RP specimens by uropathologists according to ISUP guidelines.4 Tertiary Gleason patterns were not considered in the analysis, as they were not routinely reported at some institutions. All surgical specimens were processed according to standard pathologic procedures. Pathologic stage was assigned according to the 2007 American Joint Committee on Cancer Tumor Node Metastasis staging system. Lymphatic tissue removed during RP was submitted for histological examination. A positive surgical margin was defined as tumor cells in contact with the inked surface of the prostatectomy specimen.

Follow-up

Follow-up was performed according to institutional protocols, in agreement with the guidelines in use at the time. In general, patients were seen quarterly during the first postoperative year, semi-annually in the second year and annually thereafter. PSA was measured at each visit. The primary endpoint was BCR, defined as a PSA value >0.2 ng ml−1 on two consecutive visits. The date of BCR was set as the date of the first of these two PSA values. In a few cases, radiation or androgen deprivation therapy was initiated for a rising PSA before the 0.2 ng ml−1 cut-off was reached. In these cases, the date of BCR was set as the first day of androgen deprivation therapy or radiation therapy.
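The BCR dating rule can be sketched as follows, assuming PSA measurements are available as (date, value) pairs in ng ml−1. The function name `bcr_date` is hypothetical, and the handling of salvage therapy started before a confirmed BCR (counted here as BCR on the first day of that therapy) is an assumption for illustration, not the study's code.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

BCR_THRESHOLD = 0.2  # ng/ml


def bcr_date(psa: Sequence[Tuple[date, float]],
             salvage_start: Optional[date] = None) -> Optional[date]:
    """Return the BCR date, or None if no BCR occurred (patient censored).

    BCR: PSA > 0.2 ng/ml on two consecutive visits, dated at the first of the two.
    Assumption: salvage radiation or ADT started before a confirmed BCR is
    counted as BCR on the first day of that therapy.
    """
    visits = sorted(psa)  # chronological order
    confirmed: Optional[date] = None
    for (d1, v1), (_, v2) in zip(visits, visits[1:]):
        if v1 > BCR_THRESHOLD and v2 > BCR_THRESHOLD:
            confirmed = d1
            break
    if salvage_start is not None and (confirmed is None or salvage_start < confirmed):
        return salvage_start
    return confirmed
```

A single PSA value above 0.2 ng ml−1 followed by a lower value does not count as BCR under this rule.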

Statistical analysis

Biopsy and RP new Grade Groups were analyzed separately as categorical variables. We used the χ2 and Kruskal–Wallis tests to assess differences between groups in categorical and continuous variables, respectively. BCR-free survival curves were plotted using the Kaplan–Meier method and compared using the log-rank test; univariable and multivariable Cox regression models were used to assess the association of each group with BCR after RP. Adjusted pairwise comparisons between groups were performed. All P-values were two-sided, and statistical significance was defined as P<0.05. Harrell's C-index was used to assess the prognostic discrimination of the models. Statistical analyses were performed using Stata 11.0 statistical software (StataCorp, College Station, TX, USA).
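For readers less familiar with the Kaplan–Meier method, the minimal sketch below computes the product-limit estimate of BCR-free survival at a given time point, for example 48 months for a 4-year estimate. It assumes follow-up times in months, events coded 1 for BCR and 0 for censoring, and the usual convention that censored observations tied with an event remain in the risk set at that time. It is an illustration, not a substitute for the Stata routines used in the analysis; `km_survival_at` is a hypothetical name.

```python
from typing import Sequence


def km_survival_at(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier (product-limit) estimate of BCR-free survival S(t).

    times: follow-up in months; events: 1 = BCR, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current_time = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations sharing this time point.
        while i < len(data) and data[i][0] == current_time:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
        at_risk -= n_leaving  # both events and censored cases leave the risk set
    return survival
```

A 4-year bRFS estimate for one Grade Group would then correspond to `km_survival_at(months, bcr, 48.0)` applied to the patients in that group.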

Results

Descriptive characteristics and association with clinico-pathological features

Baseline characteristics of the 27 122 patients according to biopsy and RP new Grade Groups are listed in Table 1. Higher new Grade Groups on biopsy and RP specimens were associated with age, higher PSA, extracapsular extension, seminal vesicle invasion, positive surgical margins and lymph node metastases (P⩽0.001).

Table 1 Association of biopsy new Grade Groups (A) and RP new Grade Groups (B) with standard clinico-pathological variables in 27 122 patients treated with RP for PCa

Association of new Grade Groups with BCR

Median follow-up was 29 months (interquartile range, 13–54) for patients without BCR at last follow-up.

Significant differences in BCR-free survival (bRFS) were observed between new Grade Groups based on biopsy (log-rank test, P<0.001; Figure 1a) and on prostatectomy specimens (log-rank test, P<0.001; Figure 1b). The 4-year estimated bRFS for biopsy Grade Groups 1–5 was 91.3% (95% confidence interval (CI): 90.7–91.8%), 81.6% (95% CI: 80.3–82.7%), 69.8% (95% CI: 67.6–71.9%), 60.3% (95% CI: 57.1–63.3%) and 44.4% (95% CI: 39.9–48.8%), respectively. The 4-year estimated bRFS for RP Grade Groups 1–5 was 96.1% (95% CI: 95.4–96.6%), 86.7% (95% CI: 85.9–87.5%), 67.0% (95% CI: 65.3–68.7%), 63.1% (95% CI: 58.2–67.7%) and 41.0% (95% CI: 37.1–44.9%), respectively.

Figure 1

Kaplan–Meier analysis according to new Grade Groups on biopsy (a) and RP (b) specimens in 27 122 prostate cancer patients. ISUP, International Society of Urological Pathology; RP, radical prostatectomy.

Compared with Gleason score ⩽6 (Group 1), all higher new Grade Groups on both biopsy and RP specimens were associated with a higher risk of BCR on univariable analysis and remained independently associated on multivariable analysis (all P<0.001; Tables 2 and 3). In these models, the new Grade Groups were the strongest predictors of BCR.

Table 2 Multivariable regression model using preoperative variables to predict bRFS in 27 122 patients treated with RP
Table 3 Multivariable regression models using postoperative variables to predict bRFS in 27 122 patients treated with RP

The discrimination of the multivariable preoperative and postoperative prognostic models based on the current three-tier classification and on the new five-tier system (2015 ISUP groups) did not differ in a clinically meaningful way (Table 4).

Table 4 Harrell's C-index for the entire cohort with the standard three-tier Gleason grouping (⩽6, 7 and 8–10) and the new five-tier Grade Grouping
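Harrell's C-index estimates the probability that, of two comparable patients, the one with the higher predicted risk experiences BCR first. The sketch below is a naive O(n²) version for illustration only; the name `harrell_c` is hypothetical, pairs with tied follow-up times are skipped by assumption, and tied risk scores receive half credit. In practice the risk score is typically the linear predictor of a Cox model rather than a raw group label.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Naive Harrell's C-index: a higher risk score should mean earlier BCR."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[j] > times[i]:  # patient j was still BCR-free at time i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Because a coarser grouping produces more tied risk scores, the same patients can yield a different C-index under the three-tier and five-tier systems even though their outcomes are identical.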

Adjusted pairwise comparisons revealed significant differences between all new Grade Groups (all P<0.001), except between Groups 3 and 4 on RP specimens (P=0.10). Further adjusted pairwise comparisons taking into account the primary pattern of Group 4 tumors showed that patients with 3+5 tumors on biopsy did not have a higher risk of recurrence than patients in Group 3 (hazard ratio: 0.97, 95% CI: 0.71–1.35, P=0.89). On RP specimens, patients with 3+5 tumors had a lower risk of BCR than patients in Group 3 (hazard ratio: 0.57, 95% CI: 0.37–0.88, P=0.011). In the highest-risk group, patients with 4+5 tumors on RP had a risk of BCR similar to that of patients in Group 4 (hazard ratio: 1.16, 95% CI: 0.97–1.39, P=0.101).
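As an arithmetic consistency check, the standard error of a log hazard ratio can be recovered from its 95% CI, assuming a Wald interval on the log scale, and used to approximate the two-sided P-value. The sketch below applies this to the RP comparison of 3+5 tumors with Group 3 (hazard ratio 0.57, 95% CI 0.37–0.88), which gives approximately 0.011. The function name `p_from_hr_ci` is hypothetical, and rounding of the published values limits the precision of such back-calculations.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def p_from_hr_ci(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald P-value from a hazard ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)  # SE of log(HR)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


p = p_from_hr_ci(0.57, 0.37, 0.88)
```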

Discussion

To reflect tumor behavior more accurately and improve patient risk stratification, modifications to the Gleason grading system were proposed at the 2005 and 2014 ISUP consensus conferences. The deletion of Gleason scores 2–5 was the highlight of the first set of recommendations. The 2014 ISUP conference led to the grading of all cribriform glands as Gleason pattern 4 and to the adoption of a new grading system, based on the Gleason score, that ranges from 1 to 5. In our multicentric international cohort, we confirmed the prognostic value of these new groupings on both needle biopsies and RP specimens from European men.

We first showed that the new Grade Groups correlated with adverse pathological features on RP specimens. More importantly, we confirmed significant differences in bRFS after RP between the new Grade Groups. In Group 1, the 4-year bRFS estimates were 91.3% and 96.1% for biopsy and RP grading, respectively. These results are in line with previous studies that reported 4- and 5-year bRFS estimates over 95% after RP.10, 12 These excellent outcomes likely reflect a better and more homogeneous definition of the tumors assigned to this very-low-risk group. One of the indirect aims of the 2005 recommendations was to shift aggressive therapies towards the Gleason 7 groups. Within Gleason 7, we confirmed the need to consider Gleason 3+4 (Group 2) and Gleason 4+3 (Group 3) as distinct groups. In the RP cohort, the risk of BCR was twofold higher for Gleason 4+3 than for 3+4. In a recent study that assessed the new Grade Groups with long-term follow-up, Group 3 was associated with a threefold higher risk of BCR and a sevenfold higher risk of distant metastasis or cancer-specific mortality compared with Group 2.14 These results are in accordance with several studies, published before the 2014 ISUP recommendations, that had already demonstrated that splitting the Gleason 7 group was prognostically relevant. Indeed, 4+3 tumors were associated with adverse pathological features on RP specimens such as extraprostatic extension, seminal vesicle invasion or positive surgical margins.16, 17 After RP or radiation therapy, a primary pattern 4 in this group was associated with worse bRFS and cancer-specific survival.18, 19, 20, 21 In our study, Group 3 tumors behaved much like Group 4 tumors: the 4-year estimated bRFS for RP Grade Groups 3 and 4 was 67.0% and 63.1%, respectively.
Similarly, in the Johns Hopkins study that first assessed these new groupings, 5-year bRFS for biopsy Grade Groups 3 and 4 was 65% and 63%, respectively.9 In contrast, patients in Group 5 had a significantly worse prognosis than those in Group 4; 4-year bRFS in Group 5 was 44.4% and 41.0% for biopsy and RP grading, respectively. These results confirm the poor prognosis associated with Gleason score 9–10 PCa.22

Whether this new classification improves on the prognostic use of established predictors of cancer outcome cannot be settled by conventional univariable and multivariable analyses of its association with disease outcomes alone. It must be shown that using it improves, by a statistically significant margin, or at least equals the performance of a predictive model constructed without it. One of our aims was therefore to test the prognostic value of the new classification using the concordance index. We found that, despite an independent association with bRFS, the new grouping failed to add prognostically relevant information beyond the three-tier grading (improvement in C-index <0.01). Interestingly, we showed that bRFS for patients with Gleason score 3+5 and 4+5 may be underestimated by the new system, which assigns these patients to Groups 4 and 5, respectively. Such heterogeneity within Grade Group 4 has been reported previously.11 Therefore, in higher-risk patients, physicians should probably pay attention to the primary pattern of the Gleason score in order to stratify risk accurately and avoid inaccurate risk estimates in patient counseling and decision-making.

Despite these weaknesses, the new grading system has several benefits that support its use in daily practice. It offers a simpler and more comprehensible classification for physicians and patients alike. The previous 2–10 scale allowed multiple combinations of Gleason patterns, and the resulting scores, which in practice range from 6 to 10, may be misunderstood. Indeed, by reference to the 1–10 scales widely used in daily life, patients with a Gleason score of 6 may wrongly perceive their disease as intermediate- to high-risk. This misinterpretation may compromise adherence to active surveillance, which is now widely offered to patients with few positive biopsy cores showing Gleason 6 PCa. Introducing the new 1–5 grading system, in which the first and lowest grade is labeled ‘1’, may help address these concerns and allay some fears. Loeb et al.23 clearly demonstrated that, from the patient’s point of view, traditional Gleason grading was confusing and the new Grade Groups more comprehensible. Almost 80% of the patients considered that a 1–5 scale would be helpful in active surveillance decision-making. Adopting this new classification may also help improve risk estimation and subsequent treatment decision-making. Indeed, despite evidence of a prognostic difference between Gleason 3+4 and 4+3 PCa, the distinction between the two groups still rarely affects treatment decisions and is not considered in the guidelines of the European Association of Urology, the National Comprehensive Cancer Network or the European Society for Medical Oncology. Zumsteg et al.24 recently proposed a new risk stratification for patients with intermediate-risk PCa treated with radiation therapy, based on several unfavorable criteria including the 3+4 versus 4+3 distinction. This stratification may guide decisions on the duration of androgen deprivation therapy among these different groups. A similar controversy exists regarding the clinical value of distinguishing Gleason 8 tumors from Gleason 9–10 tumors.
Nevertheless, new paradigms may arise from better prognostication based on the new Grade Groups and may guide decisions on active surveillance or androgen deprivation therapy. Finally, this new classification may help homogenize and standardize pathologic grade reporting for clinical and research use, thereby improving reproducibility.7

Our validation study has several limitations. The first and foremost is its retrospective design, which may introduce selection bias. In a multi-institutional study, variations in pathological workup and pathologist expertise may additionally confound the results. However, all data were collected at high-volume centers with substantial expertise in uropathology, and we only considered patients whose biopsies and RP specimens were assessed after adoption of the 2005 ISUP recommendations. Several pathological features that may increase the prognostic accuracy of the models, such as the presence of a tertiary pattern, the number of positive cores and the percentage of involvement per core, were not considered because they were not available at the time of data collection. Finally, our study has a limited follow-up, with bRFS as its only endpoint. Assessing associations with more meaningful endpoints such as metastasis-free, cancer-specific or overall survival would probably have been more useful for patient information and decision-making. To our knowledge, only one study has confirmed the prognostic accuracy of the present grading system with PCa death as an endpoint, in a cohort of patients treated conservatively.25 However, early BCR within 2 years should be considered a surrogate for biologically and clinically aggressive PCa, as it is highly likely to reflect micrometastatic disease.26, 27

Conclusions

The recently proposed Grade Groups, assessed on biopsy and RP specimens, are associated with adverse pathological features and strongly predict bRFS after RP. Although they do not improve the accuracy of established prognostic models by a significant margin compared with the previous three-tier grading system, this new classification is more comprehensible and user-friendly. It can therefore help physicians and patients discuss disease aggressiveness and make treatment decisions. Further validation of this classification system for other treatment modalities and for more meaningful endpoints such as metastasis is needed.

Ethical standards

This study has been approved by the appropriate ethics committee.