Introduction

Prostate cancer (“PC”) is the most common cancer among men in Western countries [1, 2]. The survival rates have been increasing and are relatively high, with 5-year and 10-year overall survival rates of 91% and 90%, respectively, in 2014 [3]. Functional status after PC treatment is consequently an important outcome in addition to survival [4, 5]. Functional status includes disease-specific symptoms and impairment, as well as health-related quality of life. Patient-reported outcome (“PRO”) questionnaires are used to measure health-related outcomes directly reported by patients themselves. Combining PROs and clinical outcomes such as (relapse-free) survival to evaluate patients’ functional status is now a widespread method, not only in health-care research and clinical trials [6] but also in routine clinical care [7]. For cancer care, in particular, the importance of PROs has been increasing in recent years [8, 9].

PROs are particularly valuable in PC, since recovery and improvement in functional status are important indicators for treatment success as well as treatment regret [10,11,12]. For radical prostatectomy (“RPE”), the most relevant outcomes are urological (incontinence, irritative/obstructive symptoms) and sexual (erectile dysfunction). PROs can be used not only at the individual patient level, but also aggregated to provide information about health outcomes at different providers. PROs can thus serve as a quality assurance instrument to compare performance between healthcare providers [13]. When this is done, however, the results have to be carefully reported, with the different providers’ casemixes being taken into account [14]. Otherwise, unadjusted provider comparisons may be unfair and misleading.

Recently, Nossiter et al. reported the relationship between hospital-level PROs after RPE and hospital volume [15] and thus used PROs to evaluate the hospitals’ performance. However, baseline PROs, which are known to be important adjustors for follow-up PROs [16], were not included in the models. Moreover, that study focuses on hospital volume rather than on evaluating how PROs can be used for comparing different PC-care providers.

Therefore, the aims of this study were to report on casemix-adjusted prostate cancer-specific PRO scores (urinary incontinence, irritative/obstructive function, sexual function) for comparing different operating sites and to present a casemix adjustment methodology for PROs.

Materials and methods

Data collection

Prostate cancer centers (“PCCs”) in Germany have been recruiting patients for the Prostate Cancer Outcomes (“PCO”) Study since 2016. Before definitive treatment for PC, patients with clinically localized or (locally) advanced PC (any T, any N, M0) are requested to complete the Expanded Prostate Cancer Index Composite-26 (“EPIC-26”) questionnaire, either using a web interface or a paper–pencil version, along with three additional sociodemographic questions after providing informed consent. Patients are asked to complete the EPIC-26 questionnaire again one year after the start of treatment.

Questionnaire responses are bundled with clinical information documented for certification purposes and according to the International Consortium for Health Outcomes Measurement (“ICHOM”) standard set [17] by the PCCs, including disease-specific information and treatment information, using the OncoBox [18]. This tool harmonizes output formats from different tumor documentation systems.

The PCO is an ongoing study and forms part of the TrueNTH Registry launched by the Movember Foundation. The ethics committee of the Medical Association of Berlin has approved the study (Eth-12/16). All of the participating PCCs are surgical sites and are referred to here as “sites” or “operating sites.”

Measures

The analysis presented here includes data for PC patients taking part in the PCO study who underwent (any kind of) RPE as the first definitive treatment. Patients receiving additional (salvage) treatments like androgen deprivation therapy (“ADT”) or radiation before completion of the follow-up questionnaire were excluded from the analytical data set. Sites with fewer than 10 patients were not included in the analysis.

Dependent variables

The EPIC-26 domain scores [19] 1 year after RPE (t1) were used as outcome measurements. The EPIC-26 is a well-established, PC-specific PRO questionnaire recommended by the ICHOM [20] that summarizes responses in five domains: incontinence, irritative/obstructive function, bowel function, sexual function, and vitality/hormonal function. The validated German translation of the EPIC-26 was used [21]. For this article, only the three most RPE-relevant domains (urinary incontinence, irritative/obstructive function, and sexual function) are reported. The results for bowel function and vitality/hormonal function can be found in Supplementary Material 6.

All EPIC-26 domain scores can range between 0 and 100, with 0 indicating the poorest functional outcome. They are calculated using a scoring manual [22].

Adjustors

The following disease-specific and sociodemographic patient characteristics were used to adjust for different casemixes at the different operating sites (reference category in bold, selection based on prior research [23] and iteratively):

  • Baseline EPIC-26 domain score before surgery (t0).

  • Age at diagnosis (categories < 60, 60–69, 70–79, > 79).

  • Risk classification according to the German PC guideline [24], which follows the D’Amico risk classification [25] (categories: localized low risk, localized intermediate risk, localized high risk, locally advanced, advanced).

  • Number of comorbidities (categories: none, 1–2, > 2, unknown).

  • Educational level (measured as highest school qualification, categories: lower secondary school, intermediate secondary school west, intermediate secondary school east [the school-leaving certificate of the German Democratic Republic up to 1990], entrance certificate for university of applied science, entrance certificate for university, other, none).

  • Health insurance type (categories: statutory, private, other/none).

  • Nationality (categories: German, other).

  • Hormone therapy before surgery (ADT, categories: no, yes).

  • Active surveillance (AS) before surgery (categories: no, yes).

Missing values

Multiple imputation by chained equations and k-nearest neighbor imputation were used to handle missing values for all variables except the number of comorbidities. More information can be found in Supplementary Material 4.

Casemix adjustment of EPIC-26 scores 1 year after RPE

Casemix adjustment is a statistical method to account for different casemixes when comparing different healthcare providers [16]. The following casemix adjustment approach was used for this analysis (similar to the National Health Service England approach [26]):

  1. Calculation of expected EPIC-26 scores (\({\mathrm{expected}}_{{t}_{1}}\)) one year after RPE (t1) for each patient, using a multiple regression model with the EPIC-26 score (t1) as the dependent variable and the patients’ disease-specific and sociodemographic characteristics and baseline PROs (t0) as adjustors (compare the “Adjustors” section above).

  2. Calculation of a performance indicator for each site as the mean of the difference between observed and expected EPIC-26 scores one year after RPE:

    $${\text{performance}}_{j} = \frac{1}{n}\sum_{i = 1}^{n} \left( {{\text{observed}}_{{t_{1} i}} - {\text{expected}}_{{t_{1} i}} } \right)$$

    with \(n: \text{number of patients treated by site } j.\)

  3. Calculation of the adjusted score (t1) for each site:

    $${\text{adjusted}}_{j} = {\text{observed}}_{{t_{1} {\text{all}}}} + {\text{performance}}_{j}$$

    with \({\mathrm{observed}}_{{t}_{1}\mathrm{all}}: \text{mean of }{\mathrm{observed}}_{{t}_{1}}\text{ of patients from all sites.}\)
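The three steps above can be illustrated with a short Python sketch. This is not the study's code (the analyses were performed in R), and it makes one loud simplification: the regression in step 1 uses a single adjustor, the baseline score, whereas the study used a multiple regression with all adjustors listed above. The function names `fit_simple_ols` and `casemix_adjust` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b * x with a single adjustor (simplification)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def casemix_adjust(sites, baseline, observed):
    """Return {site: (performance, adjusted)} for one EPIC-26 domain."""
    # Step 1: expected t1 score per patient from the fitted model.
    a, b = fit_simple_ols(baseline, observed)
    expected = [a + b * x for x in baseline]

    # Step 2: mean observed-minus-expected difference per site.
    diffs = defaultdict(list)
    for site, obs, exp in zip(sites, observed, expected):
        diffs[site].append(obs - exp)

    # Step 3: mean observed score over all patients plus the site's performance.
    overall = mean(observed)
    return {s: (mean(d), overall + mean(d)) for s, d in diffs.items()}


# Hypothetical toy data: two sites with two patients each.
sites = ["A", "A", "B", "B"]
baseline = [90.0, 70.0, 90.0, 70.0]
observed = [80.0, 60.0, 70.0, 50.0]
result = casemix_adjust(sites, baseline, observed)
print(result)
```

In this toy example both sites have the same baseline casemix, so site A's patients score 5 points above and site B's patients 5 points below the expected values; the adjusted scores are the overall mean of 65 shifted by these amounts.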

In addition, minimal important difference (MID) ranges were calculated. The MID is the smallest change in a treatment outcome, such as a PRO score, that a patient would identify as important. For a better comparison between sites, MID ranges for every site were included in the graphics with the adjusted EPIC-26 scores:

$${\text{MID range}}_{j} = \left[ {{\text{adjusted}}_{j} - {\text{MID}};{\text{adjusted}}_{j} + {\text{MID}}} \right].$$

Skolarus et al. determined MIDs for each EPIC-26 score [27]. Those MIDs are currently frequently used in urological research to assess differences in EPIC-26 results (e.g., [28]). The lower threshold for each EPIC-26 domain is listed below:

  • Incontinence: 6

  • Irritative/obstructive function: 5

  • Bowel function: 4

  • Sexual function: 10

  • Vitality/hormonal function: 4
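As a minimal illustration of the MID range formula, the following Python sketch uses the thresholds listed above. The dictionary keys and the function name `mid_range` are hypothetical labels chosen for this example, not identifiers from the study.

```python
# MID thresholds per EPIC-26 domain as listed above (Skolarus et al. [27]).
# The key names are illustrative only.
MID = {
    "incontinence": 6,
    "irritative_obstructive": 5,
    "bowel": 4,
    "sexual": 10,
    "vitality_hormonal": 4,
}


def mid_range(adjusted_score, domain):
    """Return (adjusted - MID, adjusted + MID) for one site and one domain."""
    m = MID[domain]
    return (adjusted_score - m, adjusted_score + m)


# Hypothetical site with an adjusted incontinence score of 70.
example_range = mid_range(70.0, "incontinence")
print(example_range)
```

For the hypothetical site, the range spans 64 to 76 points.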

\(R^{2}\) and adjusted \(R^{2}\) were calculated for all regression models. The models were checked for heteroscedasticity, multicollinearity, and normality using the R package “performance” (results are available upon request).

Effects of casemix adjustment were measured using Cohen’s d when comparing unadjusted and adjusted scores. A Cohen’s \(\left|d\right|>0.5\) indicates a large effect.
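The text does not state which variant of Cohen’s d was computed. The Python sketch below therefore assumes the common pooled-standard-deviation form, applied to two sets of site-level scores; the function name `cohens_d` and the toy values are hypothetical, and the sign convention (first argument minus second) is also an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d with pooled standard deviation (assumed variant): (mean(x) - mean(y)) / s_pooled."""
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Hypothetical site-level scores for one domain.
adjusted = [2.0, 4.0, 6.0]
unadjusted = [1.0, 3.0, 5.0]
d = cohens_d(adjusted, unadjusted)
print(d)
```

With these toy values the means differ by one point and the pooled standard deviation is two points, giving d = 0.5.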

The following formula was used to measure the number of poorer-performing sites:

$${\text{MID}}_{{{\text{lower}}}} {\text{ratio}} = \frac{{N_{{{\text{adjusted}} < \tilde{x} - {\text{MID}}}} }}{N}$$

for each EPIC-26 domain with \({N}_{\mathrm{adjusted}< \tilde{x }-\mathrm{MID}}\): number of sites, with an adjusted score less than \(\tilde{x }-\mathrm{MID}\), \(\tilde{x }\): median of adjusted scores and \(N\): total number of sites.
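The ratio defined above can be sketched in a few lines of Python. The function name `mid_lower_ratio` and the example scores are hypothetical; the logic follows the formula directly (share of sites whose adjusted score lies below the median minus the MID).

```python
from statistics import median


def mid_lower_ratio(adjusted_scores, mid):
    """Share of sites with an adjusted score below median(adjusted) - MID."""
    threshold = median(adjusted_scores) - mid
    n_lower = sum(score < threshold for score in adjusted_scores)
    return n_lower / len(adjusted_scores)


# Hypothetical adjusted incontinence scores for five sites (MID = 6).
scores = [60.0, 70.0, 72.0, 75.0, 80.0]
ratio = mid_lower_ratio(scores, 6)
print(ratio)
```

Here the median is 72, the threshold is 66, and only one of the five sites falls below it, so the ratio is 0.2.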

In addition, absolute differences between unadjusted and adjusted scores were analyzed and compared with the corresponding MIDs: for each EPIC-26 domain, the number of operating sites for which this absolute difference was greater than or equal to the corresponding MID was calculated as an indicator of adjustment effects.
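This indicator can likewise be sketched in Python under the assumption that unadjusted and adjusted site scores are available as dictionaries keyed by site; the function name `sites_changed_by_mid` and the data are hypothetical.

```python
def sites_changed_by_mid(unadjusted, adjusted, mid):
    """List the sites whose score changed by at least the MID through adjustment."""
    return [
        site
        for site in unadjusted
        if abs(unadjusted[site] - adjusted[site]) >= mid
    ]


# Hypothetical incontinence scores for three sites (MID = 6).
unadjusted = {"A": 70.0, "B": 80.0, "C": 65.0}
adjusted = {"A": 77.0, "B": 78.0, "C": 71.0}
changed = sites_changed_by_mid(unadjusted, adjusted, 6)
print(len(changed), changed)
```

Sites A (7 points) and C (exactly 6 points) meet the threshold, while site B (2 points) does not; the count reported per domain would be the length of the returned list.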

Operating sites with fewer than 10 patients were not included in the analysis. The TRIPOD Statement advice was followed for model development [29]. All statistical analyses were performed using the R statistical package, version 4.0.2 (2020-06-22).

Results

Participants and operating sites

Between July 2016 and July 2019, 13 218 men participated in PCO. Data for 7 065 patients who underwent radical prostatectomy between July 2016 and 2019 with no additional (salvage) treatment between baseline and answering the EPIC-26 questionnaire one year after surgery at 88 different sites were included in the analysis (see Supplementary Material 1 for a more detailed description of the data sample). The mean age of the patients included was 65.5 years (SD 7.2). Most of the patients had localized PC with intermediate risk. Table 1 lists all of the patients’ characteristics.

Table 1 Characteristics of the 7065 patients

Table 2 presents the interquartile ranges (IQRs), means, and standard deviation (SD) of the EPIC-26 scores before and one year after RPE. At both measurement time points, sexual function showed the lowest score (62.9 (t0), 26.7 (t1)), whereas urinary incontinence had the highest at baseline (93.45 (t0), (t1)) and irritative/obstructive function had the highest after RPE (86.61 (t0), 91.12 (t1)) among the selected EPIC-26 domains.

Table 2 EPIC-26 scores before (t0) and after (t1) radical prostatectomy: unadjusted on patient-level data, adjusted (t1) at site level

Functional outcome in PCCs 1 year after RPE

Regression models for casemix adjustment

Table 1 in Supplementary Material 5 shows the results for each regression model (one per EPIC-26 domain), including estimates and p values. Goodness of fit for the models ranged from \(R^{2} = 0.1\) (irritative/obstructive function) and \(R^{2} = 0.11\) (incontinence) to \(R^{2} = 0.22\) (sexual function).

Supplementary Material 2 provides the results of the following sensitivity analyses: one restricted to sites with more than 49 patients and one restricted to sites that documented comorbidities. Both showed the same trends as the models presented in this paper.

Comparison of adjusted and unadjusted EPIC-26 scores

Differences between adjusted and unadjusted EPIC-26 site scores were evaluated using Cohen’s d as effect sizes. Cohen’s d was 0.03 (irritative/obstructive function), − 0.08 (urinary incontinence), and − 0.2 (sexual function).

The median absolute difference between adjusted and unadjusted EPIC-26 site scores ranged between one domain score point (irritative/obstructive function) and three domain score points (sexual function). Differences between adjusted and unadjusted scores rarely reached the MIDs: only for urinary incontinence did the score after RPE change by six points or more (the MID for incontinence), and this occurred at six sites. For the remaining domains, no sites were identified that had an absolute difference greater than or equal to the corresponding MID (see Supplementary Material 3: observed and adjusted site-specific EPIC-26 scores).

Casemix-adjusted EPIC-26 scores one year after RPE

Figure 1 shows casemix-adjusted EPIC-26 scores one year after RPE for the sites, including MID ranges. The mean adjusted score was lowest for the sexual function domain (24.6); by contrast, the mean adjusted score was highest for irritative/obstructive function (90.4). Table 2 shows means and IQRs for all selected adjusted EPIC-26 scores.

Fig. 1 Adjusted EPIC-26 scores (t1) 12 months after radical prostatectomy. EPIC-26 scores per site, including the minimal important difference (MID) range. a Incontinence, b Irritative/obstructive symptoms, c Sexual function

The \({\mathrm{MID}}_{\mathrm{lower}}-\mathrm{ratio}\) values (see the “Materials and methods” section above) were 0.26 (incontinence), 0.06 (irritative/obstructive function), and 0.01 (sexual function).

Discussion

These results show that urinary and sexual functional outcomes one year after RPE differ between operating sites before and after adjustment for relevant confounders such as patients’ baseline functional status, disease information (risk for recurrence), age, and socioeconomic status. For patients who undergo RPE, incontinence and impaired sexual function—which are characteristic adverse effects after surgery—are by far the most important PC-specific PROs, and these two differ most between sites. For incontinence, 26% of the operating sites had a poorer adjusted score than the sites’ median minus the corresponding MID (as measured by the \({\mathrm{MID}}_{\mathrm{lower}}-\mathrm{ratio}\)). In addition, the functional outcome for PC patients improves one year after RPE only for the irritative/obstructive domain score.

Variation in PROs after RPE between sites has recently been investigated based on the NHS National Prostate Cancer Audit database [15]. The main focus of that study was to understand the impact of hospital volume on functional outcomes. Since its casemix adjustment methodology did not include baseline PROs, we did not replicate the approach but instead carefully adapted the casemix adjustment methodology proposed by NHS England for elective surgery, with particular emphasis on choosing PC-specific, PRO-relevant and accessible patient characteristics as adjustors. Specifically, baseline PRO scores were included in all models and showed the strongest predictive value for PROs after one year (compare Table 1, Supplementary Material 5). \(R^{2}\), as a goodness-of-fit criterion, ranged between 0.1 (irritative/obstructive function) and 0.22 (sexual function) for the different domain scores, suggesting that the adjustors used do not work equally well in the models. This finding is consistent with previous research by Laviana et al. [23]. Thus, some variance remains for all the analyzed PROs that the variables in our models cannot explain. This was expected, as it was hypothesized that there would be differences in PC care between the participating operating sites. According to Iezzoni, a health-care outcome can be expressed as a function of patients’ characteristics, treatment effectiveness, and quality of care [30]. Since patients’ characteristics were included in the models and only the same treatment method (RPE) was compared, the “unexplained” variance in the models may reflect differences in the choice of procedure (open, laparoscopic, robotic), different levels of surgical experience, or differences in follow-up care; this should be a topic for further research. For different surgical procedures, though, Haese et al. showed that functional outcomes do not differ between open and robotic approaches [31], whereas Nyberg et al. report a “moderate advantage for the robotic technique regarding erectile dysfunction” [32]. Research by Vickers et al. further supports the view that surgical outcome is strongly influenced by the individual surgeon [33]. However, since patients mostly choose sites rather than surgeons, variation at site level is important to focus on.

Casemix adjustment is essential when comparing sites. To the best of the authors’ knowledge, there is no established method for analyzing the effects of adjustment—i.e., for describing how large a difference the adjustment makes. We, therefore, propose two different measurements: Firstly, taking effect sizes measured by Cohen’s d into account, there are no substantial differences between the observed and casemix-adjusted PRO scores in the analysis (largest Cohen’s d for sexual function: − 0.2)—i.e., the casemix adjustment does not “change” the sites’ scores to any large extent. Secondly, we propose an indicator for measuring adjustment effects, counting the sites for which the EPIC-26 scores changed more than the corresponding MID. The results show that the adjusted score differed to an extent greater than or equal to the corresponding MID only for urinary incontinence. Hence, both measurements reflect the fact that, although casemix adjustment is essential, for most sites the scores do not change to a clinically perceptible extent (hence MIDs).

This is the first comparison between so many different operating sites (n = 88 PCCs). However, we are aware of some limitations. Firstly, although the study sample is relatively large, the casemix adjustment methodology needs to be tested for robustness using larger samples. Furthermore, more detailed information on the patients’ development of functional outcomes, e.g., through more frequent follow-up questionnaires, would be desirable.

Another limitation of the study is the measurement of comorbidities, which are important confounders for, e.g., incontinence or erectile dysfunction. Since comorbidities do not yet form part of the quality assurance (certification) data on which the study is based, clinicians were asked to document them additionally for the study, resulting in relatively high missing rates. On a positive note, all of the other clinical and treatment information was close to complete.

Thirdly, this analysis is limited to a first description of differences between operating sites. Since different surgical approaches and follow-up care are not included in the analysis, the differences in adjusted outcomes cannot be explained in depth; doing so is beyond the scope of this article.

Moreover, MIDs were proposed as a key indicator for evaluating the variation in functional outcome. For this analysis, validated MIDs from Skolarus et al. were available based on a US PC population [27]. However, as Revicki et al. pointed out, MIDs are most applicable when developed using a validation population that is similar to the target population [34].

Conclusion

The present results show differences in the PC care provided for patients who undergo radical prostatectomy. Hence, the choice of an operating site for RPE has an impact on the patient’s outcome. However, the reasons for these differences were not analyzed. The results thus raise two questions: firstly, why PROs differ so much between operating sites and, secondly, how these differences could be reduced. As a first step, all of the results were transparently presented to and discussed with clinicians from the participating PCCs. Additionally, PROs could be reported annually in audit reports and thus be used as a benchmarking instrument for PCCs. For this purpose, a restriction to specific EPIC-26 domains that are relevant for surgical treatment, such as urinary incontinence and sexual function, may be useful. The use of PROs for benchmarking will only be possible if PROs are carefully and robustly casemix-adjusted and thus accepted by the clinical stakeholders. The casemix adjustment methodology described here appears to be a promising and practicable approach for using patient-reported outcomes to compare prostate cancer operating sites.