Introduction

Cytoreductive nephrectomy (CN) has been an integral part of a multimodal management concept of patients with synchronous metastatic renal cell carcinoma (mRCC) treated with cytokines, as two randomised controlled trials demonstrated a significant overall survival (OS) advantage for CN performed before interferon-alpha-based therapy [1,2,3]. Over the past decade, targeted therapies (TT) with VEGF receptor tyrosine kinase inhibitors (TKI) and mammalian target of rapamycin (mTOR) inhibitors have replaced cytokine treatment and are now the accepted standard of care [4, 5]. Since the inception of these agents, both the role of upfront CN and the timing of CN have been questioned. Two randomised controlled trials (SURTIME and CARMENA) were initiated to investigate the role of CN in conjunction with sunitinib. Unfortunately, both trials suffered from significant recruitment problems [6]. Indeed, SURTIME was stopped early and did not show a difference in progression-free survival at 28 weeks [7], which may, in part, be due to the small sample size and insufficient statistical power. Likewise, there were considerable challenges with patient recruitment in CARMENA, and this trial is expected to end 6 years later than originally anticipated. Furthermore, there are some concerns that CARMENA may only answer the question of whether both trial arms are equivalent rather than showing that one arm is superior [6, 8]. Taken together, there is at present no level 1/2 evidence regarding the role of CN for mRCC treated with TT.

While results from these randomised controlled trials are awaited, the best evidence for clinical practice and hypotheses for future randomised trials are derived from retrospective observational studies. Indeed, multi-centric and registry data suggest that CN may be associated with a 40–55% relative improvement in OS [8,9,10]. Despite multivariable adjustment for measured confounders, a prevailing hypothesis for this large benefit is selection bias. This could systematically favour the CN arm, as these patients may be those in a better general condition or those of a more favourable prognostic group. However, there are novel statistical methods such as inverse probability of treatment weighting (IPTW) and sensitivity analyses without assumptions that improve adjustment for measured and unmeasured confounders and thus control for selection bias [11]. As such, we sought to compare OS between patients undergoing CN and no CN (NCN) for synchronous mRCC at our tertiary care centre using these modern statistical approaches.

Patients and methods

Patient population

For this retrospective single-centre study, patients were identified from the prospectively maintained Cambridge Oncology Registry. All patients treated with TT (VEGFR-TKIs or mTOR inhibitors) for synchronous clear cell and non-clear cell mRCC between 2006 and 2017 were identified; those with hereditary RCC syndromes, concomitant malignant tumours other than RCC and those who underwent complete surgical resection of all metastatic sites were excluded, leaving 261 patients as the principal study cohort. The decision to recommend a CN was based on a multidisciplinary team discussion of oncologists and urologists. Multiple variables were taken into consideration, including performance status (PS), tumour volume, number and location of metastatic sites, age, co-morbidity, surgical operability, expected surgical morbidity, and prognostic group according to the initial blood tests. Patients were reviewed in clinic and a joint decision between patient, oncologist, and urologist was made on whether to proceed with CN prior to TT.

Data for this study included receipt of CN, age, gender, metastatic sites, type of medical therapies and prognostic criteria at the time of diagnosis, i.e., World Health Organization (WHO) PS, albumin-corrected serum calcium, haemoglobin, neutrophil count, platelet count, and time from diagnosis to targeted therapy [12]. WHO PS was then converted to the categorical Karnofsky PS (KPS; i.e., KPS ≥ 80% for WHO PS 0 or 1, KPS < 80% for WHO PS 2–4). Laboratory tests were performed in the Addenbrooke’s Hospital laboratory and values were standardised against the upper limit or lower limit of normal, as appropriate [13].

Statistical analyses

Temporal trends in the practice pattern of CN were evaluated using a piecewise regression approach that is implemented in the Joinpoint Regression Program (Version 4.1, National Cancer Institute, Bethesda, MD, United States). In this approach, the annual frequency of CN was modelled using a linear segmented regression function, with a log-transformed dependent variable. Changes in temporal trend of the use of CN are reported as percentage change.
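As an illustration of how a percentage change follows from a log-linear segment, the sketch below fits a single segment (no joinpoints) by ordinary least squares to log-transformed proportions. The function name and the data values are hypothetical and are not the study data; the Joinpoint Regression Program additionally selects the number and location of joinpoints, which is not reproduced here.

```python
import math

def percent_change_per_period(periods, proportions):
    """Fit log(proportion) = a + b * period by least squares.

    Returns the percentage change per period, (exp(b) - 1) * 100.
    """
    logs = [math.log(p) for p in proportions]
    n = len(periods)
    mean_x = sum(periods) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in periods)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, logs))
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0

# Hypothetical proportions of patients undergoing CN in six consecutive
# 2-year periods, constructed to decline by exactly 11.5% per period.
periods = [0, 1, 2, 3, 4, 5]
proportions = [0.60 * (1 - 0.115) ** k for k in periods]
change = percent_change_per_period(periods, proportions)
```

Because the dependent variable is log-transformed, the slope describes a constant relative (not absolute) change per period, which is why the trend is reported as a percentage.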

IPTW-adjusted analyses were performed to account for differences in baseline characteristics between groups and thus for selection bias, as popularised by Seisen et al. [11]. In this method, each patient was weighted by the inverse probability of being in the CN versus NCN group, with the goal of balancing observed characteristics between the two groups. The probability (or propensity) of being in the two treatment groups was estimated from a logistic regression model that included variables that potentially impacted receipt of CN, i.e., age, presence of lung metastasis, liver metastasis, bone metastasis, brain metastasis, lymph node metastasis, number of metastatic sites, histological subtype, KPS, anaemia, neutrophilia, hypercalcemia, thrombocytosis, and year of diagnosis. Baseline characteristics were compared between groups pre- and post-weighting using the standardised differences approach, as opposed to t tests and χ² tests. In this quantitative method, significant imbalances in covariates are present if the standardised difference is ≥ 0.1 (i.e. ≥ 10%).
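The weighting and balance check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the propensities are taken as given rather than estimated by logistic regression, unstabilised weights are assumed (the text does not specify the variant), and all names and data are hypothetical.

```python
import math

def iptw_weights(treated, propensity):
    """Unstabilised weights: 1/p for treated, 1/(1 - p) for untreated patients."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for t, p in zip(treated, propensity)]

def weighted_mean_var(values, weights):
    """Weighted mean and weighted (population) variance of one covariate."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def standardised_difference(values, treated, weights):
    """Absolute difference in weighted means divided by the pooled SD."""
    g1 = [(v, w) for v, t, w in zip(values, treated, weights) if t]
    g0 = [(v, w) for v, t, w in zip(values, treated, weights) if not t]
    m1, v1 = weighted_mean_var(*zip(*g1))
    m0, v0 = weighted_mean_var(*zip(*g0))
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2.0)

# Hypothetical toy cohort: one binary covariate x that strongly drives treatment.
x = [1] * 10 + [0] * 10
treated = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
propensity = [0.8] * 10 + [0.2] * 10  # assumed known, not estimated

unweighted = standardised_difference(x, treated, [1.0] * len(x))
weighted = standardised_difference(x, treated, iptw_weights(treated, propensity))
```

In this toy cohort the unweighted standardised difference is far above the 0.1 threshold, while weighting by the inverse of the true propensity removes the imbalance in x.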

The primary study endpoint was OS, which was calculated from the date of diagnosis to death or last follow-up. IPTW-adjusted survivor functions were estimated with the Kaplan–Meier method, and overall mortality was compared between groups using Cox proportional hazards regression models and IPTW-adjusted log-rank tests. The proportional hazards assumption was tested with Schoenfeld tests and complementary log–log plots, which indicated that this assumption was not violated in our models. Exploratory analyses were performed to determine the heterogeneity of the CN effect according to baseline variables by testing interaction terms within the IPTW-adjusted Cox models.
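A weighted Kaplan–Meier estimate replaces the counts of deaths and patients at risk with sums of patient weights. The sketch below shows one way this could be computed; it is an assumption about the general technique rather than the code used in this study, and the function name and data are hypothetical.

```python
def weighted_kaplan_meier(times, events, weights):
    """Return [(time, survival)] at each distinct event time.

    times   : follow-up time per patient
    events  : 1 if the patient died at that time, 0 if censored
    weights : per-patient weight (e.g. an IPTW weight; 1.0 for unweighted)
    """
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)  # weighted number of patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0.0
        leaving = 0.0
        # Collect everyone whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event:
                deaths += w
            leaving += w
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months: deaths at 1, 3 and 4; one censoring at 2.
curve = weighted_kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1], [1.0, 1.0, 1.0, 1.0])
```

With all weights equal, the result is the ordinary Kaplan–Meier curve; variance estimation for weighted data (for example, robust standard errors) is deliberately left out of this sketch.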

Because IPTW balances only measured confounders between groups, we performed sensitivity analyses without assumptions to assess the impact of unmeasured confounders [14]. According to the approach described by Ding and VanderWeele [14], magnitudes of the joint bounding factor were estimated for various combinations of the odds of receiving CN (ORCN-U) and the hazard of overall mortality (HROM-U), both in the presence of unmeasured confounders. All statistical analyses were performed with R 3.4.0 (The R Foundation for Statistical Computing, Vienna, Austria). Statistical significance was set at 0.05.
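The joint bounding factor of Ding and VanderWeele can be written as B = (ORCN-U × HROM-U) / (ORCN-U + HROM-U − 1). The sketch below applies it to the IPTW-adjusted estimate reported in the Results (HR 0.63, upper 95% confidence limit 0.83). It assumes that the odds ratio and hazard ratio are used as approximations of risk ratios, that both associations are expressed as ratios ≥ 1, and that a protective estimate is bounded by multiplying it by B; the function names and category labels are hypothetical.

```python
def joint_bounding_factor(or_cn_u, hr_om_u):
    """B = (OR * HR) / (OR + HR - 1), with both ratios expressed as >= 1."""
    return (or_cn_u * hr_om_u) / (or_cn_u + hr_om_u - 1.0)

def classify(hr, ci_upper, or_cn_u, hr_om_u):
    """Worst-case reading of a protective HR under an unmeasured confounder."""
    b = joint_bounding_factor(or_cn_u, hr_om_u)
    if hr * b >= 1.0:
        return "null or reversed"   # point estimate could reach or cross 1
    if ci_upper * b >= 1.0:
        return "non-significant"    # upper confidence limit could reach 1
    return "effect retained"

# Observed IPTW-adjusted estimate from this study.
HR, CI_UPPER = 0.63, 0.83

weak = classify(HR, CI_UPPER, 1.2, 1.2)
moderate = classify(HR, CI_UPPER, 2.0, 2.0)
strong = classify(HR, CI_UPPER, 3.0, 3.0)
```

Under these assumptions, the bounding factor would need to reach about 1/0.83 ≈ 1.2 to render the effect non-significant and about 1/0.63 ≈ 1.6 to move the point estimate to the null.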

Results

Baseline characteristics

The overall cohort included 261 patients with synchronous mRCC, of whom 97 (37.2%) underwent CN and 164 (62.8%) did not. The proportion of patients who underwent CN decreased over the years, with a biennial change of − 11.5% (95% CI − 19.7 to − 2.5, P < 0.001, Fig. 1). The most common first-line TT were sunitinib (N = 158, 60.5%) and pazopanib (N = 74, 28.4%). One hundred and twelve patients (42.9%) received advanced line treatments (> first line), of which 22 (8.4%) included cabozantinib or nivolumab.

Fig. 1

Temporal trends in the utilisation of cytoreductive nephrectomy in 261 patients with synchronous metastatic RCC treated at Cambridge University Hospitals. There was a significant decline in the use of cytoreductive nephrectomy between 2006/2007 and 2016/2017 by 11.5% every 2 years (95% CI − 19.7 to − 2.5, P < 0.001)

Unweighted and weighted baseline characteristics are presented in Table 1. In unweighted comparisons, both groups differed with respect to all analysed baseline variables except gender and presence of lung metastases. After IPTW adjustment, all standardised differences were < 0.1, indicating that both groups were then comparable.

Table 1 Baseline characteristics for 261 patients who received targeted therapies with (CN) or without cytoreductive nephrectomy (NCN) for synchronous metastatic renal cell carcinoma in the unweighted and weighted cohort of the Cambridge Oncology Registry

Overall survival

There were 206 deaths (78.9%) during follow-up. The median follow-up for patients surviving was 14.6 months (IQR 7.1–24.3). In unadjusted analyses, overall mortality was reduced in relative terms by 54% in the CN group (HR 0.46, 95% CI 0.34–0.62, P < 0.001), with a median OS time of 25.6 months (95% CI 23.3–32.1) versus 12.4 months (95% CI 10.3–15.0, Fig. 2a). In IPTW-adjusted analyses, the OS difference was smaller but still statistically significant (HR 0.63, 95% CI 0.46–0.83, P = 0.0015), with a median OS time of 20.9 months (95% CI 18.5–29.6) versus 12.6 months (95% CI 11.4–15.2) (Fig. 2b). As demonstrated in the IPTW-adjusted Kaplan–Meier plot (Fig. 2b), there was no impact of CN on OS probabilities at 3 months (CN versus NCN: 95.3 versus 95.2%, P = 0.97), at 6 months (84.6 versus 81.2%, P = 0.67), and at 9 months (71.3 versus 67.9%, P = 0.70). A clinically relevant OS benefit in favour of the CN group first appeared after 12 months (65.9 versus 51.9%, P = 0.11) and was statistically significant at 18 months (59.2 versus 34.0%, P = 0.005) and 24 months (44.2 versus 21.8%, P = 0.004). The 3-month landmark IPTW-adjusted analysis demonstrated little impact of immortal time bias on the treatment effect (HR 0.64, 95% CI 0.48–0.86, P = 0.004). In the adjusted 12-month landmark analysis, which considered only patients alive at that landmark point, CN was associated with a 44% decreased relative risk of death (HR 0.56, 95% CI 0.37–0.85, P = 0.006).

Fig. 2

Unadjusted (a) and inverse probability of treatment weighting-adjusted (b) Kaplan–Meier estimates of overall survival in patients who underwent cytoreductive nephrectomy versus no cytoreductive nephrectomy for synchronous metastatic renal cell carcinoma

Using interaction term analyses, we tested whether type of treatment (CN versus NCN) interacted with baseline predictors of overall mortality. IPTW-adjusted HRs are presented in Fig. 3. In these analyses, the beneficial effect of CN increased in patients with better KPS (P = 0.06), in women (P = 0.03), and in patients with thrombocytosis (P = 0.01). The effect of CN did not differ according to the type of TT (P = 0.47).

Fig. 3

Forest plot depicting inverse probability of treatment weighting-adjusted hazard ratios of overall mortality of cytoreductive nephrectomy versus no cytoreductive nephrectomy according to baseline clinical variables. Due to small numbers, the subgroups of patients with brain metastasis and an interval to TT > 12 months were not analysed

Sensitivity analysis

Magnitudes of the joint bounding factor for different combinations of the treatment–confounder association ORCN-U and the mortality–confounder association HROM-U are shown in Table 2. For the effect of CN to become non-significant (yellow) or to be reversed (red), ORCN-U and HROM-U would need to meet specific estimates. The odds of receiving CN in the presence of a given unmeasured confounder (e.g., small volume metastatic disease) would need to increase, while the overall mortality would need to decrease in the presence of the same confounder. In a second set of sensitivity analyses, the cohort was restricted to patients with clear cell subtype. Propensity scores were re-estimated for this subset, and the final models showed comparable results to the initial analysis without altering any conclusion. In a final set of sensitivity analyses, we fitted a multivariable Cox model with baseline and treatment variables, including the effect of advanced line treatment. In this analysis, the beneficial effect of CN was confirmed (HR 0.68, P = 0.043, Table 3).

Table 2 Magnitudes of the joint bounding factor for various combinations of the odds of receiving cytoreductive nephrectomy and the hazard of overall mortality in the presence of unmeasured confounders
Table 3 Multivariable Cox proportional hazards model predicting overall mortality

Discussion

In this retrospective study, we evaluated the prognostic effect of CN in patients with synchronous mRCC who subsequently received TT. In our cohort, CN had a statistically significant effect on overall mortality. Patients with a good KPS, women, and those with thrombocytosis may benefit from CN and could represent the target population for future randomised trials.

The observed benefit in overall mortality for patients treated with CN is in line with other retrospective studies [8, 9], although our HR was slightly more conservative. Recently, Petrelli et al. [15] attempted a meta-analysis of retrospective studies and reported a pooled HR of 0.46. However, as data quality was generally limited in these non-randomised observational studies, data from the pooled analysis were of limited quality as well. Further analyses from our study suggest that the beneficial effect of CN on mortality increases with longer survival time. Indeed, we did not find a difference in OS in the first 12 months of survivorship. Comparably, in adjusted analyses by Heng et al. [9], there was no statistically significant difference for patients who lived < 3, 6, and 12 months, whereas there was a survival advantage for those patients estimated to survive longer. In contrast, in a large retrospective study from the National Cancer Database [8], statistical significance was obtained earlier than 12 months, but the absolute benefit was only 0.7–1.8 months. Furthermore, this study included almost 13,000 patients, which enabled the detection of very small statistically significant differences that were not present in smaller studies. Despite providing the largest sample size to date, this study did not account for performance status, laboratory data, IMDC prognostic group, or type of TT, as these data were not available in this registry.

This, however, does not mean that patients with an estimated survival of < 12 months do not benefit from a CN. First, CN has a clear role in symptom palliation, i.e., in patients with intractable pain, bleeding, uncontrolled hypertension, and symptoms due to paraneoplastic syndromes. Second, our interaction term analysis demonstrated an oncological benefit for certain subgroups, such as patients with thrombocytosis. Although these patients are generally considered a group with dismal prognosis [13], CN may improve OS in this subgroup, even though it may still be less than 12 months. The absolute benefit is minor and has to be balanced against the period of hospitalization and postoperative recovery. Furthermore, with the low number of patients and the fairly large confidence interval, this finding has to be treated with caution. It is possible that it represents a statistical artefact rather than a true clinically relevant association. Similar conclusions can be drawn for gender and KPS, which represent the other two variables showing a clinically relevant interaction with CN. KPS, however, was previously identified as a predictor of outcomes. In a study by Heng et al. [9], the HR for CN was 0.53 in patients with KPS > 80 and 0.70 for patients with KPS < 80, although no testing for heterogeneity or interaction was performed. KPS may, therefore, be a critical factor that determines the effect of CN, but further validation studies are necessary.

Propensity score methods are often used to remove the effects of measured confounders in observational studies, including propensity score matching, covariate adjustment using propensity scores, and IPTW. IPTW creates a synthetic sample in which the distribution of measured baseline covariates is independent of treatment assignment [16] and allows relatively unbiased estimates of average treatment effects [17]. Of note, the design phase of the study is separated from the analysis phase, which is more comparable to randomised experiments [18]. IPTW serves to weight all groups up to the full sample [19], which is an advantage over traditional matched-pairs analyses. In urological cancer research, IPTW-adjusted analyses have been popularised by Seisen et al. [11]. Using a similar approach, we were able to calculate an HR of 0.63 (95% CI 0.46–0.83) in favour of CN, which provided us with a more conservative estimate than data from unweighted analyses (HR 0.46, 95% CI 0.34–0.62). We contend that in the setting of mRCC with many possible confounders, a more conservative approach is preferable.

As propensity scores can only adjust for measured confounders, unmeasured confounding must be addressed using distinct sensitivity analyses. Groups may differ with regard to these unmeasured confounders, which may subsequently impact the outcome measure [20]. We used sensitivity analyses according to Ding and VanderWeele, which have the advantage of making no assumptions about the structure of the unmeasured confounder and of providing stronger conclusions than the traditional Cornfield conditions [14]. We show that, under certain circumstances, an unmeasured confounder would render the effect of CN non-significant, if there are imbalances in the odds for receiving a CN (yellow and red areas in Table 2).

The role of CN in the era of immuno-oncology is unclear at present. In the CheckMate 025 trial, more than 85% of patients had prior CN, but the impact of CN on outcomes was not analysed [21]. Cytoreductive surgery may be an important cornerstone in the multidisciplinary management of these patients, comparable to patients who received cytokines. The current study showed an OS benefit if patients received nivolumab or cabozantinib in the advanced line setting (Table 3). Furthermore, the analysis reinforced the concept that there are multiple baseline (liver and node metastasis, thrombocytosis, and KPS) and treatment characteristics (advanced line treatment) that dictate OS, and CN may be one of them. Due to low numbers of patients, it was not possible to analyse the role of CN in patients receiving nivolumab. As a first step, an analysis of clinical trial data of CheckMate 025 may provide a hypothesis, and IPTW adjustment may be used for this purpose.

In randomised controlled trials, the survival time is generally calculated from the date of randomisation, which is not available in retrospective observational studies. Interestingly, there has been no consensus on how to define this starting point. While some groups used the time from the initiation of the first-line targeted therapy [9, 22], Hanna et al. [8] employed the date of diagnosis, and others did not specify the exact method. Using the date of starting the first-line treatment gives an artificial survival advantage of weeks to months to the group of patients who did not undergo CN, as the period of nephrectomy and subsequent recovery is not included. Therefore, the authors of the present manuscript used the time of diagnosis, which may provide a better estimate of patient survival and may be closer to the date of randomisation.

The current study is limited by a single-institution experience and its retrospective design. Subgroup analyses were limited by sample size. For example, in contrast to clear cell mRCC, the 95% CI of the HR for non-clear cell included 1.00. However, the P value for the interaction test was not significant, which could be related to limited statistical power. Larger studies focusing on non-clear cell mRCC should address the question of whether CN is beneficial in this subgroup. As this was an analysis of an oncology database, the overall number of patients diagnosed with mRCC and the number of patients who received CN and then did not receive TT because of CN-related morbidity, progressive disease or other reasons are unknown. Specifically, CN-related morbidity may be substantial, but it was not possible to account for that. In the British Association of Urological Surgeons nephrectomy audit, the 30-day mortality was 1.8%, 24.1% required perioperative blood transfusion, and 8% had postoperative complications of Clavien–Dindo grade ≥ III [23]. Finally, selection bias and unmeasured differences between groups are a major problem in every retrospective study dealing with complex diseases such as mRCC. Several baseline parameters such as primary tumour volume (i.e., measured by size and local T stage on imaging) likely affected management but were not analysed, as no imaging review was performed. We attempted to address selection bias by performing IPTW-adjusted analyses according to many baseline variables and unmeasured confounding by sensitivity analysis. Nonetheless, residual unmeasured confounding may have impacted the effect of CN. We recognise the limitations of both statistical approaches and that they cannot replace randomisation.

Conclusions

IPTW-adjusted analysis of our patient cohort suggests that CN improves OS of patients with synchronous mRCC treated with TT. On the whole, the survival difference appears after 12 months. Specific subgroups may benefit from CN, and these subgroups warrant further investigation in prospective trials.