Introduction

Renal cell carcinoma (RCC) accounts for 2–3 % of all adult malignancies [1]. It is the seventh most common cancer in men and the ninth most common cancer in women [1]. In the United States, approximately 65,000 new cases and almost 14,000 deaths from RCC are reported per annum [2]. Recently, the proportion of small and incidental renal tumors has increased significantly, mainly because of the wider availability of imaging modalities such as ultrasonography (US), computed tomography (CT), and magnetic resonance imaging (MRI). It has been estimated that more than 50 % of RCCs are now detected as incidental findings [3].

For many physicians, the detection of an incidental solid renal mass is a “clinical puzzlement”, and treatment options remain controversial. For T1 renal tumors (≤7 cm), partial nephrectomy or other “nephron-sparing surgery” is recommended as the preferred option by many physicians. Partial nephrectomy can be performed via open, laparoscopic, or robot-assisted laparoscopic approaches [4]. New minimally invasive ablative techniques have been introduced as alternatives for treating small renal tumors. Ablative therapies comprise cryoablation, radiofrequency ablation (RFA), and microwave ablation (MWA), which can be performed through open incisions or via laparoscopic or percutaneous routes under image guidance (US, MRI, CT) [5–7].

The presumed advantages of thermal ablation compared with surgical resection (excision) are its minimally invasive nature and greater safety. Nonetheless, it is still debatable whether ablation can achieve equivalent local tumor control and long-term patient survival; therefore, surgery remains the “gold standard” oncological therapy [7–9]. Unfortunately, previous guidelines and recommendations have been based on simple evidence synthesis of aggregate data from mostly single-arm studies [10]. We performed a systematic review of the literature and a quantitative data synthesis of all relevant studies comparing surgical with thermal ablative techniques for the treatment of small renal tumors (T1 stage).

Materials and Methods

Study Selection Strategy, Inclusion Criteria, and Risk of Bias

There were no restrictions on publication language, date, or status. To identify all published studies comparing surgical with ablative techniques for treating renal tumours, we performed electronic searches of PubMed (Medline), the Excerpta Medica Database (EMBASE), Scopus, and AMED. The search used the Boolean operators AND and OR to combine the following medical subject heading (MeSH) terms and text words: “RFA”; “radio frequency ablation”; “cryoablation”; “cryotherapy”; “MWA”; “renal cell tumor”; “RCC”; “kidney tumor”; “renal tumor”; “renal neoplasm”; “renal cancer”; “kidney cancer”; “renal mass”; “nephrectomy”; “renal surgery”; “nephron-sparing surgery”; “partial nephrectomy”; “recurrence”; “progression”; “metastasis”; “metastases”; “complications”; “renal function”; “kidney function”; and “disease-free survival (DFS).”

The search was further broadened by cross-checking the reference lists of the retrieved articles, and all relevant papers identified in this way were also examined. The literature search was last updated in August 2013. Each study was evaluated for inclusion in the meta-analysis according to the following criteria: (1) only cohort studies of adequate quality based on the Newcastle-Ottawa Scale [11] were considered for inclusion; (2) the target population included patients with documented T1 stage renal tumors; (3) all types of thermal ablative and surgical methods for treating renal tumors were eligible; and (4) clinical and imaging follow-up was available for at least 1 year. A systematic review of cohort clinical trials (nonrandomized, controlled studies) comprises seven steps: (1) formulating a clear question; (2) performing a comprehensive data search; (3) ensuring an unbiased selection and abstraction process; (4) critically appraising the data; (5) synthesizing the data; (6) performing sensitivity and subgroup analyses, if appropriate and possible; and (7) preparing a structured report. The star-based Newcastle-Ottawa Scale (NOS) was employed to score the quality of each cohort study [11]. Quality assessment covered three broad domains: (1) selection (up to 4 stars), (2) comparability (up to 2 stars), and (3) outcome (up to 3 stars), so the maximum NOS score is 9 stars. Disagreements were resolved by consensus. The overall evaluation of included trials is presented in Table 1.

Table 1 Newcastle-Ottawa Scale for quality assessment of cohort studies

Data Extraction and Outcome Measures

The trial selection process complied with the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement [12]. The reference lists of all retrieved articles were rigorously assessed for inclusion suitability. Three of the authors (K.K., L.M., and M.K.) designed the systematic review, individually selected the trials to be included in this meta-analysis, and independently extracted all presented data. Descriptive data extracted from each trial included baseline demographics, procedural variables, follow-up, and primary and secondary endpoints in each treatment group. Data were extracted from the main text, survival curves, and tables of published manuscripts. Again, any disagreements were resolved by consensus between the investigators. Outcome measures of this systematic review were defined according to previously published international guidelines [13]. The primary outcome measure of this meta-analysis was DFS as assessed by imaging in each individual study. DFS was defined as survival without any evidence of local relapse or distant metastatic disease. Secondary outcome measures included the overall and major complication rates, early repeat treatment (because of incomplete therapy at the index procedure), the rate of biopsy-confirmed RCC, the rate of local recurrence, and the postprocedure decline in estimated glomerular filtration rate (eGFR).

Statistical Methods

Quantitative data synthesis of the included studies was performed with the open-source, cross-platform OpenMeta[Analyst] software (Brown University, Rhode Island, US; available at http://www.cebm.brown.edu/software) and the StatsDirect statistical package (Version 2.7.9, StatsDirect Ltd, Cheshire, United Kingdom). Categorical variables were expressed as counts (percentages) and continuous variables as means ± standard deviation if normally distributed. The primary endpoint (DFS) was analyzed on the log-hazard scale and expressed as a hazard ratio (HR), as recommended for time-to-event outcomes [14]. Study-specific HRs and their variances were retrieved from individual publications or calculated from available data and quoted log-rank statistics with the equations of Parmar et al. [14, 15]. Secondary endpoints were expressed as risk ratios (RR), apart from the postprocedure eGFR decline, which was reported as the mean difference (ml/min/1.73 m²). Summary estimates are presented with the associated 95 % confidence intervals (CIs). The DerSimonian and Laird (D–L) random-effects model was applied to calculate all pooled summary estimates in order to account for clinical heterogeneity of study effects. In case of zero cells, a continuity correction of 0.5 was added to all cells.
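The D–L random-effects pooling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OpenMeta[Analyst] or StatsDirect implementation: it takes study effects already on the log scale (log HR or log RR) together with their variances, estimates the between-study variance τ² by the method of moments, and re-weights each study by 1/(vᵢ + τ²). The function names and the example numbers are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """D-L random-effects pooling of log-scale effects (e.g. log HR, log RR).

    Returns (pooled effect, standard error, tau^2, Cochran's Q).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # method-of-moments between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2, q


def ratio_with_ci(pooled, se, z=1.96):
    """Back-transform a pooled log ratio to (ratio, lower 95 % CI, upper 95 % CI)."""
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical log risk ratios and variances for three studies
log_rr = [math.log(0.5), math.log(0.7), math.log(0.4)]
var = [0.20, 0.15, 0.30]
pooled, se, tau2, q = dersimonian_laird(log_rr, var)
print(ratio_with_ci(pooled, se))
```

When Q does not exceed its k − 1 degrees of freedom, τ² is truncated at zero and the random-effects result coincides with the inverse-variance fixed-effect estimate.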

Cochran’s Q test and the I² statistic were calculated to test for statistical evidence of heterogeneity across the studies. Briefly, I² values <25 % indicate low, 25–50 % moderate, and >50 % high heterogeneity [16]. Potential publication bias was assessed by visual inspection of inverted funnel plots for asymmetry, as previously recommended for meta-analyses including a small number of studies. Funnel plots display the trials’ effect estimates against their respective sample sizes; in the presence of publication or other bias, they may appear skewed and asymmetrical [17]. The Harbord–Egger test was also used to indicate publication bias when funnel plot evaluation was subjective. To evaluate the stability of our results, sensitivity analyses were undertaken by comparing the fixed-effects Mantel–Haenszel (M–H) model with the random-effects D–L model and by omitting one study at a time (leave-one-out) to look for the influence of any individual dataset on the pooled endpoint estimates. Meta-regression analysis was employed to explore clinical heterogeneity, with year of publication, patient age, tumour size (cm), duration of follow-up, and baseline eGFR as covariates. The level of statistical significance was set at α = 0.05.
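The I² thresholds and a regression-based asymmetry test can be illustrated with the sketch below. I² is derived from Cochran’s Q and its k − 1 degrees of freedom; plugging in values reported in the Results (for example, Q = 27.8 across the three eGFR studies) reproduces the quoted I² of about 93 %. The asymmetry function implements the classic Egger regression (standardized effect regressed on precision by ordinary least squares), which is related to, but not the same as, the Harbord–Egger variant used in StatsDirect; it is shown only as an illustration, and the function names are hypothetical.

```python
import math


def i_squared(q, k):
    """Higgins I^2 (%) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def egger_intercept(effects, ses):
    """Classic Egger test: OLS of effect/SE on 1/SE.

    Returns (intercept, t statistic with n - 2 df). An intercept far from zero
    suggests funnel-plot asymmetry. Needs at least 3 studies.
    """
    y = [e / s for e, s in zip(effects, ses)]  # standard normal deviates
    x = [1.0 / s for s in ses]  # precision
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, intercept / se_intercept


# Heterogeneity values reported in the Results section
print(round(i_squared(27.8, 3)))  # eGFR decline, three studies -> 93
print(round(i_squared(4.22, 5)))  # major complications, five studies -> 5
```

With only four to six studies per endpoint, the t statistic has very few degrees of freedom, which is one reason visual funnel plot inspection was used alongside the formal test.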

Results

Study Selection and Description

The titles and abstracts of 302 scientific records were screened for potential inclusion in this systematic review (PRISMA flowchart; Fig. 1). Of those, 289 citations were found to be irrelevant, incomplete, or duplicate and were excluded from further analysis. The remaining 13 studies were considered potentially eligible, and their full-text publications were analyzed. Of those, seven full-text articles were excluded because they did not meet the predefined inclusion criteria (2 reported only renal function outcomes, 2 included a cryoablation arm, 1 reported only a cost-effectiveness analysis, 1 had no long-term oncologic outcomes, and 1 was a single-surgeon unmatched comparison with a high risk of bias). Finally, six cohort studies were included in this systematic qualitative review and quantitative data synthesis [18–23]. All studies investigated thermal (RFA or MWA) ablation versus surgical nephrectomy. Of note, other novel thermal treatments, such as high intensity focused ultrasound (HIFU), radiosurgery, laser interstitial thermal therapy (LITT), and pulsed cavitational ultrasound (PCU), have been described in the treatment of RCC, but reported outcomes were too limited to be included in the present analysis.

Fig. 1

Trial selection process according to the PRISMA statement by the Cochrane Collaboration

All six were well-designed cohort studies with a control group and were of moderate to high quality according to the NOS assessment (6–8 stars out of 9; Table 1). The included studies enrolled 587 patients in total, and their design and baseline characteristics are outlined in detail in Tables 2 and 3. Only one of them was a randomized controlled trial (RCT) [19]. Quantitative data synthesis involved 355 patients treated with open or laparoscopic nephrectomy versus 252 treated with percutaneous or laparoscopic thermal ablation, with follow-up of up to 5 years. Average tumour size was 2.5 cm in both groups, with a negligible weighted mean difference (−0.1 cm, 95 % CI: −0.38 to 0.17; random-effects model).

Table 2 Study design of included cohort studies
Table 3 Baseline characteristics of included cohort studies

Primary and Secondary Endpoints

Data on disease recurrence and deaths and/or DFS curves were available in all studies. DFS was similar between the two treatment options (pooled HR: 1.04, 95 % CI: 0.48–2.25, p = 0.92; Fig. 2). There was low statistical heterogeneity among included trials (χ² = 3.21, I² = 0 %, p = 0.68). There was no visual asymmetry of the respective funnel plot to suggest publication bias (bias = 1.17, p = 0.41; Fig. 3).
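Where a publication quoted only a log-rank statistic rather than an HR, the Parmar et al. approximations referred to in the Methods can recover a log HR and its variance. The sketch below assumes two-arm studies, a two-sided log-rank p-value, and that the log-rank variance V can be approximated from the total number of events and the arm sizes; the function names and example inputs are hypothetical and are not taken from the included studies.

```python
import math
from statistics import NormalDist


def loghr_from_o_minus_e(o_minus_e, v):
    """ln(HR) = (O - E) / V and Var[ln(HR)] = 1 / V for the experimental arm."""
    return o_minus_e / v, 1.0 / v


def loghr_from_pvalue(p_two_sided, total_events, n_exp, n_ctrl, favours_exp):
    """Approximate ln(HR) from a two-sided log-rank p-value (Parmar-style).

    V is approximated from the total events and the allocation ratio;
    O - E takes the sign implied by which arm the study favoured.
    """
    v = total_events * n_exp * n_ctrl / (n_exp + n_ctrl) ** 2
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    o_minus_e = math.sqrt(v) * z
    if favours_exp:  # fewer events than expected in the experimental arm -> HR < 1
        o_minus_e = -o_minus_e
    return loghr_from_o_minus_e(o_minus_e, v)


# Hypothetical study: 40 events, 50 vs 50 patients, p = 0.05 favouring ablation
log_hr, var = loghr_from_pvalue(0.05, 40, 50, 50, favours_exp=True)
print(math.exp(log_hr), math.sqrt(var))
```

The resulting log HR and variance pairs are what a random-effects model pools on the log-hazard scale before back-transforming to the summary HR.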

Fig. 2

Funnel plot of disease-free survival. The SE of the logHR was plotted against the HR (hazard ratio) for each trial. Note that there is no visual asymmetry to suggest publication bias

Fig. 3

Random effects forest plot of pooled estimates of disease-free survival. Estimates are reported as the hazard ratio and 95 % confidence intervals

Overall and major complication events were reported in five of the six studies. The overall rate of complications was significantly lower with thermal ablation than with surgical nephrectomy (7.4 vs. 11.1 %; pooled RR: 0.55, 95 % CI: 0.31–0.97, p = 0.04; Fig. 4). There was low statistical heterogeneity among included trials (χ² = 3.89, I² = 0 %, p = 0.42) and no significant publication bias (bias = 0.63, p = 0.58). Major complications were also numerically fewer in the thermal ablation arm (2.3 vs. 5 %; pooled RR: 0.46, 95 % CI: 0.15–1.4, p = 0.17; Fig. 5), with low statistical heterogeneity among trials (χ² = 4.22, I² = 5 %, p = 0.38) and no significant publication bias (bias = 0.14, p = 0.95).

Fig. 4

Random effects forest plot of pooled estimates of all complications. Estimates are reported as the risk ratio and 95 % confidence intervals

Fig. 5

Random effects forest plot of pooled estimates of major complications. Estimates are reported as the risk ratio and 95 % confidence intervals

The number of repeat ablation events was reported explicitly in four of the six studies, and no repeat surgical nephrectomy was reported. The need for repeat treatment was therefore significantly higher with thermal ablation (7.2 vs. 0 %; pooled RR: 8.1, 95 % CI: 1.8–36.3, p = 0.006; Fig. 6). There was low statistical heterogeneity among included trials (χ² = 1.9, I² = 0 %, p = 0.61) and possible publication bias (bias = −2.1, p = 0.08).
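Because no repeat surgical procedures were reported, the 2×2 tables for this endpoint contain zero cells, which is where the 0.5 continuity correction from the Methods applies. The sketch below computes a single-study RR and its 95 % CI on the log scale, adding 0.5 to every cell only when a zero is present; the counts in the example are hypothetical, and the pooled RR reported above cannot be reproduced from this illustration.

```python
import math


def risk_ratio(events_a, n_a, events_b, n_b, cc=0.5, z=1.96):
    """Risk ratio of arm A vs arm B with a log-scale 95 % CI.

    If any of the four 2x2 cells is zero, `cc` is added to all of them,
    so each arm total grows by 2 * cc.
    """
    cells = (events_a, n_a - events_a, events_b, n_b - events_b)
    if 0 in cells:
        events_a, events_b = events_a + cc, events_b + cc
        n_a, n_b = n_a + 2 * cc, n_b + 2 * cc
    log_rr = math.log((events_a / n_a) / (events_b / n_b))
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical study: 4/40 repeat ablations vs 0/60 repeat surgeries
print(risk_ratio(4, 40, 0, 60))
```

The correction keeps the log RR finite but inflates its variance, which helps explain why the CI for repeat treatment is so wide.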

Fig. 6

Random effects forest plot of pooled estimates of repeat treatment. Estimates are reported as the risk ratio and 95 % confidence intervals

The rates of biopsy-confirmed RCC were reported in five of the six studies and were comparable between the two treatment options (82.1 vs. 84.4 %; pooled RR: 0.99, 95 % CI: 0.97–1.03, p = 0.79; Fig. 7). There was low statistical heterogeneity among included trials (χ² = 3.07, I² = 0 %, p = 0.44) and no significant publication bias (bias = 4.86, p = 0.22).

Fig. 7

Random effects forest plot of pooled estimates of confirmed renal cell carcinoma. Estimates are reported as the risk ratio and 95 % confidence intervals

Data on subsequent local recurrence of RCC were available from all studies, and event rates were similar between the two treatment methods (3.6 vs. 3.6 %; pooled RR: 0.92, 95 % CI: 0.4–2.14, p = 0.79; Fig. 8). There was low statistical heterogeneity among included trials (χ² = 0.65, I² = 0 %, p = 0.99) and no significant publication bias (bias = 0.37, p = 0.8).

Fig. 8

Random effects forest plot of pooled estimates of local recurrence. Estimates are reported as the risk ratio and 95 % confidence intervals

Finally, preoperative and postoperative renal function were reported in three studies. The reduction in eGFR following the index procedure was significantly greater with surgical nephrectomy than with nephron-sparing thermal ablation (mean difference in eGFR decline −14.6 ml/min/1.73 m², 95 % CI: −27.96 to −1.23, p = 0.03; Fig. 9). There was high statistical heterogeneity among these three trials (χ² = 27.8, I² = 93 %, p < 0.001), but there were too few studies to assess publication bias. A tabulated summary of all endpoints with their pooled HRs and RRs (95 % CIs) is presented in Table 4.
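For the continuous eGFR endpoint, each study contributes a mean difference with a variance derived from the group standard deviations and sizes; these study-level pairs can then be pooled with D–L weighting, as for the ratio outcomes. A minimal sketch, assuming the change in eGFR (ml/min/1.73 m²) is available as mean ± SD per arm; the function name and the numbers are hypothetical:

```python
import math


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Mean difference (A - B) of two independent groups with a 95 % CI.

    Returns (md, variance, lower, upper); the variance is sd_a^2/n_a + sd_b^2/n_b.
    """
    md = mean_a - mean_b
    var = sd_a ** 2 / n_a + sd_b ** 2 / n_b
    half_width = z * math.sqrt(var)
    return md, var, md - half_width, md + half_width


# Hypothetical eGFR decline (ml/min/1.73 m^2): ablation (A) vs nephrectomy (B)
md, var, lower, upper = mean_difference(4.0, 8.0, 30, 18.0, 12.0, 40)
print(md, (lower, upper))
```

With decline defined as ablation minus nephrectomy, a negative mean difference indicates a smaller loss of renal function after ablation.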

Fig. 9

Random-effects forest plot of pooled estimates of eGFR decline. Estimates are reported as the mean periprocedural difference and 95 % confidence intervals

Table 4 Summary of pooled outcome measures

Sensitivity Analysis

In the sensitivity analysis, we first compared the fixed-effects M–H model with the random-effects D–L model. Pooled outcome estimates were largely similar, apart from major complication rates, which were statistically significantly lower with thermal ablation under the fixed-effects model (2.3 vs. 5 %; pooled RR: 0.34, 95 % CI: 0.12–0.97, p = 0.044). Subgroup analyses comparing radiofrequency with microwave or percutaneous with laparoscopic techniques were not possible because of the very small number of studies. Leave-one-out analysis was also performed to identify individual datasets with a significant impact on the pooled endpoint estimates (Fig. 10). Pooled HRs and RRs of all outcome measures were not dependent on any particular cohort study, with the exception of the complication rates. In the latter case, the pooled result remained statistically significant only while every individual dataset, apart from that of Stern et al. [21], was included. This finding suggests that the analysis is affected by the relatively small number of events and the small sample size of all included cohort studies.
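The fixed-effects M–H comparison and the leave-one-out procedure described above can be sketched as follows. The M–H pooled RR uses the standard Mantel–Haenszel weights, and its log-scale variance uses the Greenland–Robins estimator; the leave-one-out helper simply re-pools the estimate k times, dropping one study each time. Tables are (events, total) pairs for thermal ablation and nephrectomy; all names and counts are hypothetical, and no continuity correction is applied in this sketch.

```python
import math


def mantel_haenszel_rr(tables, z=1.96):
    """Fixed-effect M-H risk ratio with a Greenland-Robins 95 % CI.

    Each table is (events_1, n_1, events_2, n_2); group 1 is the numerator.
    """
    r = s = p = 0.0
    for a, n1, c, n2 in tables:
        n = n1 + n2
        r += a * n2 / n
        s += c * n1 / n
        p += (n1 * n2 * (a + c) - a * c * n) / n ** 2
    rr = r / s
    se = math.sqrt(p / (r * s))  # Greenland-Robins SE of ln(RR_MH)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def leave_one_out(tables, pool=mantel_haenszel_rr):
    """Re-pool the estimate once per study, omitting that study each time."""
    return [pool(tables[:i] + tables[i + 1:]) for i in range(len(tables))]


# Hypothetical complication counts: (ablation events, n, nephrectomy events, n)
studies = [(3, 40, 6, 45), (2, 30, 5, 60), (4, 50, 7, 70), (1, 25, 4, 30)]
print(mantel_haenszel_rr(studies))
for rr, lower, upper in leave_one_out(studies):
    print(round(rr, 2), round(lower, 2), round(upper, 2))
```

A pooled result whose CI crosses 1 after dropping a single study, as happened for complication rates here, signals that significance rests on few events.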

Fig. 10

Leave-one-out sensitivity analysis to identify individual datasets with significant impact on the pooled endpoint estimates. Selected forest plots (disease-free survival and complication rates) shown only

Meta-regression Analysis

Meta-regression analysis was again performed with a random-effects D–L model. The overall rate of complications was significantly influenced by the size of the treated tumour (coefficient: −2.8, 95 % CI: −5.5 to −0.1, p = 0.04) and by the year of publication (coefficient: −0.35, 95 % CI: −0.7 to −0, p = 0.05): the larger the tumour and the more recent the publication, the more the RR shifted in favour of thermal ablation. In addition, perioperative decline of renal function depended on baseline eGFR (coefficient: 0.71, 95 % CI: 0.45–0.98, p < 0.001); i.e., the mean difference in eGFR decline increasingly favoured thermal ablation as baseline renal function worsened. The remaining baseline covariates were associated with relatively weak regression slopes, and none of their coefficients was statistically significant. Significant results of the meta-regression analysis are shown in Fig. 11.
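A single-covariate meta-regression of the kind reported here amounts to a weighted least-squares fit of the log effect on the covariate, with weights 1/(vᵢ + τ²). The sketch below assumes τ² has already been estimated (for example, with the D–L method-of-moments estimator) and treats the within-study variances as known, so the slope’s standard error is 1/√Σwᵢ(xᵢ − x̄)². Function and variable names are hypothetical, and a full implementation would re-estimate the residual τ² given the covariate.

```python
import math


def meta_regression(effects, variances, covariate, tau2=0.0, z=1.96):
    """Weighted least-squares meta-regression of effect on one covariate.

    Returns (intercept, slope, slope_lower, slope_upper) using weights 1/(v + tau2).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # known-variance (meta-analytic) standard error
    return intercept, slope, slope - z * se_slope, slope + z * se_slope


# Hypothetical: log RR of complications vs mean tumour size (cm) in five studies
log_rr = [-0.2, -0.5, -0.4, -0.9, -0.7]
var = [0.25, 0.30, 0.20, 0.40, 0.35]
size_cm = [2.0, 2.4, 2.3, 3.0, 2.8]
print(meta_regression(log_rr, var, size_cm, tau2=0.05))
```

A negative slope on the log RR scale corresponds to the pattern described above, in which larger tumours shift the complication RR further in favour of ablation.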

Fig. 11

Meta-regression plots on the log scale of the estimate. Only statistically significant regression plots are shown

Discussion

Surgical resection is considered the “gold standard” for the treatment of small RCCs [3]. However, traditional surgical procedures have lately been challenged by a number of minimally invasive ablative techniques, such as RFA, MWA, and cryoablation, all of which have demonstrated promising results in the treatment of RCC [24–28]. Results of large series of renal tumours treated with percutaneous RFA [26, 27, 29, 30] and MWA [24, 25] have demonstrated that ablative techniques are effective, with acceptable short- to intermediate-term outcomes and a generally low risk of complications [31]. Although several articles with long-term follow-up support the application of thermal ablative techniques, there is a scarcity of RCTs and/or high-quality cohort studies comparing surgery and thermal ablation head-to-head. In our systematic review of the literature, we failed to identify any RCT reporting a clear advantage of one therapeutic method over the other. We therefore conducted a thorough literature search and quantitative data synthesis of the existing high-quality cohort studies on the subject, including the single available RCT [19].

In total, six cohort studies (including 1 RCT) with at least 1 year of clinical and imaging follow-up (up to 6 years) and 587 subjects were analyzed. To critically appraise the present systematic review, it is important to consider possible sources of heterogeneity among the included cohort studies. Surgical nephrectomy (open or laparoscopic) was compared with RFA in five studies and with MWA in one study [19]. Of note, ablation was performed either percutaneously or laparoscopically. There were also minor variations in the primary and secondary endpoints: all studies provided data on procedural and clinical outcomes, except Olweny et al. [20], who did not report complications. In addition, Bird et al. [18], Olweny et al. [20], and Stern et al. [21] did not report any data on periprocedural changes in renal function.

Otherwise, the six included studies proved quite homogeneous in study design, primary outcome measures, and baseline demographic and procedural variables. Quality assessment scaling was also employed to account for heterogeneity in selection, comparability, and outcome measures. Pooled results proved robust in sensitivity analysis (M–H vs. D–L models and leave-one-out testing), and there was minimal publication bias. Pooled effect estimates showed that DFS rates were largely similar between the two methods. On the other hand, both the reported complication rates and the eGFR decline favoured thermal treatment. Complication rates were almost halved with thermal ablation, but the need for a second treatment because of residual disease was approximately eight times higher than with surgical nephrectomy. It should therefore be considered that successful ablation might require more than one procedure to achieve complete tumour necrosis. Overall, these results are in line with previously reported findings from individual studies and trials and document the clinical effectiveness of thermal ablation with a higher level of evidence. Of further interest, roughly one in six cases in both groups tested negative for RCC on baseline biopsy, which is consistent with other systematic reviews [9].

Previously published, large-volume, population-based cohorts of ablation versus nephrectomy for RCC have highlighted that patients offered ablation are usually older and have multiple comorbidities. Not surprisingly, these reports have produced contradictory oncologic outcomes [5, 8, 9]. Briefly, a United States population-based analysis of nephron-sparing surgery versus cryo- or thermal ablation for small renal masses in more than 8,000 cases showed an almost twofold higher adjusted risk of kidney cancer death with ablation [5]. However, only approximately 200 of these patients were treated with RFA, and histological data were missing in more than 50 % of ablated cases. Another systematic review and cumulative analysis of 98 observational studies including more than 6,000 renal tumours treated with cryoablation or nephrectomy identified an almost tenfold increase in major complications with surgery and a fivefold increase in local disease progression with cryoablation [9]. However, the validity of these results was again limited by significant clinical heterogeneity that adversely affected the ablation group. Finally, an even larger report on U.S. national practice trends for the treatment of Stage I RCC outlined the results of >15,000 patients, approximately 500 of whom had been treated with thermal ablation or cryoablation. After multivariable adjustment for baseline confounders, the authors reported no statistical difference in cancer-specific or overall survival between ablation and radical or partial nephrectomy [8].

The present overview and meta-analysis is limited mainly by the lack of RCTs and the relatively small number of included participants. However, we scrutinized the literature and carefully selected only high-quality cohort studies (including a single RCT). Of note, participant characteristics were generally well matched within studies and well balanced between studies, without any particular evidence of clinical or statistical heterogeneity. Nonetheless, the study groups were too small to support extensive subgroup analyses or more advanced meta-regression techniques for a more elaborate synthesis of the original datasets. In addition, other oncologic endpoints, such as overall survival, were not routinely reported, and the corresponding pooled estimates could not be calculated.

In conclusion, there is significant evidence to support the application of thermal ablation for the treatment of small RCC. Thermal ablation has been shown to provide long-term oncologic outcomes similar to those of surgery, with a reduced rate of complications and a limited decline in renal function. The present report is at least hypothesis-generating, and there is a clear mandate for well-designed, large-scale RCTs to provide higher quality scientific evidence on the matter.