Introduction

Radical nephrectomy (RN) or nephron-sparing surgery remains the mainstay of therapy to achieve cure in patients with localized renal cell carcinoma (RCC) [1]. Despite surgical treatment, a non-negligible fraction of patients with localized RCC experience disease recurrence [2]. Among other factors, lymph node metastasis (LNM) has been demonstrated to adversely impact oncologic outcomes in patients with RCC [3, 4]. As the presence of LNM has implications for follow-up scheduling and potentially adjuvant treatment with targeted therapies, accurate nodal staging appears necessary.

Currently available imaging technologies are limited by a detection threshold for LNM; thus, lymph node dissection (LND) remains the most accurate form of nodal staging in patients with localized RCC [5]. However, no clear consensus exists regarding the indications and the extent of LND at RN [5]. Indeed, the anatomic unpredictability of lymphatic outflow in RCC and the early manifestation of distant metastases without LNM have hampered efforts to define a uniformly accepted extent of LND in RCC [6,7,8]. In recent years, efforts have been made to estimate the presence of LNM in patients with RCC and the number of LNs necessary to achieve adequate nodal staging [9,10,11].

We previously developed a methodology (pathological nodal staging score, pNSS) to calculate the probability that a patient with pathologic node-negative status at surgery is in fact free of LNM, using the number of examined LNs and established clinicopathologic features of a given tumor type [12,13,14,15]. The aim of the current study was to develop and externally validate such a prognostic model for patients with clear cell renal cell carcinoma (cRCC) treated with RN. Toward this aim, we developed the pNSS using a multicenter cohort of patients who underwent RN and LND for cRCC. For validation, we used a contemporary, population-based cohort from the Surveillance, Epidemiology and End Results (SEER) database. Use of SEER allowed us to test whether the pNSS developed in a multicenter, individual-patient dataset is generalizable and reproducible in a population-based dataset.

Materials and methods

Patient selection and data collection

The development cohort comprised 1389 patients who underwent open (n = 1286, 92.6%) or laparoscopic (n = 103, 7.4%) RN and LND for clinically localized cRCC at five international academic centers between 1970 and 2012. The indications and extent of LND were at the surgeon’s discretion. No patient received preoperative radiotherapy, immunotherapy or targeted therapy. No patient had clinically evident distant metastatic disease at the time of RN. Institutional review boards approved the study, with all participating sites providing the necessary institutional data sharing agreements beforehand.

For the contemporary validation cohort (n = 2279), we used Surveillance, Epidemiology and End Results (SEER) registry data from 2004 to 2009. By the end of the study period, the registry captured approximately 28% of the US population and was considered representative of the general population. Patients who underwent RN for kidney cancer (code C64.9) were identified. Inclusion criteria were a diagnosis of cRCC and documentation of both the number of LNs examined and the number of pathologically positive LNs. Patients were excluded from the analyses when Fuhrman grade or tumor stage was unknown.

Pathological evaluation

All surgical specimens were processed according to standard pathological procedures as previously described [1]. Genitourinary pathologists determined tumor stage, which in the development cohort was reassigned according to the 2009 tumor node metastasis (TNM) classification [16]. In SEER, pathological tumor stage category was derived from collaborative staging data elements for all cases, consistent with the 2002 TNM classification [17]. To account for the different staging classifications in the development and validation cohorts, patients were pooled into two groups for analysis: pathological stage T1/2 and pathological stage T3/4. The Fuhrman classification was used for the assessment of nuclear grade [18]. Histologic subtypes were assigned according to the 2004 WHO classification [19].

Statistical analysis

Overview

We applied a methodology similar to that previously described [12,13,14,15] to build a pNSS for each of the development and validation cohorts. The primary endpoint was the probability of incorrect nodal staging as a function of the number of examined LNs (n) [12,13,14,15]. Although true nodal status is unascertainable, the information from LN-positive patients can be used to determine whether the number of examined LNs and the number of negative LNs are sufficient to classify a patient as truly LN negative. For example, consider a LN-positive patient with a large n and a small but nonzero k (k = number of positive LNs in a patient with LNM): had fewer than n LNs been examined, there is a chance that this patient would have been incorrectly deemed LN negative. Conversely, for a patient with small n and large k, it is unlikely that nodal disease would have been missed even with fewer examined LNs. Hence, the data from LN-positive patients are used to interpret the data for the LN-negative patients. The probability that a LN-negative patient has LNM can be computed in three steps: compute the probability of missing a positive node (one minus the sensitivity), compute the prevalence of node-positive status, and compute the nodal staging score from these two quantities [12,13,14,15].

Probability of missing a positive LN

The probability of missing a positive LN (one minus the sensitivity) in pN0 patients is inherent to the process of pathological detection and as such depends on the number of examined LNs but not on patient characteristics [12,13,14,15]. We used a β-binomial model for this purpose, allowing for heterogeneity in the intensity of nodal spread across the patients [12,13,14,15].
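As an illustration only, the probability of missing nodal disease under a beta-binomial model can be sketched as follows. The sketch assumes the textbook form, in which the per-node probability of positivity in a node-positive patient follows a Beta(alpha, beta) distribution, so that the chance of all n examined nodes being negative is B(alpha, beta + n) / B(alpha, beta). The exact parameterization and fitting procedure used in the study are not spelled out here, and the function names and parameter values below are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_missing(n_examined: int, alpha: float, beta: float) -> float:
    """P(all n examined nodes are negative | patient is truly node-positive).

    Assumes the per-node positivity probability is Beta(alpha, beta)
    distributed across patients (an assumption of this sketch).
    """
    if n_examined < 0:
        raise ValueError("n_examined must be non-negative")
    return math.exp(log_beta(alpha, beta + n_examined) - log_beta(alpha, beta))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the decreasing trend.
    for n in (1, 2, 5, 10):
        print(n, round(prob_missing(n, alpha=1.0, beta=1.0), 3))
```

Under this form the probability of a miss depends only on the number of examined nodes and the two shape parameters, which matches the statement that it does not depend on patient characteristics.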

Estimation of prevalence of nodal disease

The observed prevalence (called apparent prevalence hereafter) is underestimated and needs to be adjusted for the false negatives [12,13,14,15]. This was done in two steps. The first step invokes assumption one and estimates #FNk as a function of k, which is the number of positive LNs from patients with LN involvement:

$$ \#\text{FN}_{k} = \frac{P\left( \text{FN}_{k} \right) \times \#\text{TP}_{k}}{1 - P\left( \text{FN}_{k} \right)}, $$

where #TPk is the number of true positives for a given k. Since prevalence is not a function of k, the second step obtains the adjusted prevalence by averaging over k:

$$ \text{Prev} = \frac{\sum\nolimits_{k} \left( \#\text{FN}_{k} + \#\text{TP}_{k} \right)}{\sum\nolimits_{k} \left( \#\text{FN}_{k} + \#\text{TP}_{k} + \#\text{TN}_{k} \right)}. $$

Estimation of prevalence is stratified by T stage for pNSS, but this is not explicitly noted in the above formula to avoid cumbersome notation.
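The two-step adjustment can be sketched in a few lines. The sketch assumes that P(FN_k) is the probability that a truly node-positive patient in stratum k is staged node-negative, so that the observed true positives make up a fraction 1 − P(FN_k) of all node-positive patients in that stratum and the expected number of missed patients is #TP_k × P(FN_k) / (1 − P(FN_k)). The function name, the input layout and the numbers in the usage example are hypothetical.

```python
from typing import Iterable, Tuple


def corrected_prevalence(strata: Iterable[Tuple[int, int, float]]) -> float:
    """Prevalence of nodal disease adjusted for false negatives.

    Each stratum is (observed_positive, total_patients, p_miss), where
    p_miss is the probability of missing nodal disease in that stratum.
    """
    positives = 0.0
    patients = 0
    for observed_positive, total_patients, p_miss in strata:
        if not 0.0 <= p_miss < 1.0:
            raise ValueError("p_miss must lie in [0, 1)")
        # Step 1: expected number of node-positive patients that were missed.
        false_negatives = observed_positive * p_miss / (1.0 - p_miss)
        positives += observed_positive + false_negatives
        # Observed negatives already contain the false negatives, so the
        # denominator is simply the number of patients in the stratum.
        patients += total_patients
    # Step 2: pool over strata.
    return positives / patients


if __name__ == "__main__":
    # Hypothetical counts: 10 of 100 patients observed positive, 50% miss rate.
    print(corrected_prevalence([(10, 100, 0.5)]))
```

With a miss probability of zero the corrected prevalence equals the apparent prevalence, and it rises above it as the miss probability grows.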

Nodal staging score

Adequate staging was assessed by computing the NSS, the probability that a pathologically LN-negative patient is indeed free of nodal disease:

$$ {\text{NSS}}\;{ = }\;\frac{{ 1\; - \;{\text{Prev}}}}{{ 1\; - \;{\text{Prev}}\;{ + }\;\left[ {{\text{Prev}}\; \times \;P\left( {{\text{FN}}_{k} } \right)} \right]}}. $$
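The NSS formula above is an application of Bayes' rule and can be written directly as code. The following is a minimal sketch; the function names, the helper that searches for the smallest sufficient number of nodes, and the inputs in the usage example are hypothetical and are not values from the study.

```python
from typing import Mapping, Optional


def nodal_staging_score(prevalence: float, p_miss: float) -> float:
    """Probability that a pathologically node-negative patient is truly negative."""
    return (1.0 - prevalence) / (1.0 - prevalence + prevalence * p_miss)


def nodes_needed(prevalence: float,
                 p_miss_by_n: Mapping[int, float],
                 target: float = 0.95) -> Optional[int]:
    """Smallest number of examined nodes whose NSS reaches the target.

    p_miss_by_n maps a number of examined nodes to the probability of
    missing nodal disease at that number. Returns None if the target is
    never reached within the supplied range.
    """
    for n in sorted(p_miss_by_n):
        if nodal_staging_score(prevalence, p_miss_by_n[n]) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 20% corrected prevalence, miss probability 1/(n+1).
    p_miss = {n: 1.0 / (n + 1) for n in range(1, 11)}
    print(nodes_needed(0.20, p_miss))
```

Because prevalence enters the formula directly, strata with a higher corrected prevalence need a lower miss probability, and therefore more examined nodes, to reach the same score.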

Confidence intervals

Precision of the reported estimates was assessed by creating 1000 bootstrap samples from the entire data set and replicating the estimation process. The 2.5th and the 97.5th percentiles were used as the lower and upper 95% confidence bounds for the corresponding estimates.
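A percentile bootstrap of this kind can be sketched as follows. This is a generic illustration, not the code used in the study: the function name, the nearest-rank choice of percentile, the fixed seed and the toy data are all assumptions of the sketch.

```python
import random
from typing import Callable, Sequence, Tuple


def percentile_bootstrap_ci(data: Sequence,
                            estimator: Callable[[Sequence], float],
                            n_boot: int = 1000,
                            level: float = 0.95,
                            seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for an arbitrary estimator."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample patients with replacement and re-run the estimator each time.
    estimates = sorted(
        estimator(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    # Nearest-rank percentiles of the bootstrap distribution.
    lower = estimates[int(tail * n_boot)]
    upper = estimates[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy data: 1 = node-positive, 0 = node-negative (hypothetical).
    toy = [1] * 10 + [0] * 90
    print(percentile_bootstrap_ci(toy, lambda xs: sum(xs) / len(xs)))
```

In practice the estimator would be the full estimation pipeline (model fit, prevalence correction and score), re-run on each resampled data set.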

Validation of the development model

After developing the pNSS model in the development cohort, we compared the findings with those from the validation cohort. We compared the two populations using the Chi-square test to evaluate the association between categorical variables. Differences in variables with a continuous distribution across categories were assessed using the Kruskal–Wallis test. In addition, we compared the probabilities of missing a positive LN based either on the number of LNs removed/examined alone or on that number combined with pathological stage and Fuhrman grade. All statistical analyses were performed using SAS Version 9.2.
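As an illustration of the categorical comparison, the sketch below applies a Pearson Chi-square test to the 2 × 2 table of LNM status by cohort, using the counts reported for the two cohorts (198 of 1389 patients in the development cohort and 227 of 2279 in the validation cohort). It assumes no continuity correction and is written in plain Python rather than SAS, so it is not the authors' procedure; the function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson Chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns the test statistic and the two-sided p value. No continuity
    correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the Chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: development, validation; columns: LNM present, LNM absent.
    chi2, p = chi_square_2x2(198, 1389 - 198, 227, 2279 - 227)
    print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

The resulting p value falls below 0.001, in line with the difference in LNM rates reported between the cohorts.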

Results

Clinicopathologic characteristics in the development and validation cohorts

Table 1 shows the clinicopathologic characteristics of the development (n = 1389) and the validation cohorts (n = 2279). LNM was detected in 14.3% (n = 198) of the patients in the development cohort compared to 10.0% (n = 227) in the validation cohort (p < 0.001). The median number of LNs removed in patients with LNM was significantly higher in the development cohort compared to the validation cohort (7.0 vs. 2.0, p < 0.001).

Table 1 Clinicopathologic characteristics of 1389 patients in the development cohort and 2279 patients in the validation cohort who underwent radical nephrectomy and lymph node dissection for clinically localized clear cell renal cell carcinoma

Probability of missing a positive LN

Using our model, the beta-binomial parameters were estimated to be 1.01 (95% CI 0.72–1.29) and 0.53 (95% CI 0.34–0.72) in the validation cohort. We assessed the probability of missing LNM (one minus the sensitivity) as a function of the number of LNs examined (Fig. 1). In both the development and validation cohorts, the probability of missing LNM decreased with an increasing number of LNs examined (Fig. 1). Compared with the development cohort, fewer LNs were needed in the validation cohort to reach the same probability of missing a positive LN; however, these differences were not statistically significant (all p values > 0.05).

Fig. 1 Probability of missing nodal disease as a function of nodes examined in 1389 patients in the development cohort and 2279 patients in the validation cohort who were treated with radical nephrectomy and lymphadenectomy for clinically localized clear cell renal cell carcinoma

Pathological nodal staging score

Figure 2 and Table 2 show the pNSS in the development and validation cohorts. In patients with pT1/2 and Fuhrman grade I/II tumors, in both the development and validation cohorts, the examination of only one LN was sufficient to achieve a likelihood of more than 95% of correct pathologic nodal status. Three LNs were sufficient to achieve a likelihood of more than 95% of correct pathologic nodal status in patients with pT3/4 and Fuhrman grade I/II tumors. In contrast, more LNs had to be examined to achieve the same probability of being free from LNM in patients with Fuhrman grade III/IV tumors. Bootstrap CIs for all the estimates were within 1% (in absolute terms) of the estimates (data not shown). Significant differences in the probabilities between the development and validation cohorts were detected in patients with Fuhrman grade III/IV tumors (p < 0.001) (Fig. 2).

Fig. 2 Pathologic nodal staging scores: sensitivity of the pathologic evaluation of nodal disease stratified by pathological tumor stage in combination with Fuhrman grade of 1389 patients in the development cohort and 2279 patients in the validation cohort who were treated with radical nephrectomy and lymphadenectomy for clinically localized clear cell renal cell carcinoma. The vertical axis is the probability of missing nodal disease (one minus sensitivity); the horizontal axis is the number of examined nodes

Table 2 Pathologic nodal staging score for selected numbers of nodes removed in 1389 patients in the development cohort and 2279 patients in the validation cohort who underwent radical nephrectomy and lymph node dissection for clinically localized clear cell renal cell carcinoma

Apparent and corrected prevalence of nodal disease

The apparent and corrected prevalences of nodal metastasis stratified by pathologic T stage and Fuhrman grade in both cohorts are reported in Table 3. Underestimation of prevalence due to false negatives was observed in all examined subgroups of patients. Statistically significant differences between the validation and the development cohorts in the apparent and corrected prevalences of nodal disease were found in all cases combined (p < 0.001), in pT1/2 and Fuhrman grade I/II cases (p < 0.001), in pT1/2 and Fuhrman grade III/IV cases (p < 0.001), and in pT3/4 and Fuhrman grade III/IV cases (p < 0.001).

Table 3 Apparent and corrected prevalences of lymph node metastasis in 1389 patients in the development cohort and 2279 patients in the validation cohort who underwent radical nephrectomy and lymph node dissection for clinically localized clear cell renal cell carcinoma

Discussion

The presence of LNM is a strong predictor of adverse outcomes in patients with cRCC undergoing RN [1, 3]. Indeed, RCC patients with regional LNM have a limited 5-year survival [5]. Accordingly, then, such patients with cRCC may be considered for adjuvant targeted therapies [20]. Knowledge of the lymph node status is important to allow proper risk estimation [5, 21,22,23] for counseling and follow-up scheduling as well as timely consideration of systemic therapy. Therefore, to estimate the probability that a cRCC patient with pathologic node-negative status at RN truly has no LNM, we developed and validated a pNSS. In both the development and the validation cohorts, the probability of missing LNM decreased with an increasing number of LNs examined. This is in line with previous studies, which aimed to identify a minimum number of LNs that need to be removed to obtain satisfactory nodal staging at time of RN [10]. In patients with RCC who underwent LND, a statistically significant correlation between the number of LNs removed and the percentage of nodal involvement could be shown [24, 25]. While the absolute number of LNs removed is important in estimating the probability of missing LNM, standard clinicopathologic features need to be taken into consideration as well.

We found that the number of LNs needed for appropriate nodal staging increases with pathological tumor stage and Fuhrman grade. Our findings are consistent with previous studies showing that the proportion of patients with LNM increases with more aggressive disease [4]. Specifically, in a series of 1652 patients undergoing RN for cM0 cRCC, multivariable analysis demonstrated that Fuhrman grade 3 or 4, a sarcomatoid component, tumor size larger than 10 cm, tumor stage pT3 or pT4, and coagulative tumor necrosis were independent predictors of LNM [4]. A preoperative nomogram predicting the presence of LNM and/or the probability of LN progression during follow-up has been developed as well, although the model still awaits prospective external validation [9]. Despite improved imaging, LND remains the most reliable form of nodal staging in patients with RCC. According to guidelines, LND is not recommended for localized tumors without clinical evidence of LNM [1]. In contrast, in patients with palpable or CT-detected enlarged lymph nodes, resection of the affected LNs should be performed to obtain adequate staging information and local control [1, 12]. Of note, our study included only patients without clinical evidence of LNM.

The accuracy of nodal staging achieved with a given number of LNs removed differed between the development and validation cohorts in patients with Fuhrman grade III/IV regardless of pathologic stage. These differences highlight one limitation of our model, that is, as the model is based on the actual number of LNs removed in each given cohort of patients, the number of LNs to be examined tends to be higher in the cohort of patients with the higher median number of LNs removed. These differences are not surprising as SEER contains data of both academic and community centers.

We developed and externally validated a simple probabilistic model that calculates the probability of freedom from occult LNM as a function of clinicopathologic parameters and number of LNs examined. Since outcome prediction based on a physician’s experience alone might be subjectively influenced, several postoperative models to predict recurrence or cancer-specific mortality based on clinicopathologic features have been introduced for use in daily clinical practice [26]. Our model is a simple tool that could serve as guidance in the postoperative clinical decision-making regarding follow-up scheduling and administration of adjuvant therapy as part of clinical trials when possible. Based on the current lack of adequate selection criteria or biomarkers for adjuvant therapy in patients with RCC, the current study may provide a potential solution as patients with higher risk of false-negative N0 status may be candidates for adjuvant treatment in clinical trials.

Our study has several limitations. First and foremost are limitations inherent to its multicenter and retrospective design as well as the long study period in the development cohort. We used the SEER database to validate our model in a contemporary, large cohort of patients treated at both academic and community centers. It is important to acknowledge that the performance of a LND in patients with clinically node-negative status is not yet standardized and not routinely recommended by guidelines. Currently, a LND is primarily recommended in patients with adverse clinical features, including a large diameter of the primary tumor. However, the evidence on which this recommendation is based has to be regarded as weak. Nevertheless, our data may be biased as we only included patients with clinically node-negative status. The use of the contemporary SEER database as a validation cohort might introduce additional bias, given that the development set includes historical cohorts. Such bias might be caused by staging and grading inaccuracies. Both datasets suffer from shortcomings such as unknown selection criteria for LND and variation in the use of preoperative imaging, among others. Further, the SEER database does not provide information on the extent of LND or the processing of LNs. While we were able to control for numerous potential confounders, we could not control for surgeon’s and pathologist’s experience, treatment decisions including surgical approach (laparoscopic vs. open), patient and surgeon preferences, or the anatomical template of LND. Indeed, the number of LNs examined is not an exact surrogate for the extent of LND. The low nodal yield in the validation cohort might represent incidental LN findings in the hilar or perirenal fat rather than an actual LND. We could not differentiate between these entities using the SEER database, which potentially introduces additional bias. Furthermore, the number of LNs examined is not only a function of the extent of LND but also depends on the pathological evaluation and inherent differences between patients. We only included patients with cRCC, and further studies will be necessary to test our model in non-cRCC cohorts. In addition, our model is built on assumptions. Although these might seem debatable, every mathematical model and theory is built on assumptions. Prospective validation is therefore warranted to test whether the assumptions were appropriate.

Conclusions

We developed and externally validated a pNSS that estimates the likelihood of a false-negative nodal status after RN with LND for clinically localized cRCC. We determined that the number of examined LNs needed for adequate nodal staging depends on pathological tumor stage and Fuhrman grade. Our model may be used in patient counseling regarding postoperative surveillance, as well as for trial eligibility in the assessment of adjuvant therapies.