Despite its relative rarity, the incidence of appendix adenocarcinoma has steadily increased over the past few decades, possibly because of improved understanding of the disease and its recognition as a distinct entity.1,2,3 Studies demonstrate that appendix cancer is a disease process distinct from colorectal cancer, with differences in genomic profiles, clinical behavior, and prognosis.4,5,6 Nonetheless, diagnosis and treatment of appendix cancer remain a significant clinical challenge. Despite ongoing efforts to establish comprehensive staging guidelines specific to appendix cancer, some recommendations are still extrapolated from studies on colorectal cancer.7,8,9

Perhaps one of the most critical factors in staging appendix adenocarcinoma is the assessment of regional lymph node (LN) involvement. The presence of LN metastases has been shown to be a strong predictor of oncologic outcomes in various types of cancer, including appendix cancer.10,11,12,13 However, the likelihood of identifying LN metastases is multifactorial. Patient-related factors, such as histologic subtype, tumor size, age, and sex, have been shown to impact the likelihood of LN positive (LNP) disease.10 Physician-related factors, such as the extent and location of lymphadenectomy, as well as the number of examined LNs, also can affect the probability of detecting metastatic LN disease when it exists. Accordingly, to ensure precise staging and appropriate treatment, it is essential to focus on harvesting and examining an adequate number of LNs. Based on colorectal cancer studies, current appendix cancer guidelines suggest an evaluation of at least 12 LNs for staging.8,14 However, at this time there are no established standards based on appendix cancer data that specify the requirement for LN examination to accurately stage different histologic subtypes. Furthermore, the impact of an inadequate evaluation on the oncologic outcomes of appendix cancer patients remains uncertain.

Determining an optimal number of LNs for staging can be challenging. Large, multi-institutional databases might provide enough statistical power to address this question while accounting for the multiple factors that could influence the result. With data from more than 1,500 Commission on Cancer (CoC) accredited facilities across the United States, the National Cancer Database (NCDB) is one of the largest comprehensive cancer datasets available.15 By leveraging this dataset, we aimed to identify the minimum number of LNs required for accurate staging of appendix cancer and to assess the impact of low LN sampling on survival outcomes. This information may inform the development of evidence-based guidelines for the optimal management of different histologic subtypes of appendix cancer, ultimately leading to improved patient outcomes.

Methods

A retrospective multi-institutional cohort study using the NCDB was performed. The NCDB is a joint project of the CoC of the American College of Surgeons and the American Cancer Society.15 This study was approved by our Institutional Review Board.

Study Population

Patients with confirmed histologic diagnosis (NAACCR#490) of appendix cancer (International Classification of Diseases for Oncology 3rd edition (ICD-O-3) code C18.1) with invasive behavior (NAACCR#523) were included. Patients with histologic diagnosis (NAACCR#522) of mucinous (8470-8472, 8480, 8481), nonmucinous (8010, 8020, 8140, 8144, 8210, 8211, 8255, 8260-8263, 8440, 8460, 8470-8472, 8480, 8481, 8490, 8560), and signet ring cell (SRC) adenocarcinoma (8490) were analyzed. Patients with American Joint Committee on Cancer (AJCC) stage II and III disease undergoing a complete surgical resection (NAACCR#1320) and regional lymphadenectomy (NAACCR#1292) at the reporting facility (NAACCR#670), with complete information about LN examination (NAACCR#830 and #820) were included. Patients with tumors of other histologic subtypes (e.g., neuroendocrine, carcinoid, and goblet cell adenocarcinoma), noninvasive behavior (e.g., low-grade and high-grade mucinous neoplasms), or with reported additional malignancies (NAACCR#560) were excluded (Electronic supplementary: Table e1).

Variables of Interest

Patient demographics and clinical variables analyzed included age at diagnosis (NAACCR#230), sex (NAACCR#220), race (NAACCR#160), insurance status (NAACCR#630), Charlson-Deyo comorbidity index,16 facility type, definitive surgical procedure (NAACCR#1290), tumor histology (NAACCR#522), and tumor grade (NAACCR#440). Systemic chemotherapy sequence (NAACCR#1639) was recoded as neoadjuvant chemotherapy (NACT) administered (codes: 2, 4, 6, and 7), no NACT administered (codes: 3 and 5), adjuvant systemic chemotherapy (ASC) administered (codes: 3, 4, 6, and 7), no ASC administered (codes: 0, 2, and 5), and unknown (codes: 9 and missing). The number of LNs examined and positive for disease were used to determine the LN positive ratio (LNR), with a LNR ≥0.25 considered higher risk.
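The LNR definition above amounts to a simple ratio with a fixed cutoff. A minimal sketch follows; the function names are hypothetical and chosen for illustration only, and the 0.25 cutoff is the one stated in the text.

```python
def lymph_node_ratio(positive: int, examined: int) -> float:
    """Return the LN positive ratio: positive LNs divided by examined LNs."""
    if examined <= 0:
        raise ValueError("at least one LN must be examined")
    if not 0 <= positive <= examined:
        raise ValueError("positive LNs must be between 0 and examined LNs")
    return positive / examined


def is_higher_risk(positive: int, examined: int, cutoff: float = 0.25) -> bool:
    """Flag LNR >= cutoff (0.25 in the text) as higher risk."""
    return lymph_node_ratio(positive, examined) >= cutoff
```

For example, two positive nodes among eight examined give an LNR of 0.25 and are flagged, whereas two positive nodes among 20 examined give an LNR of 0.10 and are not.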

Determination of Inadequate Lymph Node Examination

The average number of LNs evaluated in patients reported to have undergone at least a hemicolectomy was used as the reference category for a complete lymphadenectomy. We then calculated the odds ratio (OR) of finding LNP disease for various numbers of LNs examined compared with the reference category using a mixed effects logistic regression. The OR was adjusted for factors known to affect the number of LNs harvested and probability of LNP disease,10,17 with a random effect added to the year of diagnosis to account for unmeasurable temporal variations. The p-values from the regression model were corrected for multiple comparisons using the Benjamini-Hochberg procedure.18 The smallest number of LNs evaluated that did not show significantly lower odds of finding LNP disease than the reference category was used as the cutoff definition for the minimum LN examination.
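The Benjamini-Hochberg correction mentioned above can be sketched in a few lines. This is an illustrative pure-Python version of the standard step-up adjustment, not the code used in the study (which was run in R); the function name and the example p-values are assumptions made for the sketch.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of comparisons divided by its rank, and a category would be called significantly different from the reference only if its adjusted p-value stays below the chosen threshold.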

Probability of True Lymph Node Negative Disease

Based on Dehal et al.,19 we used a Bayesian probabilistic model with a hypergeometric distribution to estimate the probability of true LN negative (LNN) disease given different numbers of LNs examined. Assuming an equal probability of any regional node being positive, the likelihood P(m|M; N, n) of observing m positive nodes among n examined LNs, where N is the total number of regional LNs and M is the true number of positive LNs, follows the hypergeometric distribution. The posterior probability of M given the observed m is then:

$$P\left( {M|m;\;N, n} \right) = \frac{{P\left( {m|M;\,N,n} \right)P\left( M \right)}}{{\mathop \sum \nolimits_{M = m}^{N - n + m} P\left( {m|M;\;N,n} \right)P\left( M \right)}},\quad M = m, \ldots ,N - n + m$$

From this, the probability of true LNN disease (M = 0), when n LNs are examined and are free of disease (m = 0), can be estimated with Bayes theorem as:

$$P\left( {M = 0|m = 0;\,N, n} \right) = \frac{{P\left( {M = 0} \right)}}{{\mathop \sum \nolimits_{M = 0}^{N - n} \frac{{\left[ {\left( {N - M} \right)!\left( {N - n} \right)!} \right]}}{{\left[ {\left( {N - M - n} \right)!N!} \right]}}P\left( M \right)}},$$

N was assumed to be equal to the average number of LN evaluated in patients who underwent a hemicolectomy. The probability of M metastatic LNs P(M) was estimated empirically for the overall cohort and separately for each histologic subtype and grade of differentiation, in patients with at least 12 LNs examined, as:

$${\text{P}}\left( {\text{M}} \right){ } = { }\frac{{\text{Number of patients with M positive LN}}}{{\text{Total number of patients at risk}}}$$

Using this model, the probability of true LNN disease for a given number of LNs examined was estimated for the overall cohort, different histologic subtypes, and tumor differentiation grades.
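The calculation for the m = 0 case can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, the prior is passed in as a list with prior[M] = P(M) for M = 0, ..., N, and any numbers fed to it in an example are toy values rather than the empirical NCDB distribution.

```python
from math import comb
from typing import Sequence


def prob_zero_observed(N: int, M: int, n: int) -> float:
    """Hypergeometric P(m = 0 | M; N, n): no positive node among n sampled."""
    # Equals (N-M)!(N-n)! / ((N-M-n)! N!) as in the formula above.
    return comb(N - M, n) / comb(N, n)


def prob_true_negative(prior: Sequence[float], N: int, n: int) -> float:
    """Posterior P(M = 0 | m = 0; N, n) for a prior over M = 0..N."""
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    if len(prior) != N + 1:
        raise ValueError("prior must have N + 1 entries (M = 0..N)")
    # The sum runs over M = 0..N-n, the only values compatible with m = 0.
    denominator = sum(
        prob_zero_observed(N, M, n) * prior[M] for M in range(N - n + 1)
    )
    if denominator == 0:
        raise ValueError("observing m = 0 has zero probability under this prior")
    return prior[0] / denominator
```

With a toy prior in which 70% of patients have no positive nodes and 30% have exactly one among N = 20 regional nodes, examining ten negative nodes raises the probability of true LNN disease from 0.70 to 0.70 / (0.70 + 0.5 × 0.30) ≈ 0.82, and examining all 20 raises it to 1.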

Survival Analysis

Only patients with complete information about vital status (NAACCR#1760) and follow-up were included in survival analysis. Overall survival (OS) was measured from the date of diagnosis to death or last contact.

Statistical Analysis

Analysis was conducted using R (R Core Team, 2022). Continuous variables, presented as mean ± standard deviation (SD), were compared using the t-test. Categorical variables were presented as proportions and compared using the chi-squared test. The Kaplan-Meier method was used to estimate survival probability, and the log-rank test was used to compare survival outcomes between groups. For regression models, we used a complete case analysis, whereby only patients with complete information were included and no data imputation was performed. Multivariable Cox proportional hazards regressions were used to adjust survival estimates for confounding variables. To preserve proportionality, the multivariable model was further stratified into separate models for LNP and LNN disease. Statistical significance was defined as a p-value < 0.05.
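For readers less familiar with the Kaplan-Meier method, the product-limit estimate can be sketched as below. This is an illustrative pure-Python version with a hypothetical function name; the study itself was analyzed in R, and the sketch omits confidence intervals and the log-rank test.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    # Process distinct follow-up times in ascending order.
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

Censored patients contribute to the risk set up to their last contact but never trigger a drop in the curve, which is why OS measured to "death or last contact" can be estimated without discarding patients who are still alive.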

Results

Descriptive Statistics of the Study Population

Overall, 43,549 patients diagnosed with primary appendix cancer between 2004 and 2019 were identified from the NCDB. Of these, 3,602 patients met inclusion criteria and were analyzed (Fig. 1). Of the included patients, 1,697 (47.1%) were females, and mean age at diagnosis was 60.8 ± 13.5 years. Histologic subtypes included: 1,512 (42.0%) mucinous adenocarcinoma, 1,828 (50.7%) nonmucinous/colonic-type adenocarcinoma, and 262 (7.3%) SRC adenocarcinoma. The definitive surgical approach recorded was appendectomy/ileocecectomy in 801 (22.2%), hemicolectomy or a more extensive approach in 2,741 (76.1%), and not specified for 60 (1.7%) patients. The overall average number of LNs harvested was 18.9 ± 9.9, with 16.4 ± 9.9 in patients who had an appendectomy/ileocecectomy versus 19.6 ± 9.9 in those undergoing at least a hemicolectomy (p < 0.01). Evidence of LNP disease was reported in 1,026 (28.5%) patients. The rate of LNP disease according to grade of differentiation was 14.9% (134/895), 28.6% (446/1559), and 38.8% (446/1365) for G1, G2, and G3–G4 patients, respectively (p < 0.01). LNP disease rates by histologic subtype were 21.5% (325/1512), 31.8% (582/1828), and 45.4% (119/262) in mucinous, nonmucinous, and SRC adenocarcinoma, respectively (p < 0.01). ASC was administered in 1,529 (42.4%) patients, with missing ASC information in 309 (8.6%) patients. A total of 131 (3.6%) were reported to have undergone intraoperative chemotherapy: 90 (2.5%) alone and 41 (1.1%) in combination with neoadjuvant or adjuvant regimens.

Fig. 1 Patient selection flow diagram. HAMN high-grade appendiceal mucinous neoplasm; LAMN low-grade appendiceal mucinous neoplasm; LN lymph node; SRC signet ring cell

Minimum Lymph Node Examination

The reference category for the minimum LN assessment models was set to 20 LNs based on the average number of LNs evaluated in patients undergoing at least a hemicolectomy (19.6 ± 9.9). While 1,269 (35.2%) patients had more than 20 LNs evaluated, only nine (0.2%) had more than 20 positive LNs, supporting the reference category selection. From a mixed-effects logistic regression model, the adjusted odds of finding LNP disease compared with the reference category (≥ 20 LNs) were significantly lower in the categories of ≤ 7 (odds ratio [OR] 0.46, p < 0.01), 8 (OR 0.28, p < 0.01), and 9 (OR 0.45, p = 0.04) LNs examined, but not in the categories of 10 (OR 0.93, p = 0.79) or more LNs examined (Fig. 2a; Electronic supplementary: Table e2). From the Bayesian model, in the overall cohort, the probability of true LNN disease with evaluation of ten LNs was estimated to be 90.1% (Fig. 2b). When the analysis was stratified by histologic subtype, with ten LNs examined the conditional probabilities of true LNN disease were 92.4%, 88.3%, and 86.2% for mucinous, nonmucinous, and SRC adenocarcinoma, respectively. These conditional probabilities by histologic subtype also varied according to grade of differentiation (Fig. 3a–c).

Fig. 2 a Mixed-effects logistic regression model. Odds ratio of finding LNP disease by different number of LNs evaluated, adjusted by age, sex, histologic subtype, and grade of differentiation with a random effect added to year of disease diagnosis, compared to the reference category (≥ 20 LNs). The complete model is presented in the Electronic supplementary: Table e2. b Conditional probability determined with a Bayesian model using a hypergeometric distribution to estimate the probability of true LNN disease in the overall cohort. LN lymph node; LNN lymph node negative; LNP lymph node positive; OR odds ratio; ref reference category. *p < 0.05; **p < 0.01

Fig. 3 Conditional probability determined with a Bayesian model using a hypergeometric distribution to estimate the probability of true LNN disease by different grades of tumor differentiation and stratified by histologic subtype in a mucinous, b nonmucinous, and c SRC adenocarcinoma of the appendix. G grade; LN lymph node; LNN lymph node negative; SRC signet ring cell

Number of Lymph Nodes Examined and Positive Lymph Nodes

Overall, 466 (12.9%) patients had < 10 LNs examined. The rate of patients with < 10 LNs evaluated showed a decreasing trend from 30.0% (39/130) in 2004 to 10.7% (33/309) in 2019 (p for trend < 0.01; Electronic supplementary: Fig. e3). In the cohorts with < 10 LNs versus ≥ 10 LNs evaluated, mean age was 63.2 ± 13.3 versus 60.4 ± 13.5 (p < 0.01), patients undergoing at least a hemicolectomy were 277 (59.4%) versus 2,464 (78.6%) (p < 0.01), and patients with LNP were 85 (18.2%) versus 941 (30.0%) (p < 0.01). Demographics and clinical characteristics are summarized in Table 1. Of the patients with LNP disease, those with < 10 LNs had a mean LNR of 0.45 ± 0.28 with 72.9% (62/85) having a LNR ≥ 0.25 versus 0.18 ± 0.19 and 21.5% (202/941) in those with LNP disease having ≥ 10 LNs examined (p < 0.01).

Table 1 Demographic characteristics in the overall appendix cancer cohort and stratified by the number of LN evaluated

Prognostic Significance of LN Examination

A total of 3,293 patients had complete information for survival analysis. Median follow-up from diagnosis was 75.4 (95% confidence interval [CI] 72.8–77.5) months. Median OS was 110 (95% CI 85.5–148.0) months with a 5-year OS of 63.4% (95% CI 58.8–68.4) for patients with < 10 LNs evaluated versus a median of 168 (95% CI 159.4–not reached) months and a 5-year OS of 73.1% (95% CI 71.2–74.8) for patients with ≥ 10 LNs evaluated (p < 0.01; Fig. 4a). The univariable hazard ratio (HR) for OS was 1.40 (95% CI 1.20–1.64) in patients with < 10 LNs evaluated versus those with ≥ 10 LNs (p < 0.01). After adjusting for known prognostic factors in a multivariable model (Table 2), the adjusted HR of failing to evaluate at least 10 LNs was 1.39 (95% CI 1.16–1.68).

Fig. 4 Kaplan-Meier survival analysis comparing overall survival between patients having <10 LN and ≥10 LN evaluated for staging in the a overall cohort, b cohort of LNN patients, and c LNP patients. Median survival times are represented as a dashed line, colored areas represent 95% confidence intervals for survival probability. Censoring is represented as vertical lines on the curves. Log-rank tests were used to compare survival outcomes. AJCC American Joint Committee on Cancer; LN lymph nodes; LNN lymph node negative; LNP lymph node positive

Table 2 Univariable and multivariable Cox proportional hazard regression for overall survival in the overall cohort, and stratified by lymph node status

Sensitivity Analysis Stratified by Lymph Node Status

Univariable stratified Kaplan-Meier analysis by LN status is shown in Fig. 4b and c. After adjusting for known prognostic factors in the stratified multivariable models (Table 2), the adjusted HR of evaluating < 10 LNs was 1.54 (p < 0.01) for LNN disease. For LNP patients, evaluating < 10 LNs showed a nonsignificant adjusted HR of 1.02 (p = 0.91) but a LNR ≥ 0.25 was significantly associated with OS (HR 1.96, p < 0.01).

Discussion

Accurate staging of appendix cancer is critical for risk stratification and for determining appropriate treatment strategies. Using data from the NCDB, our study sought to identify the minimum number of LNs required to confidently rule out metastatic LN disease in appendix adenocarcinoma. We found that the number of LNs examined was significantly associated with survival in patients with appendix cancer: staging with an examination of fewer than ten LNs was associated with reduced OS (HR 1.39, p < 0.01). Examining at least ten LNs was associated with a low conditional probability (<10%) of occult LN metastatic disease (Fig. 2b). However, tumor aggressiveness appears to modify this recommendation, suggesting that the required number of LNs should take into account the grade of differentiation and histologic subtype (Fig. 3).

We found that patients with more aggressive histologic subtypes may require a higher number of examined LNs to determine LN status accurately. For instance, patients with SRC adenocarcinoma displayed lower diagnostic accuracy with ten LNs examined and might require a higher number of LNs examined (i.e., ≥14 LNs) to achieve a similar staging accuracy as less aggressive subtypes (Fig. 3). This association between tumor aggressiveness and number of LNs needed to decrease the probability of missing occult LN disease also was seen with the tumor grade of differentiation (Fig. 3). A similar pattern has been reported in studies of colorectal cancer with regards to the T-stage.20,21 The higher probability for systemic disease in more aggressive tumor biology likely contributes to the need for more extensive LN harvesting and examination to ensure accurate staging. This emphasizes that while a minimum number of LNs is necessary to assess staging quality, diagnostic accuracy is dependent on histological features and biological behavior.

Although it may be challenging to isolate the therapeutic effect of a more extensive lymphadenectomy from the improvement in diagnostic accuracy,22 our analysis suggests that an inadequate identification of the extent of disease, resulting in understaging, is the primary cause of the prognostic impact associated with a low (<10) LN examination. Despite less extensive surgical approaches (i.e., less than a hemicolectomy) resulting in a lower LN count (average 16.4 ± 9.9 LNs vs. 19.6 ± 9.9, p < 0.01), the increased mortality risk of a suboptimal LN examination persisted in a multivariable model after adjusting for the extent of surgical approach (Table 2). In the case of low-grade mucinous adenocarcinoma, current recommendations suggest that these patients might not require a hemicolectomy.23 Our analysis showed that even though 14.9% (134/895) of patients with low-grade adenocarcinoma had evidence of LN metastasis, when stratified by grade and histologic subtype (Fig. 3), there was a low (<10%) probability of occult LN disease in these patients even with lower LN counts. These findings support the practice of less extensive surgery in patients with mucinous low-grade disease, and the surgical approach should be individualized according to clinical presentation and features of invasive behavior.

Stratified analysis by LN status showed that an examination of fewer than ten LNs was an independent risk factor (HR 1.54, p < 0.01) for disease deemed LNN but not for patients with LNP disease (HR 1.02, p = 0.91). This could be because LNN disease is more likely to be inaccurately staged and undertreated if an insufficient number of LNs are evaluated, whereas LNP disease will be treated as such regardless of the LN yield. This suggests that the survival impact of a lower LN count in this cohort might be related to occult metastatic LN disease in patients with suboptimal evaluations. In the case of LNP disease, it is well known that the number of involved LNs is a prognostic factor. While the unadjusted analysis indicated a survival advantage when ten or more LNs were evaluated in LNP disease (Fig. 4c), the multivariable model (Table 2) showed a significant adjusted effect for the LNR instead (HR 1.96, p < 0.01). The effect of LNR also has been observed in colorectal cancer and has been hypothesized to be a better predictor than the number of positive LNs alone.24,25,26,27,28 This may be because the LNR simultaneously captures more aggressive disease with a wider LN spread and identifies patients with a low overall number of LNs evaluated who are at risk of having more widespread disease than is apparent from the LN count alone. LNR appears to be a feasible alternative to identify higher-risk LNP patients, particularly those with a low (i.e., <10) LN count. However, optimal cutoff values for LNR have not been studied for appendix cancer patients.

The underlying principle is that a higher number of LNs evaluated results in a more accurate determination of the extent of disease. However, multiple factors can influence the final LN count in surgical specimens, and obtaining high numbers is not always realistic.17 Evidence on LN count and the probability of LN metastatic disease that is specific to appendix cancer is very limited. Using the Surveillance, Epidemiology, and End Results (SEER) database, Du and Xiao found an optimal number of 11 LNs for patients with neuroendocrine tumors of the appendix.29 Similarly, Fleischmann et al. observed a significant change in the rate of LNP disease occurring with an evaluation of ten or more LNs in appendix cancer patients from the SEER database.30 Current guidelines suggest the evaluation of at least 12 LNs to stage appendix cancer.8 This recommendation is based on initial studies performed on colorectal cancer patients,31 and multiple subsequent studies in colon cancer have shown variations in this recommendation.19,20,32 We aim to provide evidence about the diagnostic accuracy of different numbers of LNs to support the use of recommendations for a minimal LN assessment and avoid misclassification risks if this number is not reached. We propose using a threshold of a minimum of ten LNs as an overall measure of surgical quality and accuracy in pathological staging of primary appendix cancer. Following this, staging accuracy should be weighted according to histologic features and tumor aggressiveness. Adopting a data-driven approach to minimal LN examination could provide a greater degree of certainty in the pathological staging of primary appendix cancer and define patients at higher risk of missed LN positive disease. This, in turn, would reduce the risk of understaging and inadequate treatment, ultimately improving patient outcomes.

By adopting a threshold of <10 LNs to define a low LN count, we observed that a significant proportion (12.9%) of patients had a low LN evaluation for staging. Although the rate of patients with a low LN count appears to be decreasing with time (30.0% in 2004 vs. 10.7% in 2019, p for trend < 0.01), indicating improvements in staging practices, continuous efforts to enhance these rates are still needed. Sensitivity analysis performed in stage II disease demonstrates that patients with an insufficient LN evaluation display an increased mortality risk (likely due to higher risk of occult LN disease) over stage II patients with appropriate staging (≥10 LNs). This finding suggests that patients with no observed disease in an evaluation of only one to nine LNs should not be staged as pathological LNN (pN0) disease, which contradicts the recommendation of the 8th edition of the AJCC Cancer Staging Manual.14 We believe that a minimum of ten LNs should be assessed and that the guidelines should recommend against staging disease as LNN if a minimum number of LNs is not evaluated. This might require creating a separate higher risk category for patients without evidence of LN metastases, but with an inadequate (i.e., < 10) LN examination, understanding these could represent a combination of true LNN patients and patients with occult LN metastases. Although studies have shown the benefit of ASC in selected patients with appendix adenocarcinoma,33 additional studies in this subset of patients are needed to determine its role given the inherently higher risk for disease recurrence.

The present study carries important limitations due to the retrospective nature of the data. First, the nonrandom allocation of patients to the number of LNs evaluated could result in additional confounding, which could not be controlled for with the available data. Second, our findings rely on the accuracy of the data from the NCDB, which is subject to temporal bias and fluctuations in the quality of reporting.34 The duration of the study (2004–2019) could result in significant variations due to changes in classification and staging systems over time, even after attempting to mitigate these possible variations with a mixed-effects model. Another limitation arises from the LN harvesting data, which include aggregated information from all surgical procedures performed as part of the initial treatment of the disease and do not always represent a single surgical specimen. Additionally, the lack of information on recurrence and cancer-specific survival means that the risk of dying from disease had to be extrapolated from OS. Lastly, the probabilistic model is sensitive to the initial conditions and required assumptions, such as an equal probability of disease in all evaluated LNs, and cannot discriminate between anatomical regions of LNs.

Conclusions

Our study provides further evidence of the prognostic significance of an appropriate LN evaluation and underscores the importance of accurate staging in appendix cancer. A minimum of ten LNs should be evaluated in appendix cancer to confidently determine the absence of LN disease, with the caveat that higher numbers might be required in more aggressive histologies. The risk of understaging is particularly relevant in appendix cancer, given its relative rarity and the potential for delayed diagnosis. The practice of staging disease as LNN with a suboptimal LN evaluation should be avoided due to the potential risk for missing occult LN disease.