Introduction

Several pathologic staging systems have been developed to risk-stratify patients following completion of neoadjuvant chemotherapy for breast cancer (Table 1). The most commonly used and extensively evaluated system—pathologic complete response (pCR) [1, 3, 4, 10, 14]—is defined as the absence of invasive cancer in breast or lymph node tissue after completion of neoadjuvant chemotherapy [6]. Patients with this response to chemotherapy have a demonstrably lower risk of tumor recurrence than patients with residual carcinoma [18]. When accompanied by results from definitive trials, the Food and Drug Administration (FDA) recognizes pCR as an endpoint for granting accelerated approval in neoadjuvant chemotherapy trials in order to shorten the time to evaluate new chemotherapeutic agents [5, 13]. However, by definition, pCR does not distinguish among patients with residual tumor. Two other staging systems—residual cancer burden (RCB) and the American Joint Committee on Cancer post-neoadjuvant staging (yAJCC)—do stratify patients with residual cancer.

Table 1 Summary of post-neoadjuvant pathologic staging systems

RCB reports a score based on the fraction of the tumor bed area that contains invasive carcinoma excluding in situ disease (“cellularity”), the dimensions of the tumor bed containing residual cancer, the number of residually positive lymph nodes, and the longest diameter of the largest residual nodal metastasis [16]. Raw scores are then categorized into RCB classes using pre-defined cut points, with class 0 representing pCR and classes 1–3 representing progressively greater extents of residual cancer. In addition, RCB has been shown to provide prognostic value independent of yAJCC stage for patients with post-treatment stage II and III disease [16].
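As an illustration of how the four inputs combine, the following is a minimal sketch of the RCB index and its classes. The formula and the cut points (1.36 and 3.28) are not stated in this article; they are given here as we understand them from the published RCB index [16] and should be checked against the MD Anderson calculator before any use. The function and parameter names are hypothetical.

```python
import math

def rcb_index(d1_mm, d2_mm, pct_cancer, pct_in_situ, n_pos_nodes, d_met_mm):
    """Continuous RCB index from the four pathologic inputs described above.

    Assumed form (from the published index, not from this article):
    1.4 * (f_inv * d_prim)**0.17 + (4 * (1 - 0.75**LN) * d_met)**0.17
    """
    d_prim = math.sqrt(d1_mm * d2_mm)                      # combined tumor bed dimension
    f_inv = (1 - pct_in_situ / 100) * (pct_cancer / 100)   # invasive fraction of the bed
    primary_term = 1.4 * (f_inv * d_prim) ** 0.17
    nodal_term = (4 * (1 - 0.75 ** n_pos_nodes) * d_met_mm) ** 0.17
    return primary_term + nodal_term

def rcb_class(score, cut_low=1.36, cut_high=3.28):
    """Map a continuous score to RCB class 0-3; cut points are assumed defaults."""
    if score == 0:
        return 0          # no residual invasive disease (pCR)
    if score <= cut_low:
        return 1
    if score <= cut_high:
        return 2
    return 3

# Hypothetical specimen: 20 x 10 mm tumor bed, 10% cellularity, node-negative
example_score = rcb_index(20, 10, 10, 0, 0, 0)
example_class = rcb_class(example_score)
```

Because the cut points are passed as arguments, the same function could be reused to explore alternative thresholds on the continuous score.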

The post-neoadjuvant yAJCC staging system has also been demonstrated to have prognostic value [15]. yAJCC parses patients into five groups and nine subgroups based on the extent and characteristics of residual disease [2]. The system considers three parameters for pathologic staging: tumor in the breast, tumor in local lymph nodes, and metastases. Residual tumor in the breast (ypT) is determined by pathologic size and extension, as well as chest wall or skin invasion and pre-treatment inflammatory carcinoma. Residual nodal involvement (ypN) is determined primarily by the number of positive lymph nodes, although characteristics of these nodes (e.g., matted or fixed) also affect this score. Finally, yM designates distant metastases, and is typically established clinically before treatment. A combined ypTNM designation yields an overall yAJCC stage, ranging from 0 to IV, with subgroups within stages I–III.
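The combination of ypT, ypN, and M into a stage group can be sketched as a simple lookup. The grouping below is a simplified reading of the AJCC 7th edition breast stage table, which is not reproduced in this article; it ignores clinical modifiers, treats ypT0/ypTis with ypN0 as stage 0 (the pCR group used in this study), and does not handle combinations the manual leaves undefined. The function names and string codes are hypothetical.

```python
def yajcc_stage(t, n, m=0):
    """Simplified stage group from ypT ('T0','Tis','T1'..'T4'),
    ypN ('N0','N1mi','N1','N2','N3') and M (0 or 1)."""
    if m:
        return "IV"                      # any distant metastasis
    if n == "N3":
        return "IIIC"                    # any T with N3
    if t == "T4":
        return "IIIB"                    # T4 with N0-N2
    if n == "N2":
        return "IIIA"                    # T0-T3 with N2
    if t == "T3":
        return "IIB" if n == "N0" else "IIIA"
    if t == "T2":
        return "IIA" if n == "N0" else "IIB"
    # Remaining cases: T0, Tis, or T1 with N0, N1mi, or N1
    if n == "N1":
        return "IIA"
    if n == "N1mi":
        return "IB"
    return "IA" if t == "T1" else "0"

def major_stage(stage_group):
    """Collapse subgroups (e.g. 'IIIA') to the major stage used in the analysis."""
    return stage_group.rstrip("ABC")
```

Collapsing to the major stage mirrors the analysis choice of comparing stages 0, I, II, and III rather than their subgroups.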

In this study, we compare pCR, RCB, and yAJCC to determine how well these staging systems predicted post-treatment recurrence using the I-SPY 1 trial dataset [3, 4]. Our goal was to determine strengths of these systems as well as areas in which they could be improved, to help guide future refinements of post-neoadjuvant staging.

Materials and methods

Our group has previously published a detailed description of the methods employed in the I-SPY 1 TRIAL (investigation of serial studies to predict your therapeutic response with imaging and molecular analysis) [3, 4]. Briefly, I-SPY 1 was a limited-access cooperative group trial for women with locally advanced (stage II and III) breast cancer treated with neoadjuvant, anthracycline-based chemotherapy. The institutional review boards of all participating sites approved the I-SPY 1 TRIAL protocol (CALGB150007/150012; ACRIN 6657). Our primary outcome was recurrence-free survival (RFS) according to the STEEP criteria.

In the current analysis, we included participants who had completed neoadjuvant chemotherapy, undergone definitive surgery, and who had both RCB and yAJCC stages determined from their post-treatment surgical resection specimens. We excluded patients treated with neoadjuvant or adjuvant trastuzumab because at time of study initiation, trastuzumab was given at physician discretion, typically to patients who were felt not to be responding to then-standard of care regimens. This lowered the proportion of HER2-positive samples in our study population, but there were no other significant differences in pre-treatment patient characteristics or RFS of our study population compared to the full I-SPY 1 TRIAL cohort. To evaluate the impact of exclusion of patients who received trastuzumab, we conducted a separate analysis including these patients; results were not found to differ significantly from those presented here.

Pathologists at participating sites evaluated pCR, RCB, and yAJCC stage components at the time of surgery. pCR was defined as the absence of invasive carcinoma in both breast and lymph node tissue. RCB score and class were determined using the MD Anderson Cancer Center’s online calculator (http://www3.mdanderson.org/app/medcalc/index.cfm?pagename=jsconvert3). To standardize measurements across several sites and minimize inter-observer variability, all study pathologists were trained on RCB calculation during an instruction session at MD Anderson Cancer Center or via an online webinar. The first five specimens from each pathologist were centrally re-reviewed.

yAJCC stage was determined using the 7th edition of the AJCC staging guidelines. Subgroups (IIA/B, IIIA/B/C) were also calculated, but were not used in the recurrence analysis because of insufficient subgroup sample sizes. When nests of tumor cells in fibrotic stroma were observed after treatment, the distance over which the tumor nests spread was used for the measurement of tumor size (ypT).

Recurrence-free survival (RFS) was the primary outcome of interest, and was calculated in accordance with STEEP criteria [8]. We constructed Kaplan–Meier survival curves to stratify patients by yAJCC stage and RCB class overall and within HR/HER2 subtypes, and we applied the log-rank test to evaluate for significant curve separation. Patients were removed from at-risk groups when they were censored or experienced recurrence or death. We used a Cox proportional hazards model to assess clinical and pathological parameters as predictors of RFS, and we computed Harrell’s C statistics, a concordance measure used to assess a model’s predictive performance, to compare systems. We also adopted recursive partitioning to identify variables that best predict RFS. Recursive partitioning is a multivariable analysis tool that builds a decision tree that most effectively predicts the outcome of interest (RFS) by splitting the total group into subgroups based on input variables. The statistical programming environment R was used to carry out the recursive partitioning, using the rpart package. Cox proportional hazards analyses were carried out in STATA version 11.
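To make two of these quantities concrete, the following is a minimal pure-Python sketch of the Kaplan–Meier estimator and of Harrell's C. It is not the STATA or R code used in the study; the data are invented, the risk score is simply a hypothetical RCB class, and ties in risk are counted as one half, which is one common convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 for recurrence/death and 0 for censoring."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - failures / at_risk
        curve.append((t, surv))
    return curve

def harrell_c(times, events, risk):
    """Fraction of usable pairs in which the patient who fails first
    also has the higher predicted risk (ties in risk count one half)."""
    concordant = ties = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when i has an observed event before j's time
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / usable

# Invented follow-up times (years), event indicators, and RCB classes
toy_times = [2, 3, 4, 5, 6, 7]
toy_events = [1, 1, 0, 1, 0, 0]
toy_rcb = [3, 3, 2, 1, 0, 0]

km_curve = kaplan_meier(toy_times, toy_events)
c_stat = harrell_c(toy_times, toy_events, toy_rcb)
```

A C statistic of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering, which is how the values compared across staging systems in this study should be read.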

Results

Study participants

Among the 237 women enrolled in the I-SPY 1 TRIAL, 201 completed neoadjuvant chemotherapy and post-treatment surgery, and had both RCB and yAJCC data available for analysis. Median age was 48 years (range 26–68). Excluding patients treated with trastuzumab, we analyzed 162 patients (Fig. 1). Table 2 summarizes the patients’ pre-treatment characteristics. Most (51%) had clinical stage II cancers. Most had hormone-receptor (ER or PR) positive tumors (64%), and 18% had HER2 positive tumors. Median follow-up time for patients was 6.7 years. After completing chemotherapy, 37 patients (23%) had achieved pCR.

Fig. 1 CONSORT diagram: patients available for analysis

Table 2 Patient characteristics

RCB and yAJCC identify patients at high risk of early relapse

Figure 2 shows recurrence-free survival (RFS) stratified by RCB class, and Fig. 3 shows RFS stratified by yAJCC stage. Patients with pCR (i.e., RCB 0 and yAJCC 0) had overall low recurrence rates (92% 5-year RFS), and were at a significantly lower risk of recurrence when compared to patients with any amount of residual disease (Table 3). In comparison, patients with low to intermediate residual disease (RCB 1 or 2, yAJCC I or II) had a ~4-fold increased risk of relapse/death; the increased risk was ~11-fold for patients with extensive residual disease (RCB 3, yAJCC III). When we compared patients with RCB 3 to patients with RCB <3, patients with RCB 3 remained significantly more likely to recur (RCB 3 vs. RCB 0/1/2: Hazard Ratio 3.37 (1.96–5.80), p < 0.0001). Similarly, patients with yAJCC III had significantly worse RFS than patients with yAJCC < III (yAJCC III vs. yAJCC 0/I/II: Hazard Ratio 3.40 (1.99–5.83), p < 0.0001). Both RCB 3 and yAJCC III remained significant predictors of high recurrence risk after adjusting for age, clinical stage, and HR status, both when RCB and yAJCC were stratified into four classes and when they were dichotomized (RCB 0/1/2 vs 3 and yAJCC 0/I/II vs III).
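As a reading aid, the reported hazard ratios and intervals can be checked for internal consistency. The sketch below assumes the intervals are 95% Wald intervals computed on the log-hazard scale, which the text does not state; under that assumption it recovers the standard error, the z statistic, and a two-sided p-value. The function name is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), z, and a two-sided normal p-value from a
    hazard ratio and its (assumed) 95% Wald confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return se, z, p

# Values reported above for the dichotomized comparisons
rcb_se, rcb_z, rcb_p = wald_from_ci(3.37, 1.96, 5.80)
ajcc_se, ajcc_z, ajcc_p = wald_from_ci(3.40, 1.99, 5.83)
```

Under this assumption, both comparisons give z statistics above 4 and p-values below 0.0001, in line with the reported significance levels.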

Fig. 2 a–d Recurrence-free survival (RFS) based on residual cancer burden (RCB) for all patients (a) and by subtype (b–d)

Fig. 3 a–d Recurrence-free survival (RFS) based on American Joint Committee on Cancer (yAJCC) stage for all patients (a) and by subtype (b–d)

Table 3 Significance of key variables in Cox modeling of RFS

TN and HER2+ subtyping improves RCB and yAJCC predictive ability

Both RCB and yAJCC show the strongest association with RFS in patients with “triple-negative” (HR−/HER2−, abbreviated TN) cancers (Figs. 2b, 3b). Conversely, patients who had HR+/HER2− tumors had relatively low recurrence rates regardless of RCB or yAJCC class, and neither RCB nor yAJCC was significantly associated with RFS within this subtype (Figs. 2c, 3c). Likewise, a comparison of the predictive performances of Cox proportional hazard models constructed for all patients versus within individual subtypes suggests that the staging systems tended to predict RFS more effectively when subtype was taken into account. This is indicated by higher Harrell’s C statistics (an indicator of a model’s predictive ability) for each system within the TN and the HER2+ subtypes versus overall, although this effect was not seen in patients with HR+/HER2− tumors (Table 4). When we analyzed RCB score as a continuous rather than a discrete variable, the continuous score was significantly associated with RFS in all subtypes (Table 5).

Table 4 Comparison of pCR, yAJCC, and RCB as predictors of RFS by subtype
Table 5 Significance of the association between RFS and the continuous RCB index and RCB class within subtypes

Recursive partitioning selects yAJCC III and RCB 3 as more effective predictors of early relapse than subtype

Recursive partitioning was used to identify the optimal way to separate patients based on their recurrence-free survival. We included pCR, RCB, yAJCC, age, and hormone-receptor/HER2 status as potential variables for the model to use. The model initially separated patients by yAJCC, identifying “high-risk” (yAJCC III) and “low-risk” (yAJCC 0/I/II) groups. It subsequently separated patients in the high-risk group by receptor status, and the low-risk group by whether a pCR was achieved (Fig. 4a). In this model, patients who were yAJCC III and TN or HER2+ were at significantly higher risk of relapse than all other patients (Hazard Ratio 8.39 (4.41–15.94), p < 0.0001). When yAJCC was excluded as an input variable, recursive partitioning selected RCB as the optimal variable to identify patients who would recur, again separating patients into “high-risk” (RCB 3) and “low-risk” (RCB 0/1/2) groups. As in the yAJCC model, the “high-risk” group was subdivided by receptor subtype and the “low-risk” group by whether a pCR was achieved (Fig. 4b). In this model, patients with RCB 3 who were TN or HER2+ were at significantly higher risk of early relapse than all other patients (Hazard Ratio 7.47 (3.83–14.57), p < 0.0001).
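The idea behind the first split can be illustrated with a small sketch. This is not the rpart algorithm used in the study: for simplicity it scores each candidate binary split with a two-group log-rank statistic and keeps the largest, which is a swapped-in criterion chosen only to show how one variable comes to be selected at the root. The data and candidate names are invented.

```python
def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                        # at risk overall
        n1 = sum(1 for u, g in zip(times, in_group) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events at t
        d1 = sum(1 for u, e, g in zip(times, events, in_group)
                 if u == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def best_split(times, events, candidates):
    """candidates maps a split name to booleans (True = proposed high-risk side)."""
    scores = {name: logrank_chi2(times, events, flags)
              for name, flags in candidates.items()}
    return max(scores, key=scores.get), scores

# Invented data: eight patients, two candidate binary splits
toy_times = [1, 2, 3, 4, 5, 6, 7, 8]
toy_events = [1, 1, 1, 0, 1, 0, 0, 0]
toy_candidates = {
    "yAJCC III": [True, True, True, True, False, False, False, False],
    "age > 50": [True, False, True, False, True, False, True, False],
}
chosen, split_scores = best_split(toy_times, toy_events, toy_candidates)
```

A full tree would apply the same selection again within each resulting subgroup, which is how the receptor-status and pCR splits arise at the second level.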

Fig. 4 a and b Recursive partitioning models of predictors of RFS. Under each branch, the calculated risk is listed. Beneath that, the number of patients with a recurrence or death is divided by the number of patients within each category. Figure 4b was generated when yAJCC was excluded from the model

yAJCC and RCB stages are often discrepant

In 34% (n = 55) of patients, RCB and yAJCC staging systems were discrepant (Table 6). RCB class was greater than yAJCC in 36 patients, while yAJCC staging was greater than RCB in 19 patients. Discrepancies were largely due to unequal weighting of positive lymph nodes in the two systems and the weighting of tumor cellularity in RCB, which is not incorporated into yAJCC. For 11 patients with discrepant RCB and yAJCC scores, pathology slides were available to re-review to qualitatively assess the reasons for the discrepancy. Visual examples of features that commonly led to discrepancy are given in Supplemental Fig. 1. These images show a cancer with low cellularity but many positive nodes that received a higher yAJCC stage than RCB stage (Supplemental Fig. 1a–b), and also illustrate a tumor with high cellularity and no positive nodes that received an RCB stage greater than yAJCC stage (Supplemental Fig. 1c).
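The discordance counts can be tabulated as follows. This is a minimal sketch that assumes both systems are coded on a common 0–3 scale (RCB class 0–3; yAJCC 0, I, II, III mapped to 0–3); that coding and the toy values are assumptions made for illustration, while the final lines simply re-derive the reported percentage from the counts given above.

```python
def discordance(rcb_classes, yajcc_stages):
    """Count patients ranked higher by RCB, higher by yAJCC, and the
    overall discordant fraction, for paired 0-3 codes."""
    pairs = list(zip(rcb_classes, yajcc_stages))
    rcb_higher = sum(1 for r, y in pairs if r > y)
    yajcc_higher = sum(1 for r, y in pairs if y > r)
    return rcb_higher, yajcc_higher, (rcb_higher + yajcc_higher) / len(pairs)

# Invented paired classifications for six patients
toy_rcb = [0, 1, 2, 3, 2, 3]
toy_yajcc = [0, 2, 2, 2, 1, 3]
toy_result = discordance(toy_rcb, toy_yajcc)

# Counts reported above: 36 RCB-higher and 19 yAJCC-higher among 162 patients
reported_discordant = 36 + 19
reported_percent = round(100 * reported_discordant / 162)
```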

Table 6 Concordance and discordance of RCB and yAJCC risk classifications

Discussion

Pathologic response to treatment after neoadjuvant chemotherapy provides valuable prognostic information. The goal of this analysis was to compare three commonly used staging systems and highlight areas in which these systems differ and could be improved. Our analysis revealed three key findings: (1) RCB and yAJCC identify patients at high risk of early relapse, (2) predictive ability of these staging systems increases when HR/HER2 subtype is taken into account, and (3) RCB and yAJCC often produce discrepant results, largely driven by differential treatment of lymph nodes and the inclusion of cellularity in calculation of RCB. These findings complement a recent, multi-cohort analysis of RCB [17].

In contrast to pCR, our recursive partitioning analysis suggests that a primary utility of RCB and yAJCC may be to identify patients with residual tumor who are at highest risk of relapse. Specifically, the model identified yAJCC III and (when yAJCC was excluded) RCB 3 as the primary predictors of relapse.

By combining receptor subtype and staging system, recursive partitioning was able to identify a subset of patients that was at exceptionally high risk of early relapse: patients with extensive residual tumor (yAJCC III and/or RCB 3) whose tumors were also TN or HER2+ (three-year RFS for patients with yAJCC III and RCB 3 was 27% and 29%, respectively, within these subtypes). Previous analysis has shown that pCR was more effective at predicting RFS within receptor subtypes than in all cases combined within the I-SPY 1 dataset [4]. Our study extends this finding to patients with varying degrees of residual tumor, in which both RCB and yAJCC staging systems tend to be more predictive of RFS when analyzed by subtype.

The exception to this trend was the HR+/HER2− subtype. The majority of HR+/HER2− patients were at either intermediate or high risk for recurrence according to RCB (82% of these patients were RCB 2 or 3) and yAJCC (71% of these patients were yAJCC II or III), yet patients with this subtype had better RFS than patients with HER2+ or TN cancers. This result supports previous analysis suggesting that HR+/HER2− tumors may be intrinsically less responsive to chemotherapy [3], and therefore patients with these tumors may be predisposed to lower rates of pCR [7] and more extensive residual tumor. However, it is well known that patients with HR+/HER2− subtype tumors tend to experience lower rates of recurrence than patients with HER2+ or TN tumors, consistent with our results. Notably, using a combined analysis of several cohorts, Symmans and colleagues demonstrated that RCB did predict RFS among HR+/HER2− patients, including between participants at intermediate (RCB 2) and high (RCB 3) predicted risk of relapse [17]. Nonetheless, as in our study, Symmans and colleagues found that a substantial majority (60%) of patients with HR+/HER2− disease were classified as RCB 2, suggesting a potential need for further methods to stratify patients within the HR+/HER2− subtype.

In contrast to yAJCC, RCB is calculated as a continuous score. Although RCB class was specified using cut points that were determined using a unified cohort in which all subtypes were represented [16], the continuous score may allow for definition of subtype-specific cut points. Although our dataset was not large enough to establish subtype-specific cut points, we nevertheless found that within each subtype, continuous RCB score was significantly associated with RFS (notably, categorical RCB score was not significantly associated with RFS in HR+/HER2− patients). In the future, the ability to define subtype-specific RCB class cut points may make RCB particularly valuable as a post-neoadjuvant risk-stratification system.

In our dataset, different weighting of lymph nodes and tumor cellularity were the primary drivers of discrepancies between RCB and yAJCC. In discrepant cases, if one staging system identified the patient as high risk, then that patient tended to have an increased rate of early recurrence regardless of how the other system ranked her tumor. This suggests that there may be benefit to computing both scores for patients to identify those at highest risk of relapse. Sample-size limitations prevented us from defining subsets of patients based on nodes or cellularity in which one system out-performed the other. However, as post-neoadjuvant pathological staging continues to evolve, our results suggest that an avenue to improve these systems may be to re-evaluate the role of cellularity and number of positive lymph nodes, drawing from apparent strengths of RCB and yAJCC, respectively.

Although pathologic staging using one or more of the systems reviewed here is routinely carried out following neoadjuvant chemotherapy, there are known limitations to calculation of each of these scores. For RCB, cellularity and overall dimensions of the tumor bed are subjective, and the tumor bed can show heterogeneity in response, complicating calculation of cellularity and dimensions. In the I-SPY trial, standardized training for pathologists, complemented by diagrams and a standard protocol for slide review, was used to ensure consistent calculation of cellularity and tumor bed size. For yAJCC, the presence of scattered foci of residual tumor in the tumor bed may compromise calculation of residual tumor size. Finally, inclusion or exclusion of ductal carcinoma in situ may vary in determination of pCR. In our study, the presence of residual DCIS in the absence of other residual tumor was considered to be pCR, reflecting a typical, but not universal, definition of pCR. Although residual ductal carcinoma in situ was found to predict adverse outcomes in one recent cohort [18], it has not been found to adversely affect outcomes in other studies [9, 11]; its impact on outcomes remains a topic of investigation.

The limitations of our analysis include short median follow-up time (6.7 years) and small sample size. The short follow-up has the greatest impact on the HR+/HER2− subset, in which low overall rates of early relapse limited additional stratification by yAJCC and RCB. In addition, trastuzumab became standard of care for HER2+ patients while this study was ongoing; prior to then, it was administered at physician discretion. By excluding patients who received trastuzumab, we avoided introducing this source of bias into our analysis, but future analyses of post-neoadjuvant staging should assess this staging in cohorts in which the most up-to-date chemotherapy guidelines are reflected. To assess the effect of excluding patients who received trastuzumab, we conducted an analysis on the complete cohort, including these patients (data not shown), and found no significant differences from the results we present here. Finally, inter-observer variability in pathologic assessment was not addressed in this study, although it has previously been shown to be high among a small sample of pathologists evaluating RCB [12].

In summary, we have shown that RCB and yAJCC staging systems identify patients who are at highest risk for early recurrence, in contrast to pCR, which selects patients at lowest risk for relapse. In addition, our analysis suggests that combining pathologic staging with HR/HER2 subtyping further stratifies patients’ risk, and that triple-negative status combined with high disease burden poses the greatest risk to RFS. Continuous RCB score was found to be significantly associated with RFS within each subtype, suggesting that distinct risk classification cut points could be determined for each subtype to improve RFS predictions. Finally, we found that RCB and yAJCC frequently produce discrepant risk predictions, resulting primarily from different treatments of lymph nodes and tumor cellularity. Patients with high tumor cellularity may particularly benefit from calculation of RCB in addition to routine yAJCC staging. Altogether, our findings indicate that tumor cellularity, lymph node status and receptor status are useful areas of further investigation for evolving post-neoadjuvant tumor staging systems.