The Oncotype DX (Genomic Health, Redwood City, CA) Recurrence Score (RS) is a 21-gene assay that provides risk stratification for hormone receptor (HR)-positive, HER2/neu-negative invasive breast cancer and estimates the survival impact of adding adjuvant chemotherapy to adjuvant endocrine therapy for patients presenting with node-negative, resectable disease.1,2,3 Testing is performed on appropriately selected tissue from the surgically resected specimen. Patients with a low-risk RS (less than 18) are adequately treated with adjuvant endocrine therapy, and chemotherapy adds minimal benefit. A high-risk RS (at least 31) indicates a worthwhile survival benefit from adjuvant chemotherapy. Patients with an intermediate score (18–30) represent a more challenging category with respect to the risk–benefit balance of adjuvant chemotherapy.

Breast cancer outcome disparities associated with racial/ethnic identity are well-documented in the United States.4,5 Population-based breast cancer mortality rates are higher among African American (AA) compared with white American (WA) women; this is at least partly explained by the twofold higher rates of biologically aggressive triple-negative breast cancers observed among AA patients.6 After stratifying for tumor phenotype, several investigators have reported that survival disadvantages persist among AAs with HR-positive tumors even after accounting for treatment and other demographic variables.7,8,9 Nonetheless, data on Oncotype DX RS utilization and results in AA patients are limited.

Oncotype DX RS predicts for benefit from adjuvant chemotherapy independent of patient age as well as primary tumor size. Applications of this technology in the setting of neoadjuvant chemotherapy (NACT) are sparse. Limited data suggest that the use of diagnostic core needle biopsy tissue for Oncotype DX RS testing is feasible, but adequate tissue for RNA extraction is not consistently available and this costly evaluation may not be covered by insurance in the neoadjuvant setting.10,11,12,13,14 Furthermore, HR-positive and HER2/neu-negative tumors tend to respond sluggishly to NACT, and high-risk recurrence scores do not necessarily correlate with tumors that will be readily downstaged.15 Patients with relatively bulky but resectable primary tumors that are HR-positive and HER2/neu-negative are routinely referred to undergo primary surgery, even if this means that a mastectomy is necessary. The ability to predict for a high-risk RS (thereby confirming appropriateness of chemotherapy), as well as to predict for likelihood of tumor downstaging with neoadjuvant treatment can potentially improve lumpectomy eligibility among patients with bulky HR-positive, HER2/neu-negative breast cancer.

Methods

This project was approved by the Institutional Review Board of the Henry Ford Health System (HFHS), and cases were identified from a prospectively maintained database. Part I of this project involved female breast cancer patients who underwent Oncotype DX RS (Genomic Health, Redwood City, CA) testing following primary surgery for HR-positive, HER2/neu-negative, node-negative disease at HFHS from January 2012 to December 2016. Clinicopathologic variables abstracted from electronic medical records included: age at diagnosis; race/ethnicity; menopausal status; primary tumor size; tumor histology; estrogen receptor (ER) expression; progesterone receptor (PR) expression; proliferative index MIB1 (Ki67); extent of angiolymphatic involvement; tumor histologic grade; and overall Nottingham score. These variables were compared among patients with a low-risk RS (0–17), an intermediate-risk RS (18–30), or a high-risk RS (> 30). Analysis of variance (ANOVA) was used to test for differences in the means of continuous variables among the three RS groups, and the chi-square (χ²) test was used to compare the frequency distributions of categorical variables. A linear model based on clinicopathologic variables from this dataset was constructed to predict individual RS values. A receiver operating characteristic (ROC) curve was constructed to compare patients predicted to have a high-risk RS with those predicted to have a low/intermediate-risk RS. The area under the ROC curve was calculated, and the optimal threshold for the predicted group was selected as the value corresponding to the point closest to perfect classification. All statistical analyses were implemented in the R programming language version 3.21.
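For readers less familiar with the two group-comparison tests named above, the following minimal Python sketch shows how their test statistics are formed. It uses only the standard library and returns the statistics, not p-values (those require the F and chi-square distributions, e.g. via R's `aov()` and `chisq.test()` or SciPy's `scipy.stats.f_oneway` and `scipy.stats.chi2_contingency`). The function names and the toy inputs are hypothetical illustrations, not the study's code or data.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square.

    Each argument is one group's values (e.g. MIB1 % in the low, intermediate,
    and high RS groups). Returns only the F statistic, not a p-value.
    """
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy inputs, not study data.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))     # 27.0
print(round(pearson_chi_square([[10, 20], [20, 10]]), 3))   # 6.667
```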

In Part II, we evaluated a dataset of patients with clinical disease stage and biomarker pattern comparable to those from Part I but who received NACT between January 2008 and December 2016 and for whom Oncotype testing was not performed. Inflammatory breast cancers were excluded. We evaluated initial clinicopathologic tumor features (including size estimates based on mammogram, ultrasound, and/or clinical exam), as well as chemotherapy response based on surgical pathology. We defined significant tumor response as a decrease in tumor size of at least 1 cm, comparing the prechemotherapy size estimate with the size of the remaining invasive component in the final surgical specimen. This definition of tumor response (rather than complete pathologic response) was considered more appropriate for this project because tumor downstaging to improve lumpectomy eligibility is one of the advantages of the NACT approach, and tumor shrinkage can achieve this goal without a complete pathologic response.
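The response definition above reduces to a single comparison. The minimal Python sketch below encodes it under stated assumptions: the function name is hypothetical, the caller supplies one pretreatment size estimate (the text does not specify how mammogram, ultrasound, and exam estimates were reconciled), and a residual invasive size of 0 cm stands for a complete pathologic response. A small tolerance is included because centimeter differences are not always exact in binary floating point (for example, `4.1 - 3.1` evaluates to slightly less than 1.0).

```python
def significant_response(pre_size_cm: float, residual_invasive_cm: float,
                         min_shrinkage_cm: float = 1.0) -> bool:
    """Return True if the tumor shrank by at least `min_shrinkage_cm` (default 1 cm).

    pre_size_cm: pretreatment size estimate chosen by the caller (assumption:
        how imaging and exam estimates are combined is not specified here).
    residual_invasive_cm: size of the remaining invasive component on final
        surgical pathology; 0.0 represents a complete pathologic response.
    """
    shrinkage = pre_size_cm - residual_invasive_cm
    # Tolerance absorbs floating-point rounding, e.g. 4.1 - 3.1 < 1.0 in binary floats.
    return shrinkage >= min_shrinkage_cm - 1e-9


print(significant_response(3.0, 2.5))  # False: only 0.5 cm of shrinkage
print(significant_response(2.4, 0.0))  # True: no residual invasive tumor
```

Counting complete responders as responders follows directly from the definition, since any tumor of at least 1 cm that disappears has shrunk by at least 1 cm.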

We applied the RS prediction model derived in Part I to the patients from Part II to determine whether a predicted high-risk RS would accurately identify patients who experienced a significant response to NACT.

Results

Part I

We identified 394 patients who had Oncotype DX testing (Table 1). Twenty-six (6.7%) patients had a high RS. Sixty percent were WA, and nearly one-third (30.4%) were AA. Mean age was 59.9 years.

Table 1 Clinicopathologic characteristics of 394 patients with available 21-gene recurrence score (RS) testing values

Patients with a high RS had significantly higher MIB1 staining but lower ER and PR expression. However, no single feature consistently predicted a high RS. For example, two patients had very weak ER-positive staining (between 1% and 10%), and one of these patients had a low RS. Similarly, 26 cases had an MIB1 labeling index of at least 20%, and 19 of these cases (73%) had a low RS. Overall Nottingham score was higher for the high-risk RS cases (7.4 vs. 5.7 for low-risk cases; p < 0.001), and grade 1 tumors were more likely to generate low-risk scores than grade 3 tumors (85.1% vs. 55.5%; p = 0.002). More than half of the tumors in each of the three grade categories generated low-risk scores.

There were no significant differences in RS distribution according to racial/ethnic identity, patient age, tumor size, extent of angiolymphatic invasion, or menopausal status. The final RS prediction model included patient age, quantified estrogen and progesterone receptor expression, MIB1 staining, primary tumor size, and histopathology. The model equation and coefficients are detailed in Table 2. Interestingly, tumor size had a negative coefficient, although most tumors in this dataset were relatively small. For a hypothetical 60-year-old patient with a 2.2-cm invasive ductal carcinoma, 65% ER expression, 70% PR expression, and 30% MIB1 staining, the predicted RS is obtained by multiplying each variable's observed value by its coefficient and adding the sum of these products to the intercept, which yields 28.3 for this sample case:

Table 2 Model predicting 21-gene recurrence score (RS) based upon clinicopathologic features
$$32.707 + 0.7\left( { - 0.107} \right) + 0.65\left( { - 0.085} \right) + 0.30 \, \left( {0.309} \right) + 60\left( { - 0.058} \right) + 2.2\left( { - 0.392} \right) = 28.3$$
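The arithmetic in the worked example above can be reproduced with a few lines of Python. This is a minimal sketch, not the study's implementation: the feature names are hypothetical labels, only the five terms that appear in the printed equation are included, and two pairings are inferred rather than stated. The value 0.70 (paired with −0.107) is assumed to be PR expression and 0.65 (paired with −0.085) ER expression, matching the example patient's 70% PR and 65% ER, and invasive ductal histology is assumed to be the reference category contributing zero. The remaining histology terms in Table 2 are not reproduced.

```python
# Intercept and coefficients copied from the worked example; names are hypothetical.
INTERCEPT = 32.707
COEFFICIENTS = {
    "pr_fraction": -0.107,   # inferred PR term (paired with 0.70 in the printed equation)
    "er_fraction": -0.085,   # inferred ER term (paired with 0.65)
    "mib1_fraction": 0.309,  # MIB1 (Ki67) labeling index expressed as a fraction
    "age_years": -0.058,     # age at diagnosis in years
    "size_cm": -0.392,       # primary tumor size in cm
}


def predict_rs(features: dict) -> float:
    """Linear predictor: intercept plus the sum of coefficient * observed value.

    Assumes invasive ductal histology is the reference category (adds 0);
    histology terms for other subtypes are not included in this sketch.
    """
    return INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())


example_patient = {"pr_fraction": 0.70, "er_fraction": 0.65, "mib1_fraction": 0.30,
                   "age_years": 60, "size_cm": 2.2}
print(round(predict_rs(example_patient), 1))  # 28.3
```

Note that receptor and MIB1 values enter as fractions (0.30 rather than 30), as in the printed equation; entering percentages instead would change the result substantially.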

Although tumor grade and Nottingham score were significantly associated with Oncotype DX RS in univariate analyses, they did not reach significance in the prediction model, suggesting that other clinicopathologic features accounted for their predictive power. The continuous predicted RS values were dichotomized at a threshold, so that patients with predicted scores above the threshold were classified into the predicted high-risk RS group. This classification allows calculation of the sensitivity and specificity of the prediction by comparing each patient's predicted RS group with the RS group actually determined by the 21-gene assay. The ROC curve in Fig. 1 was constructed from the sensitivity and specificity pairs obtained across thresholds, and the resulting area under the ROC curve is 0.909. The optimal threshold (predicted RS = 21.0; sensitivity = 0.864, specificity = 0.821) was determined as the point closest to perfect classification (sensitivity = 1 and specificity = 1).
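The threshold-selection procedure described above can be sketched in standard-library Python. This is an illustration under assumptions, not the authors' R code: function names are hypothetical, candidate thresholds are the observed predicted scores, a patient is called predicted high-risk when the score is at or above the threshold, and "closest to perfect classification" is taken as the smallest Euclidean distance to (sensitivity = 1, specificity = 1). The AUC is computed as the probability that a randomly chosen actual high-risk case outscores a randomly chosen low/intermediate case (ties count one half), which equals the trapezoidal area under the empirical ROC curve.

```python
import math


def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each observed score used as a cutoff.

    labels: 1 = actual high-risk RS, 0 = low/intermediate RS.
    Predicted high-risk means score >= threshold (an assumption of this sketch).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((t, tp / pos, tn / neg))
    return points


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def closest_to_perfect(points):
    """Point minimizing Euclidean distance to sensitivity = 1, specificity = 1."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))


# Hypothetical toy data, not study data.
predicted = [10, 14, 17, 20, 23, 26, 29, 33]
actual_high = [0, 0, 0, 0, 1, 0, 1, 1]
print(closest_to_perfect(roc_points(predicted, actual_high)))  # (23, 1.0, 0.8)
print(round(auc(predicted, actual_high), 3))                   # 0.933
```

Other criteria for an "optimal" cutoff exist (for example, maximizing Youden's index, sensitivity + specificity − 1); the distance-to-corner rule is used here only because it matches the description in the text.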

Fig. 1

Receiver operating characteristic curve for the model of clinicopathologic features predicting a high-risk versus low/intermediate-risk Oncotype DX 21-gene recurrence score. Area under the curve = 0.909

Part II

We identified 56 HR-positive, HER2/neu-negative patients who received NACT (25 AA) for tumors that were at least 2.0 cm and/or node-positive; one patient had suspected pulmonary metastatic disease that was subsequently ruled out. All but three NACT patients received at least four cycles of an anthracycline, a taxane, and an alkylating agent. Two patients received only three cycles; one patient received six cycles of a taxane and an alkylating agent. Most (n = 52) had invasive ductal carcinoma; four had lobular histology. Four (7.5%) patients had clinical stage 1, 25 (47.2%) had stage 2, and 24 (45.3%) had stage 3 disease. Patients responding to NACT (Table 3) had lower ER expression (79.8% vs. 97.7%; p = 0.0023) and lower PR expression (44.3% vs. 72.4%; p = 0.0304). Patients with grade 3 tumors were more likely to respond than those with grade 1 or grade 2 disease (87.5% vs. 56% and 25%, respectively; p = 0.0065). MIB1 staining was higher among responders than nonresponders, although the difference did not reach statistical significance (49.3% vs. 27.7%; p = 0.0561). No significant difference in NACT response was seen between WAs and AAs (70.4% vs. 64%; p = 0.847).

Table 3 Clinicopathologic features of 56 patients receiving neoadjuvant chemotherapy, comparing those with minimal/no response to those that did respond (defined as tumor shrinkage less than 1 cm vs. at least 1 cm when comparing the best pretreatment size estimate to the pathologic size estimate determined from final surgical pathology)

The RS model generated in Part I was applied to a subset of 21 NACT patients (10 AA) for whom all relevant clinicopathologic data were available. Using the optimal threshold of 21 from Part I, patients were classified into a predicted high-risk RS group and a predicted low/intermediate-risk RS group. A predicted high-risk RS correctly identified all patients who experienced significant tumor downsizing in response to NACT (14 of 14 cases, 100%; Table 4). Of the 16 patients with a predicted high-risk RS, only 2 (12.5%; Table 4) did not experience significant tumor downsizing in response to NACT.
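The comparison summarized in Table 4 is a 2 × 2 cross-tabulation of predicted RS group against observed response. The minimal Python sketch below shows the bookkeeping; names are hypothetical, and treating a predicted score exactly equal to 21 as high-risk is an assumption. The counts in the usage line are reconstructed from the text rather than copied from Table 4: 21 patients, 16 predicted high-risk (14 responders, 2 nonresponders), and all 14 responders in the predicted high-risk group, which leaves 5 predicted low/intermediate-risk nonresponders.

```python
from dataclasses import dataclass

THRESHOLD = 21.0  # optimal predicted-RS cutoff from Part I


@dataclass
class Confusion:
    tp: int = 0  # predicted high-risk RS, responded (>= 1 cm shrinkage)
    fp: int = 0  # predicted high-risk RS, did not respond
    fn: int = 0  # predicted low/intermediate RS, responded
    tn: int = 0  # predicted low/intermediate RS, did not respond

    def sensitivity(self) -> float:
        """Share of responders who were predicted high-risk."""
        return self.tp / (self.tp + self.fn)

    def ppv(self) -> float:
        """Share of predicted high-risk patients who responded."""
        return self.tp / (self.tp + self.fp)


def tabulate(cases, threshold=THRESHOLD):
    """Build a Confusion from (predicted_rs, responded) pairs."""
    c = Confusion()
    for predicted_rs, responded in cases:
        high = predicted_rs >= threshold  # assumption: a score equal to the cutoff counts as high
        if high and responded:
            c.tp += 1
        elif high:
            c.fp += 1
        elif responded:
            c.fn += 1
        else:
            c.tn += 1
    return c


# Counts reconstructed from the text (see lead-in), not individual patient data.
table4 = Confusion(tp=14, fp=2, fn=0, tn=5)
print(table4.sensitivity(), 1 - table4.ppv())  # 1.0 0.125
```

The two printed values correspond to the "14 of 14 (100%)" and "2 of 16 (12.5%)" figures reported above.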

Table 4 Comparison of predicted recurrence score risk group with response to neoadjuvant chemotherapy

Discussion

Multigene assays have allowed the oncology community to de-escalate breast cancer treatment by refining risk stratification; recommendations for adjuvant chemotherapy are now routinely tailored to tumor biology.16,17 In the United States, the 21-gene assay Oncotype DX is the most widely used profile.1,2 This assay generates an RS that predicts benefit from adjuvant chemotherapy in addition to endocrine therapy for node-negative, clinically early-stage breast cancer patients with HR-positive, HER2/neu-negative disease. The predictive value of this assay is independent of patient age and primary tumor size. This assay influences adjuvant therapy decisions in 27–74% of cases.18

Oncotype DX RS testing is costly (approximately $3,000–4,000). Several investigators have therefore developed prediction tools and algorithms based on readily available clinicopathologic features as substitutes for the actual multigene assay. Examples of such tools include versions of the Magee Equation, as well as models described by Tang, Gage, and Orucevic.19,20,21,22,23,24 These models share the inclusion of HR expression and some measure of proliferative index. Harowicz performed a comparative evaluation of the Magee, Tang, and Gage models and demonstrated a common weakness of these algorithms: they do not reliably rule out disease associated with an intermediate-risk RS.25 Orucevic developed a user-friendly nomogram based on the National Cancer Database to predict high-risk versus low-risk recurrence and, similar to our model, found that histologic pattern was a relevant variable for inclusion.24

Another limitation in the generalizability of these models (as well as of Oncotype DX RS testing itself) is that data on their application in diverse patient populations are limited. Population-based breast cancer mortality rates are higher among AAs than among WAs, making studies of race/ethnicity-associated variation in tumor biology particularly relevant.4 Reports from the National Cancer Database, the National Comprehensive Cancer Network, and prospective clinical trials have all shown that this outcome disparity persists within the subset of patients with HR-positive disease, even after controlling for various treatment and demographic variables.7,8,9 The Orucevic National Cancer Database model used a large patient population, but specific information on its racial/ethnic distribution was not reported. Our study therefore adds to the existing literature on Oncotype DX testing and RS prediction models by generating data from a diverse patient population. We found no differences in the distribution of Oncotype RS between AA and WA cases, suggesting that reported outcome differences among HR-positive breast cancer patients are unlikely to be related to variation in disease biology as defined by the 21-gene assay.

Another goal of our project was to determine whether an RS prediction model could be used to identify HR-positive, HER2/neu-negative breast cancer patients who might benefit from tumor downstaging with NACT. Such a model would have to fulfill two different but related requirements: first, it must reliably identify patients who are likely to have a high-risk score and will therefore benefit from chemotherapy; and second, it must reliably identify patients who are likely to exhibit a brisk response to NACT with respect to primary tumor downstaging. The first requirement is important because tissue from core-needle biopsies is not yet routinely used for Oncotype DX RS testing. While some investigators have demonstrated that such testing is technically feasible, difficulties obtaining adequate quantities of RNA have been reported.10,11,12,13,14 At least one study has demonstrated that core biopsy-generated Oncotype DX RSs failed to predict the extent of response to NACT. The second requirement can be particularly challenging because HR-positive, HER2/neu-negative tumors tend to respond more sluggishly to NACT than triple-negative or HER2/neu-overexpressing tumors.26,27 These outstanding concerns underscore the importance of ongoing work to develop models that predict the RS generated from primary surgical pathology specimens and to evaluate these models for prediction of response to NACT.

Farrugia et al. evaluated the ability of the Magee Equations to predict response to NACT in 237 patients (only 7% AA) receiving NACT for estrogen receptor-positive, HER2/neu-negative/equivocal breast tumors and found that Magee Eq. 3 performed well in predicting complete pathologic response.28 Our model is based on a more diverse patient population, and it differs from the Farrugia study in that we sought to predict response to NACT using the broader definition of tumor shrinkage by at least 1 cm. We believe that this liberal benchmark for response is appropriate because patients do not necessarily need to achieve a complete pathologic response to reap the benefit of improved lumpectomy eligibility associated with NACT.

Our study has several limitations. First, patients triaged to receive NACT were clearly subject to selection bias. The multidisciplinary team was likely to have been biased in favor of NACT because of clinical trial eligibility or some undocumented feature indicating a preference for deferring surgical management. Second, our sample of patients receiving NACT was relatively small.

Conclusions

We have shown that a prediction model accounting for readily available clinicopathologic features (patient age, HR expression, proliferative index) can reliably identify patients who are likely to have a high-risk Oncotype DX RS; this is consistent with other studies. Importantly, we have shown that such a model functions well in diverse patient populations and that this model can be used to predict at least partial response to NACT, which can improve lumpectomy eligibility. We do not advocate for application of this model in patients undergoing primary surgery, where tissue will be available for gene-expression profiling and recurrence-score testing. Our findings warrant validation in other neoadjuvant chemotherapy patient populations.