Introduction

Breast cancer is a heterogeneous disease with a demonstrated difference in prognosis based on molecular phenotypes. Neoadjuvant chemotherapy (NAC), also called preoperative systemic chemotherapy, has emerged as the preferred initial component of therapy for patients diagnosed with locally advanced breast cancer. However, a potential disadvantage of NAC is the loss of the prognostic value provided by tumor size and nodal status at surgery and before adjuvant chemotherapy (Estevez and Gradishar 2004; Shimizu et al. 2007). Many researchers have attempted to perform risk stratification and individualized treatment according to molecular phenotypes.

In early breast cancer, useful prognostic predictors such as Adjuvant! Online (Ravdin et al. 2001) (http://www.adjuvantonline.com) and the Nottingham Prognostic Index (Galea et al. 1992) have been built, validated, and widely used in the clinic. However, in locally advanced breast cancer treated with NAC, tumor size and the number of involved lymph nodes can change substantially during chemotherapy, which prevents the use of such predictors developed for early breast cancer.

To date, several prognostic models or nomograms have been proposed to predict the outcome of breast cancer patients receiving NAC (Colleoni et al. 2009; Jeruss et al. 2008a; Jones et al. 2009; Rouzier et al. 2005, 2006; Symmans et al. 2007). However, a prognostic predictor for locally advanced breast cancer treated with NAC has not yet been implemented and validated, especially in Asian patients. Furthermore, the tumor biology of Asian breast cancer patients is somewhat different from that of Western patients (Yip 2009; Yoo et al. 2002). The peak age of Asian patients is between 40 and 50 years, whereas that of patients in Western countries is between 60 and 70 years (Yoo et al. 2002). There is a higher proportion of hormone receptor-negative patients, and some evidence that breast cancers in Asia are of a higher grade (Yip 2009). The aim of the current study was to combine clinicopathologic variables that are associated with pathologic complete response (pCR) and relapse-free survival (RFS) after NAC into prediction nomograms.

Patients and methods

Study population

We recently studied neoadjuvant docetaxel/doxorubicin chemotherapy in stage II or III breast cancer and reported the prognostic and predictive roles of molecular markers (Keam et al. 2007, 2009, 2011). The detailed eligibility criteria and regimen were described in our prior reports (Keam et al. 2007, 2009). In brief, the eligibility criteria were (1) breast cancer pathologically confirmed by core needle biopsy, (2) initial clinical stage II or III, (3) objectively measurable lesion, (4) Eastern Cooperative Oncology Group performance status 0–2, (5) no prior treatment, and (6) adequate bone marrow, hepatic, cardiac, and renal function. The patients received three cycles of neoadjuvant docetaxel/doxorubicin chemotherapy, were then reevaluated for response, and underwent curative surgery. After surgery, the patients received three cycles of docetaxel/doxorubicin as adjuvant chemotherapy. Between January 2002 and September 2008, a total of 370 consecutive patients who received neoadjuvant docetaxel/doxorubicin chemotherapy at Seoul National University Hospital were included in the present study.

We performed immunohistochemistry (IHC) using tissues obtained before treatment and evaluated the association with clinical outcomes. IHC was performed as previously described (Keam et al. 2007; Lee et al. 2007). Estrogen receptor (ER) and progesterone receptor (PR) positivity was defined as ≥10% positive tumor cells with nuclear staining. Human epidermal growth factor receptor 2 (HER2) positivity was defined as either HER2 gene amplification by fluorescence in situ hybridization or a score of 3+ by IHC. Clinical and pathologic stages were assessed according to the 6th edition of the AJCC cancer staging system (Greene et al. 2002).

We selected pCR and RFS as the clinical outcome variables for the nomograms. pCR was defined as complete disappearance of invasive carcinoma in both the breast and the axillary lymph nodes after three cycles of chemotherapy. Residual ductal carcinoma in situ was included in the pCR category (Mazouni et al. 2007). RFS was defined as the interval from NAC to the date when disease relapse was first documented or the date of death from any cause. This study was reviewed and approved by the Institutional Review Board of Seoul National University Hospital.

Constructing a model

We built the nomograms using a logistic regression model for the binary outcome and a Cox proportional hazard regression (PHR) model for survival data (Iasonos et al. 2008). Beta-coefficients from the models were used to allocate points. Univariate logistic regression analyses were used first to assess the association between each variable and pCR and to select the variables entered into the subsequent multivariate logistic regression analysis. Univariate Cox PHR analyses were performed to evaluate the prognostic value of each variable, followed by multivariate Cox PHR analysis. Multicollinearity between variables was also tested, and when variables showed multicollinearity, one of them was removed from the model. The final multivariate model was chosen on the basis of the stepwise procedure as well as consideration of the clinical or biologic importance of the variables in the model. Based on the prediction models with the identified predictive and prognostic factors, nomograms were drawn for prediction of pCR and 2-year RFS.
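As an illustration only (this is not the analysis code of the study, which was run in STATA and R), the allocation of points from beta-coefficients can be sketched in Python. The sketch assumes the common convention that the predictor with the widest effect range spans 0 to 100 points. The variable names, coefficients, and intercept below are hypothetical and are not the fitted values of our models.

```python
import math

def nomogram_points(coefs, levels):
    """Map each level of each predictor to points on a 0-100 scale.

    coefs:  dict predictor -> beta-coefficient (hypothetical values here)
    levels: dict predictor -> list of coded values the predictor can take
    """
    # Contribution of every level to the linear predictor.
    contrib = {name: [coefs[name] * v for v in levels[name]] for name in coefs}
    # The predictor with the widest effect range defines the 100-point span.
    widest = max(max(c) - min(c) for c in contrib.values())
    scale = 100.0 / widest
    points = {
        name: {v: (coefs[name] * v - min(contrib[name])) * scale
               for v in levels[name]}
        for name in coefs
    }
    return points, scale

def probability_from_points(total_points, intercept, coefs, levels, scale):
    """Convert total points back to a predicted probability (logistic model)."""
    baseline = intercept + sum(
        min(coefs[n] * v for v in levels[n]) for n in coefs
    )
    linear_predictor = baseline + total_points / scale
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical two-variable example (values are NOT from the fitted model).
coefs = {"er_negative": 1.2, "high_ki67": 0.6}
levels = {"er_negative": [0, 1], "high_ki67": [0, 1]}
points, scale = nomogram_points(coefs, levels)
total = points["er_negative"][1] + points["high_ki67"][1]
prob = probability_from_points(total, -3.0, coefs, levels, scale)
```

For a Cox model the point allocation is the same, but converting total points to a 2-year RFS probability additionally requires the estimated baseline survival at 2 years, which this sketch does not cover.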

Evaluating model performance

Model performance was quantified in terms of discrimination and calibration. Discrimination is the predictor’s ability to separate patients with different responses or events. Calibration is the agreement between the observed outcome frequencies and the predicted probabilities produced by the model (Harrell et al. 1996).

Discrimination for binary data was evaluated using the area under the receiver operating characteristic (ROC) curve (Hanley and McNeil 1982). Discrimination for survival data was evaluated using the concordance index (C-index) (Harrell et al. 1996), which is similar in concept to the area under the ROC curve in the logistic model but is appropriate for censored data (Nam 2000; Pencina and D’Agostino 2004). The C-index is the probability that, given two randomly selected patients, the patient with the worse outcome will in fact have the worse outcome prediction. It ranges from 0 to 1, with 1 indicating perfect concordance, 0.5 indicating concordance no better than chance, and 0 indicating perfect discordance. In general, a model is considered relatively good for values above 0.75. The ROC curve for survival data was drawn using the methods proposed by Heagerty et al. (2000).
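The definition of the C-index given above can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not the survcomp implementation used in the study: a pair of patients is counted as usable only when the patient with the shorter follow-up time had an event, pairs with tied times are ignored, and tied predictions count as one half. The function name and inputs are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored data (simplified sketch).

    times:       follow-up time of each patient
    events:      1 if relapse/death was observed, 0 if censored
    risk_scores: model prediction, higher = worse predicted outcome
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                # Patient i relapsed before patient j was last seen.
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

With fully uncensored data and a binary outcome, the same pairwise counting reduces to the area under the ROC curve.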

Calibration for the binary outcome compared the predicted probability of pCR with the observed frequency of pCR. We plotted the grouped observed proportions against the mean predicted probability in quantile-defined groups, together with the logistic calibration curve. The calibration plot can be approximated by a regression line with intercept α and slope β, which can be estimated in a logistic regression model with the linear predictor as covariate. Calibration was also evaluated using the Hosmer–Lemeshow goodness of fit test, with the data grouped into deciles (Hosmer and Lemeshow 1999; Lemeshow and Hosmer 1982). Calibration for survival data compared the predicted probability of RFS at 2 years with actual survival, using deciles: patients were divided into 10 groups according to their nomogram-predicted probabilities of relapse. We used bootstrap resampling (200 repetitions) to obtain relatively unbiased estimates and to perform internal validation.
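The grouping logic behind the Hosmer–Lemeshow test can be sketched as follows. This is a simplified illustration, not the STATA routine used in the study: patients are sorted by predicted probability, split into equally sized groups, and the observed and expected numbers of events are compared in each group. The function name is hypothetical, and the P value (from a chi-square distribution with groups − 2 degrees of freedom) is deliberately left out to keep the sketch dependency-free.

```python
def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom.

    y_true: observed binary outcomes (1 = pCR, 0 = no pCR)
    y_prob: predicted probabilities from the model
    """
    pairs = sorted(zip(y_prob, y_true))  # sort patients by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_prob = expected / size
        variance = size * mean_prob * (1.0 - mean_prob)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2
```

A small statistic relative to its degrees of freedom (that is, a non-significant test) indicates no detectable lack of fit, which is how the results of this test are read in this study.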

All reported P values are two-sided. All statistical analyses were performed using STATA statistical software version 11.0 (STATA, College Station, TX, USA) and R software version 2.10.1 (http://www.r-project.org). The Design and survcomp R packages (available at http://cran.r-project.org/web/packages/; last accessed on January 12, 2010) were used.

Results

Table 1 shows the baseline characteristics of the 370 patients. The median tumor size was 4.5 cm, and pCR was observed in 8.6% of the patients. With a median follow-up duration of 34.8 months, 89 relapse events occurred. The estimated 1-, 2-, and 3-year RFS rates, as calculated by the Kaplan–Meier method, were 90.8, 81.7, and 73.3%, respectively.
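For readers less familiar with the Kaplan–Meier method, the product-limit calculation behind such RFS rates can be sketched in Python. This is a generic illustration with hypothetical function names, not the code used to produce the estimates reported here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times:  follow-up time of each patient (e.g. months)
    events: 1 if relapse/death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def survival_at(curve, t):
    """Read the estimated survival probability at time t from the curve."""
    estimate = 1.0
    for time, surv in curve:
        if time > t:
            break
        estimate = surv
    return estimate
```

Calling `survival_at(curve, 24)` on a curve built from follow-up times in months would give the estimated 2-year RFS rate.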

Table 1 Baseline characteristics of 370 patients

Nomogram predicting pCR

The nomogram for predicting pCR was constructed using a logistic regression model. Table 2 shows the results of univariate and multivariate logistic regression analysis. The Hosmer–Lemeshow goodness of fit test was not significant (degrees of freedom = 8, χ2 = 5.02, P = 0.756), indicating good fit. Figure 1 shows the nomogram predicting pCR. Figure 2 shows the ROC curve and calibration plot. The area under the ROC curve of the multivariate model was 0.830 [95% confidence interval (CI): 0.760–0.899]. The calibration plot showed good agreement between predicted and observed outcomes in the low-probability range. In the high-probability range, however, agreement appeared poorer, which we attribute to the relatively low observed pCR rate of 8.6%.

Table 2 Univariate and multivariate logistic regression analysis of the association between clinicopathologic variables and pathologic complete response (pCR)
Fig. 1

Nomogram to predict the probability of pathologic complete response (pCR) to neoadjuvant docetaxel/doxorubicin chemotherapy. The nomogram is used by totaling the points identified on the top scale for each independent covariate. The total points projected to the bottom scale indicate the % probability of pCR

Fig. 2

(a) Receiver operating characteristic (ROC) curve corresponding to the multiple logistic model. Area under the ROC curve is 0.830 [95% confidence interval (CI): 0.760–0.899]. (b) Calibration plot of the nomogram for probability of pCR. Predicted and observed pCR rates are plotted as logistic calibration (bootstrap 200 repetitions)

Nomogram predicting RFS

The nomogram for predicting RFS was constructed using a Cox PHR model. Table 3 shows the results of univariate and multivariate Cox PHR analysis, and Fig. 3 shows the resulting nomogram, which assigns a numeric prediction for the risk of relapse at 2 years. Its C-index was 0.781 (95% CI: 0.735–0.827), indicating good discrimination. Figure 4 shows the ROC curve and calibration plot. The Hosmer–Lemeshow goodness of fit test was not significant (degrees of freedom = 9, χ2 = 10.623, P = 0.302), indicating good fit.

Table 3 Univariate and multivariate Cox proportional hazard regression analysis of the association between clinicopathologic variables and relapse-free survival (RFS)
Fig. 3

Nomogram predicting relapse-free survival (RFS)

Fig. 4

(a) ROC curve of the Cox proportional hazard regression model. Harrell’s C-index was 0.781 (95% CI: 0.735–0.827). (b) Calibration plot for 2-year RFS from the nomogram. On the calibration plot, the x-axis is the nomogram-predicted probability of RFS. The y-axis is the observed RFS with 95% CI calculated using Kaplan–Meier analysis

Discussion

In the present study, we developed nomograms to predict pCR and RFS in breast cancer patients receiving neoadjuvant docetaxel/doxorubicin chemotherapy. These predictive and prognostic models were internally validated and showed good performance in terms of calibration and discrimination. We believe that these user-friendly nomograms are useful for risk assessment and could form the basis for individualized risk-adapted therapy.

To date, several prognostic models (Jeruss et al. 2008a; Jones et al. 2009; Symmans et al. 2007) and three nomograms (Colleoni et al. 2009; Rouzier et al. 2005, 2006) have been reported in breast cancer patients receiving NAC. Rouzier et al. (2005) developed a nomogram predicting pCR that took into account ER, initial clinical T stage, histologic grade, age, and number of courses as covariates. These covariates were similar to those of our model. However, we did not include histologic grade (HG), because missing data would have reduced statistical power. Age was not an independent predictive factor for pCR in our results (data not shown). Colleoni et al. (2009) built a prognostic nomogram for disease-free survival (DFS) using residual tumor size, ER, HER2, Ki67, and vascular invasion as covariates. Rouzier et al. (2005) developed a nomogram predicting DFS based on residual tumor size, number of involved nodes, HG, ER, and histology type. Histology type was not a prognostic factor for RFS in our results (data not shown). In contrast to these two prognostic nomograms (Colleoni et al. 2009; Rouzier et al. 2005), we included initial clinical stage and age younger than 35 years in our prognostic nomogram. Several nomograms have also been developed to predict axillary lymph node status after NAC (Barranger et al. 2005; Kohrt et al. 2008; Van Zee et al. 2003), and Unal et al. (2009) compared the performance of each nomogram. These results provide a basis for more precise prediction of prognosis, and the published nomograms need to be validated and compared.

We assessed initial clinical stage precisely using breast magnetic resonance imaging and chest computed tomography. Initial clinical stage has been regarded as a prognostic factor in the neoadjuvant setting (Keam et al. 2007). Jeruss et al. (2008a, b) proposed a simple new prognostic scoring system using both initial clinical and pathologic stage. With initial clinical stage included in our prognostic model, model performance was good (C-index = 0.781). In our results, patients with higher Ki67 were significantly more likely to achieve a pCR than those with lower Ki67. However, higher Ki67 was associated with shorter RFS, and these findings are in line with the results of Jones et al. (2010).

To date, there is no consensus on the optimal postoperative treatment after completion of NAC, including the optimal indication for radiation therapy. Although there is no definite evidence that additional chemotherapy is of value in the neoadjuvant setting (Gralow et al. 2008), some specific patients might benefit from additional treatment, such as 1 or 2 years of adjuvant trastuzumab in patients with HER2 overexpression who do not achieve pCR. Hence, we believe that our nomogram predicting RFS might give guidance for identifying patients at high risk of relapse.

The present study has some limitations. First, our nomograms were not validated externally in an independent patient set. We performed only internal validation using bootstrap resampling. Whether the nomograms can be generally applied remains to be validated externally (Iasonos et al. 2008), and future efforts are needed to test model performance in an independent patient set. Second, the pCR rate of 8.6% in our study was relatively low compared with other NAC results, because we gave only 3 cycles of NAC. Thus, use of the nomogram predicting pCR might not be appropriate in breast cancer patients receiving 6 or 8 cycles of NAC. Despite these limitations, the present study is the first to develop nomograms based on data from Asian patients, who show different epidemiology and tumor biology (Yip 2009; Yoo et al. 2002).

In conclusion, we developed nomograms that can be used to predict the probability of pCR and the risk of relapse at 2 years, based on clinicopathologic characteristics. These nomograms may be useful for predicting response to NAC and determining additional postoperative treatment. We suggest that these nomograms allow individualized outcome prediction, which could aid clinicians in the decision-making process.