Introduction

Topoisomerase II alpha (TOP2A) is a key enzyme in DNA replication and a target of various cytotoxic agents such as anthracyclines. The gene is located on chromosome 17q21 in close proximity to the ErbB2 locus and encodes a 170-kilodalton protein, which catalyzes the unwinding of DNA by inducing single-stranded breaks on both DNA strands. Anthracyclines, among the most effective cytotoxic agents used in the treatment of breast cancer patients, inhibit TOP2A by trapping the DNA strand intermediates, leading to persistent DNA cleavage [1]. Several recently reviewed meta-analyses of breast cancer [2, 3] indicate that response to anthracycline-containing chemotherapy is significantly better for ErbB2 positive tumors. As a straightforward explanation, it had previously been suggested that ErbB2 amplification might in fact be a surrogate marker for co-amplification of the TOP2A gene in this setting. However, while results from in vitro analyses and several clinical studies were in line with this model, other studies did not confirm this hypothesis [4–8]. The uncertainty regarding the biologic relationship between TOP2A protein expression, copy number, proliferation, and benefit from anthracyclines makes assessment of TOP2A unreliable at this time [3]. In fact, recent trials suggest that the model of a direct relationship between TOP2A amplification, overexpression of TOP2A protein, and benefit from anthracyclines is overly simplistic [10, 11]. Thus, although there is a stronger theoretical underpinning for a relationship between the TOP2A gene, and in particular its protein product, and anthracycline efficacy, there are as much or more data supporting the role of HER2 in predicting differential anthracycline benefit [2, 12].
However, while not yet published, data from 2,990 ErbB2 positive patients in the second interim analysis of the BCIRG 006 trial have been interpreted to suggest that the 1,057 (35%) patients with TOP2A co-amplification may not require trastuzumab in addition to an anthracycline-containing regimen [13]. The analysis of a potential predictive value of TOP2A protein expression is further complicated by the fact that TOP2A protein expression is a strong prognostic factor. Studies have shown that TOP2A protein expression as detected by immunohistochemistry is associated with ER negativity, a higher histological grade and proliferative state of the tumor, as well as poor survival [14–16]. While TOP2A gene amplification does not correlate with protein expression as detected by immunohistochemistry [14, 15], it is less clear whether steady state mRNA levels of the TOP2A gene correlate better with its amplification. Here we investigated TOP2A mRNA expression levels and their prognostic impact by Affymetrix microarray analysis in a large combined breast cancer cohort. Since clear cut-off levels for high and low TOP2A expression are not available, we chose a conservative approach using a median split of expression values. Furthermore, tumor samples were stratified according to ER, ErbB2 and Ki67 expression.

Materials and methods

Analysis of breast cancer microarray datasets

We established a database consisting of 1,681 Affymetrix microarray datasets from primary breast cancer patients without neoadjuvant treatment. We included 220 of our own samples (datasets Frankfurt and Hamburg), which have been described previously [17–20], as well as 1,461 samples from nine different publicly available datasets (Table 1): Uppsala [21], Stockholm [22], Rotterdam [23, 24], Oxford-Untreated [25], Oxford-Tamoxifen and London [26], NewYork [27], Villejuif [28], and ExpO [29]. For comparability, only data from Affymetrix HG-U133A microarrays were used. The clinical characteristics of the patients in the different datasets are given in Table 1. Follow-up information was available for 1,363 patients. The median follow-up time was 76 months. Of the 1,681 samples, 1,200 were ER positive. Treatment information could be obtained for 878 ER positive and 262 ER negative patients. Since methods of Affymetrix microarray normalization can have significant effects on the levels for individual probe sets, several uniform normalization methods [30, 31] for CEL file data have been developed to allow the analysis of sets of multiple arrays. However, important discrepancies between different datasets depend on the dynamics of the measurements, which originate from different hybridization efficiencies. Unfortunately, even uniform normalization methods are incapable of compensating for those experimental differences. In addition, for some studies (e.g. the Rotterdam dataset) no CEL files are available. Therefore, in the analysis presented here we used a conservative strategy for dataset stratification by relying on a ranking of samples in each cohort. Each dataset of microarrays was normalized separately using the method originally proposed in the respective study (see Table 1). Log transformed expression values were median centered over each array. For each gene, normalization, ranking of expression values and median splits were done separately in each dataset.
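The per-dataset strategy described above can be illustrated with a minimal sketch. It assumes log-transformed values held in plain Python lists and dictionaries; the function names, dataset names and numbers are hypothetical and not taken from the study, and counting a value equal to the median as "high" is an assumption.

```python
from statistics import median

def median_center(log_values):
    """Subtract the array-wide median from every log-transformed value of one array."""
    m = median(log_values)
    return [v - m for v in log_values]

def median_split(values_by_sample):
    """Label each sample relative to the median of its own dataset.
    Assumption: values equal to the median are labeled 'high'."""
    cutoff = median(values_by_sample.values())
    return {s: ("high" if v >= cutoff else "low")
            for s, v in values_by_sample.items()}

def split_per_dataset(datasets):
    """Apply the median split separately within each dataset."""
    return {name: median_split(samples) for name, samples in datasets.items()}

# Hypothetical toy data: dataset B has a systematically higher signal range,
# as might result from different hybridization efficiencies.
toy = {
    "A": {"a1": 1.0, "a2": 2.0, "a3": 3.0, "a4": 4.0},
    "B": {"b1": 10.0, "b2": 20.0, "b3": 30.0, "b4": 40.0},
}
labels = split_per_dataset(toy)
print(labels["A"])
print(labels["B"])
```

Because each sample is compared only with the median of its own cohort, a global offset between datasets does not change the labels; in the toy data, sample b2 is "low" although its raw value exceeds every value in dataset A.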

Table 1 Clinical characteristics of breast cancer patients from Affymetrix microarray datasets used in this study

Assessment of ER, ErbB2, proliferative status and TOP2A expression of the samples

To allow comparison of different datasets, and since standard pathology for ER and ErbB2 was not available for all samples, receptor status was determined based on Affymetrix expression data as previously described [32–35]. The estrogen receptor status was determined using Affymetrix probe set 205225_at, and the ErbB2 status using Affymetrix probe set 216836_s_at. A specificity of 86.1% and a sensitivity of 92.2% were observed when the chip-based ER status was compared to the immunohistochemically determined ER status (available for 1,333 samples), while the specificity and sensitivity of the chip-based ErbB2 status were 98.6% and 45.8%, respectively, compared to 3+ staining in immunohistochemistry with HER2 antibody (data available for 206 samples). As a surrogate marker for cellular proliferation we used the expression of the proliferation marker Ki67 (ProbeSets 212020-212023_s_at). Appropriate cut-off values that distinguish between high and low proliferative activity in a clinically relevant manner using Ki67 immunohistochemistry in breast cancer have not been universally established [36]. Thus, a conservative median split according to Ki67 gene expression was applied, which corresponds to a percentage of MIB-1 positive cells of 16–17% [37]. To allow comparison of TOP2A expression between different datasets we used a median split of each dataset according to TOP2A Affymetrix data (ProbeSets 201291_s_at, 201292_at). An exploratory analysis revealed a significantly higher TOP2A expression in ER negative cancers. Thus, to avoid a confounding effect of the ER status, only ER positive tumor samples were used in further analyses (see Results). For those analyses the median split of TOP2A was applied only to the ER positive subgroup to prevent a confounding effect of the relative proportions of ER positive and negative tumors in the different datasets.
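As an illustration of how sensitivity and specificity of a chip-based receptor status are computed against immunohistochemistry as the reference, consider the following sketch. The function name and the toy calls are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(chip_positive, ihc_positive):
    """Compare chip-based calls with the reference (immunohistochemistry).

    Both arguments are equally long sequences of booleans, one entry per sample.
    Returns (sensitivity, specificity).
    Assumption: the reference contains at least one positive and one negative
    sample, so neither denominator is zero.
    """
    tp = fn = tn = fp = 0
    for chip, ihc in zip(chip_positive, ihc_positive):
        if ihc and chip:
            tp += 1          # reference positive, chip positive
        elif ihc and not chip:
            fn += 1          # reference positive, chip negative
        elif not ihc and not chip:
            tn += 1          # reference negative, chip negative
        else:
            fp += 1          # reference negative, chip positive
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy example with 8 samples.
ihc  = [True, True, True, True, False, False, False, False]
chip = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(chip, ihc)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A low sensitivity combined with a high specificity, as reported for the chip-based ErbB2 status, means that positive calls are rarely false but a substantial share of reference-positive tumors is missed.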

Statistical analysis

Subjects with missing values were excluded from the analyses and all reported P values are two sided. P values of less than 0.05 were considered to indicate a significant result. Chi-square test was used for categorical parameters. Survival intervals were measured from the time of surgery to the time of death from disease or of the first clinical or radiographic evidence of disease recurrence. Data for women in whom the envisaged end point was not reached were censored as of the last follow-up date or at 120 months. We constructed Kaplan–Meier curves and used the log rank test to determine the univariate significance of the variables. A Cox proportional-hazards regression model was used to examine simultaneously the effects of multiple covariates on survival. The effect of each variable was assessed with the use of the Wald test and described by the hazard ratio, with a 95 percent confidence interval. The model included age, tumor size, lymph node status, ER, ErbB2, Ki67 as well as TOP2A expression. All analyses were performed using SPSS 15.0 (SPSS Inc., Chicago, IL).
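The censoring rule and the Kaplan–Meier estimate described above can be sketched as follows. This is a minimal pure-Python illustration, not the SPSS procedure used in the study; the function names and the toy follow-up data are hypothetical.

```python
def censor_at(times, events, horizon=120.0):
    """Administratively censor follow-up at `horizon` months.

    times  : follow-up in months from surgery
    events : True if recurrence or death from disease was observed at that time
    Observations beyond the horizon are truncated and marked as censored.
    """
    out_t, out_e = [], []
    for t, e in zip(times, events):
        if t > horizon:
            out_t.append(horizon)
            out_e.append(False)
        else:
            out_t.append(t)
            out_e.append(e)
    return out_t, out_e

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)                  # still under observation
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events at time t
        surv *= 1.0 - d / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data: five patients, follow-up in months.
t, e = censor_at([10, 20, 30, 130, 150], [True, False, True, True, False])
curve = kaplan_meier(t, e)
print(curve)
```

In the toy data, the event observed at 130 months lies beyond the 120-month horizon and is therefore treated as censored, so it does not lower the estimated curve.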

Results

Analysis of TOP2A Affymetrix expression data in a combined cohort of 1,681 breast cancers

Gene expression values of n = 1,681 patients from 11 different datasets were analyzed (see Table 1). TOP2A is represented by two different ProbeSets on the Affymetrix HG-U133A microarray. Figure 1a displays the high correlation of the expression values from the two ProbeSets in a scatter plot of the measurements from the combined cohort. Thus, in subsequent analyses the mean of both ProbeSets was used. An initial exploratory analysis revealed a significantly higher expression of TOP2A in ER negative breast cancers (n = 481) compared to ER positive tumors (n = 1,200) (Mann–Whitney P < 0.001, Fig. 1b). In addition, ErbB2 positive tumors displayed a higher expression of TOP2A in the ER positive subgroup. However, the latter was not observed when comparing ErbB2 positive and ErbB2 negative tumors within the ER negative subgroup (Fig. 1c). The ER status had a larger impact on the distribution of TOP2A expression (see Supplementary Figure S1). Thus, to avoid a confounding effect of the ER status, we subsequently used separate median splits of TOP2A values for ER positive and negative cohorts, respectively.
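Combining two probe sets that measure the same gene can be sketched as below. The text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption; the function names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def probe_mean(x, y):
    """Per-sample mean of two probe sets measuring the same gene."""
    return [(a + b) / 2 for a, b in zip(x, y)]

# Hypothetical log expression values of two probe sets in five samples.
probe_a = [-1.2, -0.4, 0.1, 0.9, 1.5]
probe_b = [-1.0, -0.5, 0.3, 0.8, 1.7]
r = pearson(probe_a, probe_b)
combined = probe_mean(probe_a, probe_b)
print(f"r = {r:.3f}")
print(combined)
```

Averaging is only reasonable when the two probe sets agree closely, which is why the correlation is inspected first.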

Fig. 1

(a) Scatter plot of TOP2A expression values obtained from two different ProbeSets on the Affymetrix HG-U133A microarray for 1681 samples. (b) Difference in TOP2A expression between tumors with positive and negative ER status, respectively. (c) Comparison of TOP2A expression of ErbB2 positive and negative tumors stratified by ER status

Analysis of the prognostic impact of TOP2A expression in ER positive and negative breast cancers

ER negative tumors showed no difference in survival when stratified according to high or low TOP2A expression (P = 0.35), even when they were analyzed separately with respect to their ErbB2 status (data not shown). In contrast, as shown in Fig. 2, a highly significant difference in disease free survival (DFS) was observed for ER positive breast cancers (n = 994 with follow-up) when they were stratified according to TOP2A expression. While the 5-year DFS of patients with ER positive tumors with low TOP2A expression was 83.8 ± 1.7%, that of patients whose ER positive tumors displayed high TOP2A expression was only 67.0 ± 2.0% (P < 0.001).

Fig. 2

Prognostic value of TOP2A expression in ER positive and negative tumors. (a) Disease free survival of ER positive breast cancers (n = 994) stratified by a median split of TOP2A expression (for 206 of 1,200 patients no follow-up data were available, see Table 1). (b) Disease free survival of ER negative breast cancers (n = 369) stratified by a median split of TOP2A expression (for 112 of 481 patients no follow-up data were available, see Table 1)

Correlation of TOP2A expression with clinical characteristics among ER positive breast cancers

Because of the adverse prognostic impact of high TOP2A expression in patients with ER positive tumors, we further analyzed the clinical characteristics of those patients. The clinical parameters of ER positive breast cancers stratified according to high and low TOP2A expression are presented in Table 2. High TOP2A expression was associated with larger tumor size (χ2-test, P < 0.001), node positive disease (P = 0.042), poor histological grading (P < 0.001) and high proliferative activity (P < 0.001). Moreover, as shown above (see Fig. 1c), a positive correlation of TOP2A and ErbB2 expression was observed for ER positive tumors, which might be due to a co-amplification of the 17q21 amplicon. Of the ErbB2 positive samples, 71.3% were found in the group with high TOP2A expression. However, these ErbB2 positive samples represent only 82/603 (13.6%) of all tumors in the TOP2A high group, and the tumors with the highest TOP2A expression are not identical to the ErbB2 positive subset, suggesting that high TOP2A expression is not exclusively caused by 17q21 amplification.

Table 2 Clinical parameters of ER positive breast cancers stratified according to high and low TOP2A expression

Prognostic value of TOP2A among ER positive breast cancers with high and low proliferation

A median split of the samples according to Ki67 expression was used as a surrogate marker for stratifying tumors into groups with high and low proliferation. High TOP2A expression was associated with higher expression of Ki67: 76% of tumors with Ki67 above the median also showed high TOP2A expression (Table 2, P < 0.001). In both Ki67 subgroups, patients with low TOP2A expression had a significantly better prognosis (Ki67 low: P = 0.002; Ki67 high: P = 0.007, Fig. 3). This suggests that TOP2A expression adds prognostic information to Ki67 in ER positive breast cancers.

Fig. 3

Prognostic value of TOP2A expression in ER positive tumors with high and low proliferation, respectively, as measured by Ki67 expression. (a) Disease free survival of ER positive breast cancers with Ki67 below the median (n = 494) stratified by TOP2A expression. (b) Disease free survival of ER positive breast cancers with median Ki67 or above (n = 500) stratified by TOP2A expression

Prognostic value of TOP2A among ER positive breast cancers stratified by ErbB2 status

Of the 994 ER positive tumors with follow-up data, 97 (9.8%) were classified as ErbB2 positive. Within this subgroup, 70.1% (n = 68) also displayed high TOP2A expression, in contrast to only 47.9% among ErbB2 negative tumors (P < 0.001). While the prognostic value of TOP2A was highly significant among ErbB2 negative tumors (P < 0.001; Fig. 4a), there was only a trend toward significance among the ER+/ErbB2+ tumors (P = 0.11; Fig. 4b), presumably due to the much smaller sample size.

Fig. 4

Prognostic value of TOP2A expression in ER positive tumors according to ErbB2 status. (a) Disease free survival of ErbB2 negative, ER positive breast cancers (n = 897) stratified by TOP2A expression. (b) Disease free survival of ErbB2 positive, ER positive breast cancers (n = 97) stratified by TOP2A expression

Prognostic value of TOP2A is independent of treatment

For 878 of the 994 ER positive tumors with follow-up data, adjuvant treatment information was available. Of these patients, 405 had no further adjuvant treatment after surgery and 473 received adjuvant treatment. Of the adjuvant treated patients, 70% received only endocrine treatment and only 143 underwent adjuvant chemotherapy, mostly CMF without anthracycline. When samples were stratified according to whether the patients had received any adjuvant treatment or no adjuvant treatment at all, survival analysis revealed that TOP2A expression is prognostic in both untreated and adjuvant treated patients (P < 0.001 for both, Fig. 5). There were too few patients treated with anthracyclines in the cohort for a meaningful analysis of a predictive value of TOP2A for response to anthracycline therapy.

Fig. 5

Prognostic value of TOP2A expression in ER positive tumors according to adjuvant treatment. (a) Disease free survival of ER positive breast cancers without any adjuvant treatment (n = 405) stratified by TOP2A expression. (b) Disease free survival of adjuvant treated ER positive breast cancers (n = 473) stratified by TOP2A expression

Multivariate Cox regression analysis

In univariate analysis, TOP2A displayed a hazard ratio (HR) of 2.23 (95% CI 1.75–2.84, P < 0.001) for disease recurrence. To compare the prognostic value of TOP2A with standard parameters (tumor size, nodal status, grading, age, Ki67 and ErbB2 expression), a multivariate Cox regression analysis was performed using the n = 541 patients for whom all parameters were available. The result of this analysis is presented in Table 3. TOP2A emerged as the strongest prognostic marker for disease free survival (HR 2.40, 95% CI 1.68–3.43, P < 0.001), followed by tumor size (HR 0.48, 95% CI 0.34–0.69, P < 0.001) and ErbB2 status (HR 1.90, 95% CI 1.19–3.02, P = 0.007).
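The relationship between a reported hazard ratio, its 95% confidence interval and the Wald statistic can be checked with a short calculation. The sketch below assumes that the interval is a Wald interval, symmetric on the log scale, exp(β ± 1.96·SE); the function names are hypothetical, and only the univariate values quoted above are used as input.

```python
from math import exp, log

def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover the log hazard ratio, its standard error and the Wald z statistic
    from a hazard ratio and its confidence limits.
    Assumption: the interval is exp(beta +/- z_crit * SE)."""
    beta = log(hr)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    return beta, se, beta / se

def ci_from_beta(beta, se, z_crit=1.96):
    """Confidence limits of the hazard ratio from beta and its standard error."""
    return exp(beta - z_crit * se), exp(beta + z_crit * se)

# Univariate result quoted in the text: HR 2.23, 95% CI 1.75-2.84.
beta, se, z = wald_from_ci(2.23, 1.75, 2.84)
lo, hi = ci_from_beta(beta, se)
print(f"beta={beta:.3f}, SE={se:.3f}, Wald z={z:.2f}")
print(f"reconstructed 95% CI: {lo:.2f}-{hi:.2f}")
```

Under this assumption the Wald statistic is well above the two-sided 0.1% threshold of about 3.29, which agrees with the reported P < 0.001.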

Table 3 Multivariate Cox regression analysis of standard parameters and TOP2A expression in relation to disease free survival among ER positive breast cancers

Discussion

ErbB2 amplification has been corroborated as a predictive factor for response to anthracycline-containing chemotherapy [2]. Since the topoisomerase II alpha protein is a direct target for anthracyclines in vitro, it has been suggested that ErbB2 is only a surrogate marker for co-amplification of the TOP2A gene. However, expression of the TOP2A protein product as measured by immunohistochemistry does not correlate with amplification of the TOP2A gene but rather with cellular proliferation [14]. Fritz et al. [15] observed an overexpression of TOP2A protein using immunohistochemistry in approximately 25% of breast cancers, whereas overexpression of both c-ErbB2 and TOP2A could be detected in only 9.3%. TOP2A and c-ErbB2 were found to be overexpressed in overlapping but distinct subgroups of patients. Moreover, the authors found a prognostic impact of TOP2A protein, which was confined to the subgroup of hormone receptor positive patients. It could be suggested that immunohistochemistry is inaccurate in measuring quantitative steady state levels of TOP2A in tumors with or without TOP2A gene amplification. Unlike the expression of ErbB2, TOP2A expression is highly regulated at the transcriptional and translational level [38, 39], suggesting that gene amplification may not have a profound effect on the total amount of TOP2A protein in the cell. Here we could show in a large cohort that TOP2A mRNA expression levels as detected by microarray analysis are also strongly associated with cellular proliferation, ER negativity and a worse prognosis, as has been shown for TOP2A protein expression [14–16]. Our data demonstrate that TOP2A expression as measured by microarray analysis has a high prognostic value in ER positive breast cancer patients. Multivariate analysis of standard parameters and TOP2A in ER positive breast cancers revealed that high TOP2A expression, ErbB2 expression and tumor size remain the only independent parameters for predicting poor survival.
In contrast, TOP2A expression was not a prognostic factor in ER negative tumors, which is in agreement with the data of Fritz et al. [15], who demonstrated similar results using immunohistochemistry. For this reason we further analyzed the subgroup of ER positive breast cancers. Mueller et al. [14] had demonstrated that TOP2A protein expression is strongly correlated with Ki67 expression. Our data support this observation, since higher expression values of TOP2A were associated with higher Ki67 gene expression (Supplementary Figure S2). However, TOP2A expression maintained its strong prognostic significance in both groups with high and low Ki67 expression. In this context it should be considered that cell cycle activity and DNA ploidy may have different implications with regard to tumor response to chemotherapy [40]. Hannemann et al. could not observe a significant benefit from anthracycline-containing high dose chemotherapy in breast cancer patients with TOP2A amplification [41], which adds to our data in the neoadjuvant setting [12].

Regarding a possible predictive value of TOP2A, our data demonstrate that TOP2A remains a strong prognostic marker in untreated and adjuvant treated patients. All these observations raise the question whether anthracycline sensitivity is explained by higher proliferation (associated with high levels of TOP2A RNA) rather than by TOP2A amplification, or whether the predictive value in terms of anthracycline sensitivity might be conferred by other markers adjacent to the 17q21 amplicon. To date no conclusive data with regard to these questions are available.

In conclusion, this large scale analysis of gene expression data in ER positive breast cancer patients demonstrates a strong prognostic impact of TOP2A expression, outperforming standard parameters such as tumor size, nodal status, grading, age and ErbB2. Our data suggest that this marker could be helpful for risk assessment in ER positive breast cancer patients. A prospective evaluation and a standardized method of measuring TOP2A would be useful to verify these observations.