Introduction

Randomized trials have confirmed the safety and activity of primary chemotherapy for breast cancer. Though disease-free and overall survival are unchanged, primary chemotherapy is effective in down-staging tumors, increasing both the proportion of patients with negative lymph nodes and the ability to perform breast-conserving surgery [1–4]. Despite significant tumor down-staging, only a small fraction of patients obtain a pathologic complete remission; pre-treatment tissue samples are available for the majority of patients. This provides an ideal opportunity to investigate expression of genes that correlate with response.

Early microarray techniques were limited by the need for fresh tissue, which is frequently not available in routine clinical practice. Though RNA could be retrieved from formalin-fixed, paraffin-embedded (FPE) tissue, the highly fragmented nucleic acids (<300 bases) were not suitable for classic microarray techniques. Recently, Cronin et al. developed a real-time polymerase chain reaction (RT-PCR) method suitable for the short RNA fragments obtained from FPE tissue [5]. Gene expression profiles from FPE and frozen sections of the same cancer were highly concordant [5]. Estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) mRNA levels were comparable to their respective protein levels as determined by immunohistochemistry (IHC) [5].

Gene expression profiling, initially used to classify breast cancers into prognostic groups [6–9], can also be used to predict response to therapy [10, 11]. The Oncotype DX Recurrence Score, an RT-PCR-based multigene assay, measures the expression of 21 genes (16 cancer-related and 5 reference genes) to predict recurrence in tamoxifen-treated patients with lymph node-negative, ER-positive breast cancer [12]. More importantly, the Oncotype DX Recurrence Score predicts clinical benefit from tamoxifen and chemotherapy [13, 14].

This correlative study was designed to exploit a recently completed phase II trial to determine the feasibility of gene expression profiling in FPE core biopsy samples and to explore candidate genes that correlate with pathologic complete response (pCR) in patients receiving primary chemotherapy for locally advanced breast cancer.

Patients and methods

The parent phase II trial has been previously reported [15]. Briefly, patients with stage II or III primary breast cancer were treated with sequential doxorubicin (A) 75 mg/m² every 2 weeks for three cycles and docetaxel (T) 40 mg/m² weekly for six cycles. Treatment order (A > T versus T > A) was randomly assigned after stratification for tumor size (≤5 cm versus >5 cm) and clinical axillary node status (positive versus negative). A core biopsy was obtained prior to initiation of chemotherapy in all patients; definitive surgery was performed at the completion of all chemotherapy. Postoperative chemotherapy and radiation were administered at the discretion of the treating medical oncologist; tamoxifen was recommended for all patients with ER-positive tumors. In keeping with the objectives of the parent trial, patients were not followed for recurrence or survival.

Definition of response

Pathologic complete response required no evidence of invasive malignancy in the breast and lymph node specimens at the time of definitive surgery.

Sample preparation

Three 10-μm sections were cut from each paraffin block and placed in a bar-coded microcentrifuge tube. One additional 5-μm section was cut and stained with hematoxylin and eosin (H & E). Tubes and slides were shipped to Genomic Health Inc., at ambient temperature. H & E stained slides were independently reviewed by a pathologist at Indiana University (SB) and Genomic Health (FLB) for confirmation of the submitting diagnosis and for estimation of the proportion of tissue surface area composed of invasive tumor. Only samples containing at least 20% invasive tumor were accepted for this study.

Gene expression analysis

Quantitative gene expression was determined by a multi-analyte TaqMan® RT-PCR assay that was designed to accurately measure the small fragments of RNA present in archival tumor blocks [5]. In brief, paraffin was removed from specimens by xylene extraction. RNA was then isolated, and residual contaminating genomic DNA was removed by DNase I treatment. Reverse transcription of the purified RNA was carried out using SuperScript™ II RT enzyme (Invitrogen Life Technologies, Carlsbad, CA, USA) for first-strand cDNA synthesis. TaqMan reactions were then carried out in 384-well plates using the Applied Biosystems PRISM® 7900HT Sequence Detection System.

A total of 192 genes (187 candidate genes and 5 reference genes) were tested initially [16]. We selected “candidate” genes by surveying the breast cancer literature for evidence of a significant role in cancer pathological processes, including proliferation, invasion, apoptosis, metastasis, angiogenesis, immune surveillance, tumor suppression activity, oncogene activity, differentiation status and response to chemotherapy. We focused particularly on publications that identified candidates and pathways putatively correlated with response to anthracyclines and/or taxanes. We included a number of genes identified in published DNA microarray studies of breast cancer [6–9, 17, 18]. Expression of each gene was measured in duplicate and then normalized relative to the five reference genes (β-actin, GAPDH, GUS, RPLPO and TFRC). Subsequently, as information from other gene array studies became available [10, 11, 19], RNA extracts were re-analyzed for expression of additional genes. One hundred and forty-three genes were included in the re-analysis: 61 genes included in the initial analysis and 82 new candidate genes. As a control for assay performance, we examined the reproducibility of expression levels between the first and second analysis for the 61 genes assayed twice; concordance was excellent (data not shown).
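
The duplicate measurement and reference normalization described above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: that duplicates are averaged, that the normalized value is the mean reference-gene CT minus the gene CT (so that larger values mean more RNA), and that an optional constant `offset` shifts the scale. The function name `normalize_expression` and the gene keys are hypothetical, and this is not the assay's actual algorithm.

```python
from statistics import mean

# Reference genes named in the text; the dictionary keys are illustrative labels.
REFERENCE_GENES = ("beta-actin", "GAPDH", "GUS", "RPLPO", "TFRC")


def normalize_expression(ct_duplicates, offset=0.0):
    """Return reference-normalized expression for every non-reference gene.

    ct_duplicates maps gene name -> (ct_run1, ct_run2).
    Assumption: normalized value = mean reference CT - gene CT + offset,
    so a 1-unit increase corresponds to roughly twice as much RNA.
    """
    # Average the duplicate measurements for each gene.
    averaged = {gene: mean(values) for gene, values in ct_duplicates.items()}
    # Mean CT across the five reference genes for this sample.
    reference_ct = mean(averaged[gene] for gene in REFERENCE_GENES)
    return {
        gene: reference_ct - ct + offset
        for gene, ct in averaged.items()
        if gene not in REFERENCE_GENES
    }
```

With this convention, a gene whose averaged CT is three cycles later than the reference mean receives a value three units below the offset, that is, roughly one-eighth of the reference RNA level.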

Determination of the recurrence score

The 21 genes in the Oncotype DX Recurrence Score assay were measured according to the standardized methods used in the commercial assay. Reference-normalized expression measurements typically range from 0 to 15, where a 1-unit change generally reflects a twofold change in RNA. Briefly, as reported by Paik et al. [12], the Recurrence Score (RS) is derived from a formula based on linear and non-linear functions of the expression of four groups of genes (an ER gene group, a proliferation gene group, an invasion gene group and a HER2 gene group) and three other individual genes. The RS is rescaled to range from 0 to 100, where a lower score is associated with a lower risk of recurrence.

Other assessments

Immunohistochemistry for ER, PR, HER2/neu and Ki-67 was performed using a standard biotin-streptavidin method with the appropriate antigen retrieval method for each antibody. The anti-ER (clone 1D5), anti-PgR (clone PgR636) and anti-Ki-67 (clone MIB-1) antibodies (all mouse monoclonal), as well as the HercepTest system for HER2/neu detection, were obtained from DAKO (Carpinteria, CA, USA). The immunoreactive score was used to assess ER and PR staining [20]. Tumors were defined as ER positive if ER staining of any intensity was seen in more than 10% of cells. Interpretation of the HercepTest was performed according to the criteria recommended by DAKO. The proliferation rate was reported as the percentage of Ki-67-positive cells per 1,000 carcinoma cells. Tumor grade using the Bloom–Richardson criteria was assessed at Indiana University Medical Center (SB) and at Genomic Health Inc. (FLB).

Statistical methods

Demographic and baseline characteristics were summarized by descriptive statistics. The primary objective of the study was to identify genes that correlate with pCR. Generalized linear models with a logistic link function were fitted to assess the association between gene expression and pCR. Correlation analyses of gene expression used Pearson linear correlation. Cluster analysis used 1 − R² (one minus the squared Pearson correlation coefficient) as the distance metric and single-linkage hierarchical clustering. Additionally, we examined the correlation of the Oncotype DX Recurrence Score with pCR by logistic regression. Similarly, an exploratory logistic regression analysis was performed to identify genes that correlated with inflammatory breast cancer (IBC). As this study was exploratory, no formal hypothesis testing was performed. Because the number of candidate genes is large relative to the number of patients, we performed simulations in which we randomly shuffled the patient responses relative to gene expression to estimate the number of genes that would appear to be significant (p < 0.01 or p < 0.05) in the absence of a genuine association (false positive rate).
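
The shuffling simulation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code used in the study: each gene is tested with a univariate logistic regression fitted by Newton-Raphson, significance is taken from a two-sided Wald test on the slope, and the function names (`logistic_wald_p`, `expected_false_positives`) and the synthetic data are hypothetical.

```python
import math
import random


def logistic_wald_p(x, y, iterations=25):
    """Two-sided Wald p-value for the slope of a univariate logistic regression."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            # Clamp the linear predictor so exp() cannot overflow.
            z = max(-30.0, min(30.0, b0 + b1 * xi))
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            # Degenerate fit (constant predictor or separation): report no evidence.
            return 1.0
        # Newton-Raphson update using the inverse of the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    standard_error = math.sqrt(h00 / det)
    return math.erfc(abs(b1 / standard_error) / math.sqrt(2.0))


def expected_false_positives(expression, response, alpha=0.05, n_shuffles=200, seed=0):
    """Average number of genes with p < alpha after shuffling the responses."""
    rng = random.Random(seed)
    shuffled = list(response)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real gene-response link
        total += sum(
            logistic_wald_p(values, shuffled) < alpha
            for values in expression.values()
        )
    return total / n_shuffles


if __name__ == "__main__":
    # Synthetic stand-in data: 45 patients, 6 responders, 40 unrelated "genes".
    rng = random.Random(1)
    response = [1] * 6 + [0] * 39
    expression = {
        f"gene_{g}": [rng.gauss(8.0, 1.5) for _ in range(45)] for g in range(40)
    }
    print(expected_false_positives(expression, response, n_shuffles=20))
```

Because shuffling keeps the correlation structure among genes intact, the simulated count can differ from the naive product of the number of genes and the significance threshold.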

Results

Patient population

From June 1999 to October 2002, 70 patients were enrolled in the parent trial; 45 patients were included in this correlative study. Patients were excluded for the following reasons: no consent for gene expression analysis (n = 3), tissue could not be located or obtained from referring institution (n = 10), only scant core biopsy or nodal tissue available pre-chemotherapy yielding insufficient RNA for RT-PCR assay (n = 9) and invasive cancer in <20% of core biopsy section (n = 3).

Patient characteristics are shown in Table 1; there were no differences between the 45 patients included in our correlative study and the 25 non-eligible patients. The median age was 49 years (range 29–64). Median pretreatment tumor size was 6.8 cm with clinically positive axillary lymph nodes in 21 (47%) patients; 9 (20%) patients had inflammatory disease. A pCR was confirmed in 6 (13%) patients; 23 (51%) patients had pathologic negative nodes.

Table 1. Patient characteristics

Clinical variables and pCR

No significant correlation was observed between pCR and age (Fig. 1a), tumor size (Fig. 1b), tumor grade, ER status by IHC or randomization arm.

Fig. 1. Relationship of age (a) and tumor size (b) with pCR

Gene expression

Gene expression units are expressed in terms of cycle threshold (CT) measurements, where each CT represents approximately a twofold change in expression. It may be shown mathematically that clinically relevant genes with a larger dynamic range of expression are more likely to demonstrate a statistically significant clinical correlation than genes with a smaller dynamic range of expression. The distribution of ranges of expression (expressed as standard deviations in CT) for all genes under study is provided in Fig. 2. As anticipated, the five genes used for reference normalization have small ranges of expression across samples (standard deviation in CT): β-actin (0.56), GAPDH (0.71), GUS (0.75), RPLPO (0.53) and TFRC (0.64). In contrast, the 16 cancer-related genes in the Oncotype DX Recurrence Score assay show large ranges of expression. In particular, the expression of ER by RT-PCR is one of the most variable in breast cancer; all ER-related genes showed large ranges of expression (standard deviation in CT): ER (2.72), PR (2.27), Bcl2 (1.41) and SCUBE2 (2.45). Using a pre-defined cutoff for quantitative ER expression that considers values ≥6.5 as ER(+), 64% (29 out of 45) of the tumors in our study are ER(+) and 36% (16 out of 45) are ER(-). ER expression by IHC correlated well with ER expression by RT-PCR (Fig. 3). As expected, histologic grade correlated well with expression of proliferation genes (Fig. 4).
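
Because each CT unit corresponds to roughly a twofold change, a spread expressed in CT units can be translated into an approximate fold change. The short Python sketch below does this conversion; the helper name `fold_change` is hypothetical, and the calculation assumes an idealized doubling per cycle.

```python
def fold_change(ct_units):
    """Approximate fold change in RNA for a difference of ct_units CT units."""
    # Assumes perfect doubling per cycle, which real assays only approximate.
    return 2.0 ** ct_units


if __name__ == "__main__":
    # Standard deviations in CT quoted in the text for one reference and one ER-related gene.
    for gene, sd in (("beta-actin", 0.56), ("ER", 2.72)):
        print(f"{gene}: one SD is about a {fold_change(sd):.1f}-fold change")
```

On this scale, one standard deviation for ER corresponds to a several-fold change in RNA, whereas one standard deviation for a reference gene such as β-actin corresponds to less than a twofold change.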

Fig. 2. Dynamic range of gene expression by quantitative RT-PCR

Fig. 3. Correlation of ER expression by IHC and quantitative RT-PCR

Fig. 4. Correlation of tumor grade and expression of proliferation genes

Unsupervised clustering of individual tumors and genes found the expected relationships. The tumors segregated into two large groups, one group largely ER positive and the other group largely ER negative (Fig. 5). Co-expressed genes clustered as anticipated, defining an ER group (including ER, PR, SCUBE2 and Bcl2) and a proliferation group (including SURV, Ki-67, MYBL2, CCNB1 and STK15). Cluster analysis of the complete set of genes is provided in Fig. 6.
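
As an illustration of the clustering approach named in the statistical methods (a distance of one minus the squared Pearson correlation, with single linkage), the following Python sketch groups a handful of gene profiles. The expression values are invented for the example, the function names are hypothetical, and the naive merging loop is intended only for small inputs.

```python
import math


def pearson_r(a, b):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def distance(a, b):
    """1 - R squared: strongly correlated profiles (positive or negative) are close."""
    return 1.0 - pearson_r(a, b) ** 2


def single_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(
                    distance(profiles[p], profiles[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters


if __name__ == "__main__":
    # Invented expression values across five tumors, for illustration only.
    profiles = {
        "ER": [1.0, 2.0, 3.0, 4.0, 5.0],
        "PR": [2.0, 4.0, 6.0, 8.0, 11.0],
        "Ki-67": [5.0, 1.0, 4.0, 2.0, 3.0],
        "STK15": [10.0, 2.0, 8.0, 4.0, 7.0],
    }
    print(single_linkage(profiles, 2))
```

One consequence of squaring the correlation is that strongly anti-correlated genes are treated as close neighbors, so a cluster can contain genes that move in opposite directions.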

Fig. 5. Unsupervised clustering of the 45 patients

Fig. 6. Unsupervised clustering of genes

Gene expression and pCR

The odds ratios for pCR for each of the candidate genes with p < 0.05 are given in Table 2. Three genes had a p-value <0.01, and a total of 22 genes had a p-value <0.05. Only 13 genes would be expected to correlate with pCR at the p < 0.05 level by chance alone.

Table 2. Genes correlated with pCR

The genes that correlated with the likelihood of pCR can be grouped based on co-expression. Three large groups of genes were observed: angiogenesis-related genes, proliferation-related genes and invasion-related genes. Angiogenesis-related genes that correlated with pCR include VEGF-C and ID1. VEGF-C is associated with increased lymphangiogenesis, lymph node metastases and poor prognosis in patients with primary breast cancer [21–23]. ID1 is overexpressed in endothelial cells of tumor-infiltrating vessels and associated with poor prognosis in breast cancer [24–26]. Four additional angiogenesis-related genes seemed related to pCR, but those correlations did not achieve statistical significance: HIF1A [27], a potent inducer of VEGF expression and angiogenesis; CD31 (PECAM) [28, 29], a cell adhesion molecule frequently expressed on tumor-associated vessels; VCAM-1 [30], a marker present on activated endothelial cells; and the prototypic angiogenesis factor, VEGF-A [31]. Higher expression of the angiogenesis-related genes was associated with a lower likelihood of pCR.

A number of proliferation-related genes were significantly associated with pCR, including STK15 (Aurora kinase A), a serine/threonine kinase that is localized to centromeres [32]; C20-orf1 (TPX2), a proliferation-associated nuclear protein [33, 34]; survivin (BIRC5), a cell-cycle regulated anti-apoptotic factor [35]; PTPD1, a protein tyrosine phosphatase; and CDC20, a regulatory protein interacting with several other proteins during checkpoint arrest and release in the cell cycle. We identified several other non-significant associations with pCR, including the G2/M cyclin CCNB1, the well-known proliferation marker Ki-67, the topoisomerase TOP2A, MEM2 and Chk2. Higher expression of the proliferation-related genes was associated with a higher likelihood of pCR.

We found a variable relationship between invasion-related genes and pCR. Those negatively correlated with pCR include: low density lipoprotein receptor 1 (LRP1) [36–38], a multifunctional endocytic receptor with an important role in regulating the activity of proteinases in extracellular matrix known to promote invasiveness of breast cancer cells in vitro; cMet [39], the hepatocyte growth factor receptor involved in metastasis and invasion of colorectal tumors; urokinase-type plasminogen activator receptor (PLAUR) [40, 41], whose ligand correlates with poor prognosis in breast cancer patients; and matrix metalloproteinase-2 (MMP2) [42–44]. In contrast, the invasion-related gene matrix metalloproteinase-9 (MMP9) was positively correlated with pCR.

Expression of ER-related genes did not correlate with pCR. Similarly, we found no correlation between the Oncotype DX Recurrence Score and pCR (p = 0.67). However, two of the five proliferation genes, STK15 and SURV, were significantly associated with pCR, and two additional proliferation genes, Ki-67 and CCNB1, trended toward significance.

Gene expression in inflammatory breast cancer

In an exploratory analysis, we examined genes that were differentially expressed in IBC and non-IBC. Twenty-four of 274 candidate genes correlated with the inflammatory phenotype (Table 3). Five genes were significantly upregulated (p < 0.05), including: GRO1 [6, 45], a marker for the basal breast cell subtype; CD3z and CD18, immune-related genes expressed in B- or T-cells or macrophages; cIAP2 [46], a cytokine-induced anti-apoptotic protein; and DKFZp564, a still unclassified protein.

Table 3. Genes correlated with inflammatory breast cancer

Nineteen genes were significantly down-regulated (p < 0.05) in IBC. Four are apoptosis-related genes: BAG1 [47], an anti-apoptotic protein that increases the function of Bcl-2; MDM2 [48, 49], the principal inhibitor of p53-mediated cell cycle arrest and apoptosis; and TP53BP1 [50] and TP53BP2 [51], proteins that bind to and regulate p53 apoptotic function.

Discussion

Despite clinical response rates topping 75%, the overall pCR rate for primary chemotherapy in patients with locally advanced breast cancer is less than 30% [15, 16]. Clearly many patients have (at least partially) resistant disease and may derive little benefit from classical chemotherapy approaches. Identifying a gene profile that would predict response could allow patients to be stratified into separate prognostic categories with distinct treatment recommendations. Until recently, such gene profiles required fresh tissue, limiting widespread application. Our goal was to determine if gene expression profiles could be reliably generated from the FPE core biopsy samples routinely obtained prior to the initiation of primary chemotherapy, and to explore genes that correlated with pCR after anthracycline- and taxane-based therapy.

Our results confirm that quantitative RT-PCR analysis of gene expression in FPE core biopsy samples is sensitive, reproducible and time efficient. Though gene analysis was not planned as part of the parent trial, we were able to obtain sufficient tissue from 45 (64%) of the 70 patients enrolled. Pre-chemotherapy samples could not be located or consent obtained in 13 patients. Sufficient RNA for analysis was obtained from 79% of the samples available. Results were consistent across two separate analyses with the expected clustering of patients and genes, suggesting few technical limitations to this approach. Extension of this technique to other already completed primary chemotherapy trials with archived core biopsy tissue is feasible.

The pCR rate in our study is similar to that reported by other investigators using dose-dense regimens in patients with locally advanced disease [52–56]. We focused our analysis on pCR rather than clinical response as previous studies have found pCR (along with post-chemotherapy nodal status) to be the most significant independent variable associated with disease-free and overall survival [17, 18]. We identified 22 genes that correlated significantly with pCR (p < 0.05) compared to the 13 expected by pure chance. The 22 predictive genes contain three major predefined clusters: angiogenesis-related genes, proliferation-related genes and invasion-related genes.

The inverse association between VEGF-C and pCR is particularly striking. VEGF is a major survival factor for endothelial (and some tumor) cells, promoting expression of anti-apoptotic signaling pathways including Bcl-2 [57, 58], PI3K/AKT [59], survivin and XIAP [60] while suppressing signaling through SAPK/JNK [61]. VEGF induces relative resistance to the anti-angiogenic effects of docetaxel [62] and inhibits chemotherapy-induced apoptosis of leukemic cells [63]. Similar actions have been reported for PECAM-1 [21–23, 64].

Proliferation-related genes have been shown to correlate significantly with outcome in breast cancer [65–67]. Patients with higher tumor cell proliferation have a better response to chemotherapy than patients with lower tumor cell proliferation [68, 69]. Proliferation is also a key predictor of relapse in patients with early stage breast cancer. Proliferation-related genes have an impact on the Oncotype DX Recurrence Score [12, 13]. Survivin, Ki-67, STK15 and CCNB1 are included in the Oncotype DX panel and were found to correlate with pCR in our study (though the associations with CCNB1 and Ki-67 did not reach statistical significance). A positive association between proliferation genes and pCR was reported in the Milan study as well [19]. Both the Milan study and our analysis identified CDC20 and TOP2A as important predictors of pCR. TOP2A has also been described as a predictor of response to neoadjuvant chemotherapy in several other studies, suggesting that this association is unlikely to be spurious [70–74].

Our study shares important similarities and differences with other studies that have sought to identify gene expression profiles that predicted response to primary chemotherapy. Using fresh tissue and the Affymetrix platform, Chang et al. identified 92 genes that predicted clinical, not pathologic, response to docetaxel monotherapy [10]. Predictive genes were grouped according to function including genes involved in cell cycle, cytoskeleton, adhesion, protein transport, modification, transcription, apoptosis and signal transduction. Though we identified genes in some of the same general categories, no single gene was identified in both studies. Ayers et al. defined a gene profile that correlated with pCR after treatment with weekly paclitaxel followed by fluorouracil, doxorubicin and cyclophosphamide (T/FAC) from pre-treatment fine needle aspiration samples. Again we identified genes from the same functional groups including proliferation and metastasis, but no individual gene was predictive of pCR in both studies [75].

One of the 22 genes we identified, the proliferation-related gene CDC20, was also significantly correlated with pCR in the recently reported INT Milan study [18]. Of note, the Milan group interrogated FPE biopsy samples using the same technical platform as in our study. Though the Oncotype DX Recurrence Score correlated with pCR in the Milan trial, we did not find this association in our study. Although the expression of ER by RT-PCR was highly dynamic and concordant with IHC, we also did not find a correlation between ER and pCR as has been reported by others [19, 21, 22].

Many of the genes we found in IBC have been previously identified. GRO1 is located at the 4q21.1 locus, along with three of the genes (CXL2, CCNG2, MASA/E-1) upregulated in IBC in a previous analysis [76]. Immune-related genes were upregulated in Bertucci’s analysis as well, though the specific genes differ from those we identified. Though others have reported increased angiogenesis in IBC, we found no difference in expression of angiogenesis-related genes, particularly VEGF [77]. Bieche et al. reported increased COX2 expression in IBC [78]. Although COX2 was increased in our IBC samples, the difference did not reach statistical significance.

Differences in the genes predictive of response identified in these preliminary analyses are not unexpected. All of the studies are small and conducted multiple analyses, increasing the likelihood of spurious associations. For instance, β-actin, a reference gene with a narrow range of expression, significantly correlated with pCR (p = 0.0137). Though similar, the trials used different chemotherapy regimens and focused on disparate primary endpoints. Importantly, the studies used distinct technical platforms with only partial overlap in gene composition. Differences in probe sequence may also result in differing measurements for the same gene. Stec et al. compared different transcriptional profiling platforms. Only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient ≥ 0.7. In addition, there was substantial variation between different Affymetrix probe sets matched to the same cDNA probe. Though each platform accurately separated 91% of cases in supervised hierarchical clustering, cross-platform testing resulted in significantly lower clustering accuracy (45 and 79%) [79].

The limitations of our study must be acknowledged. The power of our analyses is severely restricted by the large number of candidate genes we tested. The likelihood that we falsely identified a gene as predicting response is high. Given the small sample size, we did not attempt multivariate analyses. No matter how striking the correlation for any individual gene, multigene profiles are likely to be more robust and reproducible. The 22 candidate genes we identified require further study, including validation in an independent patient cohort. Nonetheless, our study is an important step toward the goal of predicting response and individualizing therapy. The ability to obtain gene expression profiles from FPE core biopsy specimens makes it possible to confirm and extend our results in other completed primary chemotherapy trials.