Introduction

All cancers carry a complex repertoire of nucleic acid variations including germ line polymorphisms and somatic mutations (http://www.sanger.ac.uk/genetics/CGP/Census) that can influence the clinical behavior of the disease and may represent potential therapeutic targets [1, 2]. The mutation landscape of various types of cancers is currently being mapped with the expectation that some of the genomic alterations will turn out to be important therapeutic targets [3]. Small scale, early breast cancer studies that used whole or partial genome sequencing, revealed mutations in dozens of potentially cancer-causing genes that often appeared to be unique to a given case [49]. The number of patients included in these studies ranged from 1 to 16. The two largest mutation profiling studies in breast cancer tested around 30 oncogenes and included 183 and 53 cases, respectively, therefore the true prevalence and pattern of mutations in breast cancer subtypes remains uncertain [10, 11]. Also, the aforementioned studies provided no clinical and pathological correlations or examined association between mutations and response to treatment.

The purpose of the current study was to assess the prevalence of a selected number of somatic mutations and functional SNP in 267 stage I–III breast cancer samples with known clinical outcome. These genes and nucleic variations were selected because of their known functional importance in cancer biology and because they represented molecules or signaling pathways that can be targeted with approved or investigational drugs currently in clinical trials. We analyzed fine needle aspirations (FNA) of stage I–III breast cancers collected before neoadjuvant chemotherapy in the context of two prospective and IRB-approved clinical trials. Importantly, FNA specimens are highly enriched in neoplastic cells with a few contaminating leukocytes and minimal or no stromal (fibroblast, adipocyte, and endothelial cell) contamination; therefore, our results likely represent mutational status of the neoplastic cells in these cancers [12]. Mutation analysis was performed using Sequenom technology [13]. There were no normal tissues included in this analysis, somatic nucleic acid variations were retrieved from the public dbSNP database for comparison.

Materials and methods

Tumor specimens and nucleic acids extraction

FNA samples of newly diagnosed stage I–III breast cancers were obtained with a 23-gauge needle, and cells from two passes were collected into a vial containing 1 ml of RNAlater solution (Ambion, Austin, Texas). The vials were stored at −80 °C until total RNA and DNA extraction. One hundred seventy-five biopsies were collected at The University of Texas M.D. Anderson Cancer Center (MDACC, Houston, Texas), 76 specimens were received from US Oncology, Inc. (USO, Houston, Texas) and 16 specimens were collected at the Instituto Nacional de Enfermedades Neoplásicas (INEN, Lima, Peru). Results of these clinical trials were reported separately [14, 15]. Patient characteristics are presented in Table 1.

Table 1 Patients’ characteristics at diagnosis

All clinical information including survival and recurrence dates were extracted from the prospectively collected and maintained clinical trial information databases at MDACC (search date: June 28, 2010) and USO (search date: July 7, 2010), respectively. Pathologic complete response (pCR) was defined as no residual invasive cancer in the breast or lymph nodes. All other cases were categorized as residual cancer (RD). Recurrence-free survival estimates were calculated from the date of the diagnostic biopsy until first recurrence. Overall survival was calculated from the date of the diagnostic biopsy until the date of death. Clinical ER and HER2 status were determined in a diagnostic core biopsy by the routine clinical pathology laboratories at the site of the biopsy collection. ER positivity was defined as nuclear staining in ≥10 % of cancer cells and HER-2 overexpression was defined as a gene copy ratio of HER-2 gene:chromosome 17 centromere ≥2.0 or an IHC score of 3+. The HER-2 status of tumors with an IHC score of 1+ or 2+ were confirmed using FISH.

DNA was extracted from the flow-through specimens of a preceding RNA extraction step that used RNeasy mini kits (Qiagen, Valencia, California). DNA extraction was performed using the QIAamp DNA Mini Kit (Qiagen) following the manufactures instructions. Final DNA concentration was assessed using Nanodrop 2000 Spectrophotometer (Thermo Fisher Scientific Inc, Pittsburgh, Pennsylvania).

Detection of mutations and single nucleotide polymorphisms

We analyzed 163 somatic mutations and single nucleotide polymorphisms (SNP) in 28 cancer-related genes (Supplementary Table S1) in 267 stage I–III breast cancer samples. These nucleic acid variations were selected from public databases including COSMIC (http://www.sanger.ac.uk/genetics/CGP/cosmic/) and the NCI somatic mutation data bases http://tcga-data.nci.nih.gov/docs/somatic_mutations/tcga_mutations.htm). We based our selection criteria on nonsynonymous coding mutations that were reported to occur in human cancers and are considered to be functionally relevant as well as potentially druggable with currently available investigational already approved drugs.

We used Sequenom technology to identify known single nucleotide variations. DNA regions flanking the selected mutation or SNP sites were amplified by PCR in a 5 μl multiplex reaction using Qiagen Hotstar Mastermix supplemented with 5 mM MgCl2 (Qiagen, Valencia, California). Unincorporated primers and deoxyribonucleotide triphosphates (dNTPs) were removed using ExoSAP available as part of the Sequenom iPLEX Gold kit (Sequenom, San Diego, California). Primer extension reactions were performed using the iPLEX Gold Taq, and the sample was desalted using Clean Resin (Sequenom). PCR and extensions primers were designed by the Sequenom Assay Design Software then checked for false priming and common SNP sites with proprietary software available on http://www.mysequenom.com web site. The desalted primer extension reactions were spotted onto SpectroChip II and run in the Sequenom MassARRAY. All spectra corresponding to sequence data were generated in duplicate on the same run and were interpreted by the Typer Software 3.4 and 4.0. In-house software was used to compare duplicate and to calculate the relative amount of each base. Reactions with >15 % of the detected mass corresponding to the mutated allele were called positive for a nucleic acid variation. When duplicate cells were discordant, the spectra were visually inspected, if the quality of one spectrum was bad, the call was based on the single good spectrum; if both spectra were good but discrepant, the call was discarded from further analysis.

Allelic discrimination by quantitative real- time polymerase chain reaction (PCR) and automatic sequencing

To validate mutations identified by Sequenom and to avoid false positive call, we performed allelic discrimination qRT-PCR (i.e., EGFR, KIT, and PHLPP2) or direct sequencing (for all the other genes) for all mutant alleles. Polymorphisms in DNA sequence were assessed by the 5′ nuclease allelic discrimination assay or by direct sequencing. For allelic discrimination PCR, the region flanking the polymorphism was amplified by PCR in the presence of two fluorescent probes, each specific for one allele. The primers were designed using Primer Express 3.0 (Applied Biosystems, Foster City, California) with attention to avoid SNPs in the flanking region of the mutation. Each PCR mixture (12.5 μl) contained 20 ng DNA, 900 nM of forward and reverse primer each, 300 nM of allele specific probes each, and 6.25 μl of TaqMan Universal PCR Master Mix (Applied Biosystems). Amplification was done under the following conditions: 50 °C for 2 min, 95 °C for 10 min followed by 40 cycles of 92 °C for 15 s, and 60 °C for 1 min. Fluorescence in each sample well was measured before and after PCR using ABI Prism 7900HT Sequence Detection System (Applied Biosystems). Data were analyzed using Allelic Discrimination Program (Applied Biosystems).

Automatic sequencing was performed using an ABI 3730xl DNA Analyzer (Applied Biosystems). For each mutation, DNA samples were genotyped twice to confirm the results. Data were analyzed by Applied Biosystems Seqscape v2.6 software, which defines a heterozygous mutation as a minor variant peak ≥25 % of the intensity of the dominant peak.

Statistical analysis

Individual nucleic acid variations in a given gene were collapsed into gene level mutation status, and a gene was considered mutated if it had any nucleic acid variation. Genes with mutations in at least one sample were also collapsed into three functional pathway groups including: (i) phosphatidylinositol-3-kinases/v-akt thymoma viral oncogene homolog 1 (PI3K) pathway (AKT-1,-2,-3, FRAP, PDK1, PHLPP2, PIK3CA, PI3KR1, RICTOR, and PRKAG-1), (ii) receptor tyrosine kinase (RTK) group (EGFR, FGFR-1, KIT, IGF1R, MET, RET, and PDGFR-α), and (iii) Cell cycle/metabolic pathway group (BRAF, CTNNB1, DEAR1, FBXW7, HIF1-α, IDH-1, -2, JAK2, KRAS, and TNK2). At subsequent analysis, a mutation or the presence of the activating polymorphism in PHLPP2 in any one of the member genes qualified a patient for inclusion in that mutated pathway category. Mutation frequencies at allele, gene, and pathway levels were compared across three clinical subsets of breast cancers including: (i) ER-positive/HER2-normal (n = 88), (ii) HER2-positive with any ER status (n = 61), and (iii) ER-negative/HER2-negative (triple negative, TNBC, n = 116) using the Fisher’s Exact Test (FET). Univariate and multivariate logistic regression analysis were also performed. The predictor variables of the regression model were clinical ER status (positive vs. negative), HER2 status (positive vs. negative), tumor grade (level 2/3 vs. level 1), nodal status (positive vs. negative), and pathologic response to preoperative chemotherapy (pCR vs. RD). Survival by individual gene mutation status and by pathway mutation status was plotted using the Kaplan–Meier method. Survival curves were plotted separately for each disease subsets to avoid the confounding effect of mutation status associated with disease subtype.

Results

Distinct mutation patterns in different breast cancer subtypes

We detected mutations in 38 alleles in 15 genes in 108 (40 %) samples. Table 2 shows each of the detected mutations and their frequencies along with the potential therapeutic agents in clinical trials that could target these genes. Eighty-one of the 108 samples (30 %) had one mutation, 23 samples (9 %) had two, and four samples (1 %) had ≥ three mutations (Supplementary Fig. 1). The most frequently mutated or activated genes were PIK3CA (n = 43), PHLPP2 (n = 36), and FBXW7 (n = 21) (Table 2). Figure 1 shows a heat map of all gene-level nucleic acid variations annotated by routine clinical variables and samples that showed concurrent mutations.

Table 2 Observed mutations in breast cancer and corresponding drugs that inhibit functional signaling pathways
Fig. 1
figure 1

Heat-map of gene level mutations across all samples (n = 267). The x-axis represent tumor samples, the y-axis shows gene names, white indicates wild type and green mutated gene. The samples were ordered first by ER status, then by HER2 status, tumor grade (PreGrade), tumor node (PreNode), tumor size status (PreT), and pathological complete response (pCR) as shown on the rug plot on the top. Color scheme in the rug plot: Gray = missing values, for ER, HER2, PreNode and pCR status blue = negative and yellow = positive, for PreGrade and PreT levels yellow = 1, orange = 2, pink = 3, red = 4, where the numbers correspond to histologic grade and tumor size by TNM staging

When gene-level mutation frequencies were compared across the three clinical subtypes of breast cancer, PIK3CA mutations were significantly more common in ER+ cancers compared to TNBC (19 vs. 8 %, p = 0.001) (Table 3). PIK3CA were also commonly mutated in HER2+ cancers (28 %). In TNBC, FBXW7 mutations were significantly more frequent compared to ER+ cancers (13 vs. 5 %, p = 0.037) (Table 3).

Table 3 Individual gene and signalling pathway mutation frequencies in the three main different breast cancer subtypes

When gene-level mutations were collapsed into three mechanistic groups based on the known function of the genes, 32 % of cases had a mutation or activation in genes (AKT1, PHLPP2, PIK3CA, and PIK3R1) that could potentially lead to activation of the PI3K pathway, 6 % had activating mutations in at least one of the following RTKs (EGFR, KIT, RET, and PDGFR-α), and 15 % had a mutation in at least one gene involved in the regulation of cell cycle and metabolism (BRAF, CTNNB1, FBXW7, HIF1-α, IDH2, KRAS, and TNK2) (Fig. 2). Mutations in the PI3K pathway were significantly more common in ER+ and HER2+ cancers compared to TNBC (p = 0.032, Table 3). Multivariate logistic regression analysis confirmed significant positive associations between PI3K pathway mutations and ER (odds ratio [OR]:2.18, p = 0.01) and HER2 status (OR: 2.22, p = 0.02), respectively (Table 4). In multivariate analysis, mutations in RTKs had a positive trend to be associated with HER2 status (OR: 3.24, p = 0.09). Mutations in the cell cycle/metabolism genes showed significant inverse association with tumor size (OR: 0.57, p = 0.01) and a positive association with nodal status (OR: 2.61, p = 0.05).

Fig. 2
figure 2

Gene level mutations grouped into 3 functional categories. The percentages of individual gene level mutations that contribute to the three distinct functional categories are shown

Table 4 Association between clinical variables and mutation status at pathway level

Association between mutation status and clinical outcome

None of the individual genes alone had significant association with chemotherapy response or survival. However, at pathway level, mutation in the PI3K pathway was associated with higher probability of pCR to preoperative chemotherapy (OR: 2.04, p = 0.04), in multivariate analysis (Table 4). Multivariate analysis for subgroups by ER-status could not be performed because of the small cohort sizes in each subset. The logrank p-values of the comparisons between the K–M plots of the mutant versus wild type for each pathway and each mutated gene are shown in Supplementary Fig. 2–4. None of the survival curve comparisons were significantly different.

Validation of Sequenom findings with different methods

We performed independent technical validation of all mutated alleles for 15 genes. PIK3CA mutation status was available for a subset of this cohort from a previous study, and these results were also included in the validation [16]. Mutation results that did not reach a concordance of at least 95 % between different techniques were discarded and not reported on. The Sequenom methodology used in these experiments were subsequently transferred to a CLIA-certified laboratory environment and are now performed as routine clinical assays.

Discussion

Overall, our results indicate that breast cancers harbor several individually rare but potentially targetable mutations. Activating mutations in the PIK3CA gene were the most common but we also observed activating mutations of AKT1, BRAF, EGFR, KIT, KRAS, and PDGFR-α. The functional importance of these mutations in human breast cancer remains unknown; however, each of these genes can influence cancer growth in at least some experimental model systems and they each can be targeted with approved or investigational drugs (Table 2). The ultimate tests to determine whether any of these mutated genes represent therapeutic targets in breast cancer are a series of clinical trials. Our data demonstrated that it is feasible to determine the mutation status for this panel of genes in tissues from fine needle aspirations and several academic institutions have initiated projects to set up mutation profiling of solid tumors in routine pathology laboratories (http://www.mdanderson.org/education-and-research/resources-for-professionals/scientific-resources/core-facilities-and-services/molecular-diagnostics-lab/services/index.html). These developments provide a logistical framework to conduct such studies in the near future.

A potential limitation of this study is that we did not analyze normal samples matching the cancer specimens; however, these mutations all have been previously described in at least some cancer specimens as somatic events. We also included in our mutation panel functionally relevant SNPs such as PHLPP2 (PH Domain and Leucine Rich Repeat Protein Phosphatase 2). We observed a variant nucleotide in 13.5 % of cancers for this gene (c.3047T > C/p.L1016S). This variant is known to inactivate the phosphatase activity of PHLPP2, which dephosphorylates AKT-1,-2,-3 and protein kinase C and there is laboratory evidence to indicate that this leads to sustained AKT signaling in breast cancer cell lines [17]. It is therefore plausible that this germ-line variant may also influence the behavior of breast cancer in patients and may even represent a strategy to select patients for therapy with AKT pathway inhibitors.

Several other clinically testable therapeutic hypotheses can be formulated based on the mutations that we detected. We noted inactivating mutations in FBXW7 (F-box and WD Repeat Domain Containing-7) in 8 % of cases, and it was significantly more common in TNBC. This gene codes for a substrate-recognition factor that targets ubiquitin ligase to c-Myc, cyclin E, c-Jun, TGF-β, Notch, and mTOR for degradation [18]. Tumor cell lines harboring deletions or inactivating mutations in FBXW7 have high mTOR levels and are particularly sensitive to mTOR inhibitors [19].

In summary, this analysis shows that about 40 % of breast cancers contain mutations in common signaling pathways that can be targeted with currently clinically available drugs. Different breast cancer subtypes harbor different types of mutations, but any given mutation affects only a small subset of cases. Simultaneous testing of many different mutations in a single needle biopsy is feasible and allows the design of innovative prospective clinical trials that could test the functional importance of these mutations and the clinical value of mutation-based patient selection strategies for biologically targeted drugs.