Introduction

Breast cancer is a heterogeneous disease with respect to prognosis and treatment response of patients. Clinical outcome of cancer patients is to a large extent driven by the biology of their tumors, rendering accurate classification of breast cancers of major importance. Expression of the estrogen receptor (ER), progesterone receptor (PR) and/or epidermal growth factor receptor 2 (ERBB2) has long guided breast cancer classification. This classification has been refined by cytokeratin (CK) protein expression patterns. About 70% of primary breast cancers express at least one of the luminal CK proteins, whereas almost 30% of primary breast cancers express at least one of each of the luminal and basal CKs (CK7/8/18/19 and CK5/6/14, respectively; [1]). Two minor subtypes express only basal CKs or are negative for both luminal and basal markers, each representing <1% of tumors [1]. Luminal breast cancers are more often ER and/or PR positive or have overexpression of ERBB2. The so-called triple-negative phenotype of tumors that express none of ER, PR and ERBB2 is observed primarily among basal breast cancers. Basal breast cancers also more often have EGFR expression [2]. Histological classification of breast cancers has clinical relevance in that patients with basal breast cancers more often have a worse prognosis than patients with luminal breast cancers [1, 2]. Perou and colleagues used a 4-protein signature that defined four groups of breast cancers: ERBB2-overexpressing (ERBB2+), luminal (ERBB2− and ER+), basal-like (ERBB2/ER− and CK5+ and/or EGFR+) and the negative group that lacks expression of all four proteins [3]. A major discrepancy among these and other definitions lies in breast cancers that express basal CKs as well as ER, which may be as many as one-third of all basal CK expressing breast cancers [4].

Classification of breast cancers by gene expression profiling has defined five subtypes of breast cancer based on the intrinsic gene set [5]. ER expression was a major classifier, including the luminal A and luminal B subtypes of ER-positive breast cancers. ER-negative breast cancers included the ERBB2+, basal-like and normal-like subtypes. The intrinsic subtypes of breast cancer have clinical relevance [5–7], as have gene expression signatures that had been defined based on clinical outcome of breast cancer patients [8–11]. Importantly, the success of gene expression profiling in breast cancer classification implies a relatively restricted set of recurrent expression patterns among breast cancer subtypes.

Cancer is a genetic disease. Mutations in a cancer gene often induce constitutive changes in the expression of its downstream pathway members or targets. The subtype-specific gene expression patterns among breast cancers were therefore likely to reflect, at least in part, the accumulation of mutations in subtype-specific cancer genes. Here, we have evaluated this concept by extensive molecular characterization of 41 human breast cancer cell lines at protein, transcript and gene level.

Materials and methods

Cell lines

The 41 human breast cancer cell lines used in this study are listed in Fig. 1 and described in detail elsewhere [12]. Melanoma cell line M14 was obtained from its provider at Micromet Inc. (Munich, Germany; [13]).

Fig. 1

Molecular characterization of 41 human breast cancer cell lines. Left panel, classification of the cell lines by expression analyses. Classification by cytokeratin expression: L, luminal: CK8/18+ and/or CK19+, CK5−; C, combined luminal/basal: CK8/18+ and/or CK19+, CK5+; N, null: CK8/18 low, CK19−, CK5−; B, basal: CK8/18 low, CK19−, CK5+. Classification by the 4-protein signature: E, ERBB2 overexpression; L, luminal: ERBB2−, ER+; B, basal: ERBB2/ER−, CK5+ and/or EGFR+; N, negative: ERBB2/ER/CK5/EGFR−. Classification by intrinsic gene expression: E, ERBB2+; L, luminal; B, basal-like; N, normal-like; O, other subtype. For individual proteins: P, expression and PP, overexpression in blue and no expression in white. The two major types of breast cancer cell lines are indicated at the far left. Classification of the ERBB2 subtype varies with definitions, but is clinically defined by protein overexpression (score 3) or by indecisive overexpression (score 2) with concurrent gene amplification. Middle panel, gene mutation analysis of the cell lines. M, oncogenic mutations, D, sizeable deletions and A, amplifications in red; M*, heterozygous oncogenic mutations in tumor suppressor genes in pink; wild-type genes in white; H, promoter hypermethylation in yellow and Y, constitutive Wnt pathway activation in green. nd, not determined. The black borders indicate the two gene mutation profiles. Right panel, number of oncogenic mutations in each cell line and microsatellite instability (MSI) with BAT 25, 26 and 40: N, no; Y, yes; 40, MSI with BAT 40 only. Ki-67 immunohistochemistry: 1, <33% of cells positive; 2, 33–66% of cells positive; 3, more than 66% of cells positive

Expression analyses

Protein expression was determined by western blotting or immunohistochemistry on a tissue microarray of paraffin-embedded cell line samples (antibodies are specified in Supplementary Table S1). Transcript expression was determined by qPCR or by Affymetrix U133A microarrays. qPCR was performed as described [14] on an ABI7700 Taqman Analyzer (Applied Biosystems, Foster City, CA). Ct values were normalized to the Ct values of the HPRT and PBGD housekeeping genes. DNase I-treated RNA was antisense biotinylated using the MEGAScript T7 labeling kit (Ambion, Foster City, CA) and Affymetrix U133A microarrays were hybridized according to the Affymetrix GeneChip Manual, both performed on a commercial basis by ServiceXS (Leiden, The Netherlands). Intensity values for all genes were scaled using the global normalization factor as specified by GCOS 1.1, and further normalized with Omniviz software 3.6 (Biowisdom, Maynard, MA). Intensity values <30 were set to 30. Differential gene expression was based on log2 transformed distances to the geometric mean for each probe set. Patterns of correlation were revealed by applying the Pearson matrix-ordering method that sorts samples into correlated blocks. The Stanford intrinsic gene set [15] was translated into an Affymetrix intrinsic gene set, including 451 probe sets from the Stanford list of 496 genes [7]. Intrinsic subtypes were classified by average distance linkage hierarchical clustering with non-centered correlation as distance metric [16]. Microarray data are available at GEO (http://www.ncbi.nlm.nih.gov/geo/), accession number GSE16795.
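The per-probe-set transformation described above can be illustrated with a minimal sketch. It assumes already scaled intensity values as input and does not reproduce the GCOS or Omniviz normalization steps; the function names are hypothetical and chosen for illustration only. The second function shows the non-centered (uncentered) correlation that serves as the similarity underlying the clustering distance.

```python
import math

FLOOR = 30.0  # intensity values below 30 are set to 30, as stated in the text


def log2_distance_to_geometric_mean(intensities):
    """Return log2(intensity / geometric mean) for each sample of one probe set."""
    floored = [max(value, FLOOR) for value in intensities]
    logs = [math.log2(value) for value in floored]
    mean_log = sum(logs) / len(logs)  # log2 of the geometric mean
    return [value - mean_log for value in logs]


def uncentered_correlation(a, b):
    """Non-centered correlation: the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)
```

Because the values are expressed relative to the geometric mean, the transformed values of a probe set sum to zero across samples; a distance for clustering can then be taken as one minus the non-centered correlation.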

Gene mutation analyses

Gene sequence alterations typically were identified by direct sequencing of PCR-amplified genomic templates that included exon sequences and intron/exon boundaries. For RB1, exons 3 through 27 were analyzed by sequencing of RT-PCR-amplified templates of emetine-treated cell lines. The complete coding sequences have been analyzed for BRCA1 [17], CHEK2 [18], E-cadherin/CDH1 ([19]; Hollestelle et al., submitted for publication), MAP2K4 [20], MSH6, p14ARF, p16, p53 [21], PIK3CA [12], PTEN [12] and RB1, whereas only known mutational hotspots have been analyzed for AKT1 (codon 17), APC (mutation E1577X), BRAF (exons 7, 11 and 15; [12]), CDK4 (p16-binding domain in exon 2), FGFR3 (codons 248, 249, 372, 375, 393 and 652), and codons 12, 13 and 61 for HRAS, KRAS and NRAS [12]. All oncogenic mutations have been confirmed by sequencing of an independently amplified DNA and/or RNA template.

Amplifications of AKT1, AKT2, AKT3, CDK4, c-MYC, Cyclin D1, EGFR, ERBB2 and HDM2 were defined by genomic gain of the gene locus with concurrent transcript and/or protein overexpression. Genomic gains were determined by screening of publicly available CGH and SNP data ([22]; www.sanger.ac.uk), requiring gain of at least four allele copies. Genomic gain of the ERBB2 and EGFR loci was determined by chromogenic in situ hybridization (SPoT-Light® CISH™ kit, Zymed).
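The amplification criterion above combines two conditions, which can be written as a small rule. This is a sketch only: the function name and its inputs are hypothetical, and how overexpression itself is scored is not specified here and is taken as a given yes/no call.

```python
MIN_ALLELE_COPIES = 4  # "gain of at least four allele copies"


def is_amplified(allele_copies, transcript_overexpressed, protein_overexpressed):
    """Call an amplification: genomic gain plus transcript and/or protein overexpression."""
    has_genomic_gain = allele_copies >= MIN_ALLELE_COPIES
    has_overexpression = transcript_overexpressed or protein_overexpressed
    return has_genomic_gain and has_overexpression
```

Under this rule, a locus with genomic gain but no overexpression at either level is not counted as amplified.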

Promoter hypermethylation was determined for BRCA1, E-cadherin, p16, p14ARF and RB1, by azacytidine assays and/or methylation-specific PCR ([23]; Hollestelle et al., submitted for publication). Wnt pathway activation was determined by TOP/FOP reporter construct assays ([19]; Ng et al., submitted for publication).

Results

Expression analysis of breast cancer cell lines

We have characterized 41 human breast cancer cell lines by extensive expression analyses, using immunohistochemistry and microarray expression profiling (Fig. 1; Supplementary Table S2). CK protein expression patterns classified the breast cancer cell lines in four histological phenotypes: luminal (n = 22), combined luminal/basal (n = 4), null (n = 7) and basal (n = 1) [1]. Gene expression profiling identified two main clusters of cell lines (Fig. 2a). The major cluster contained all cell lines that (over)expressed ER, PR and/or ERBB2 proteins and that were luminal by CK expression (Fig. 1; Supplementary Table S2). The minor cluster contained triple-negative cell lines only, with a subdivision in combined luminal/basal and null phenotype cell lines by CK expression. Protein expression of EGFR and particularly of the EMT protein markers Vimentin and N-cadherin was mostly restricted to combined luminal/basal and null phenotype cell lines. A similar clustering of breast cancer cell lines was obtained by hierarchical clustering based on the intrinsic gene set (Fig. 2b; [7, 15]). Luminal phenotype cell lines by CK expression classified either as luminal or ERBB2+ by intrinsic gene expression. Similarly, combined luminal/basal and null phenotype cell lines were basal-like and normal-like, respectively, by intrinsic gene expression. The 4-protein signature of ERBB2, ER, CK5 and EGFR [3] was not able to distinguish combined luminal/basal or basal-like cell lines from null or normal-like cell lines. However, this signature identified all 13 cell lines with ERBB2 overexpression. The results of these different classification methods were highly concordant and revealed a dichotomy among the breast cancer cell lines that we have designated “luminal-type” (n = 25) and “basal-type” (n = 15), irrespective of their further subclassification.
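The two protein-based classification schemes used here follow simple decision rules, as defined in the Introduction and in the legend of Fig. 1. The sketch below encodes those rules with the one-letter codes from the legend; the function names and the reduction of each marker to a yes/no call are assumptions made for illustration (CK8/18 "low" is treated as negative).

```python
def four_protein_class(erbb2_overexpressed, er_positive, ck5_positive, egfr_positive):
    """4-protein signature: E (ERBB2+), L (luminal), B (basal), N (negative)."""
    if erbb2_overexpressed:
        return "E"
    if er_positive:
        return "L"  # ERBB2-, ER+
    if ck5_positive or egfr_positive:
        return "B"  # ERBB2/ER-, CK5+ and/or EGFR+
    return "N"      # all four proteins negative


def cytokeratin_class(ck8_18_positive, ck19_positive, ck5_positive):
    """CK classification: L (luminal), C (combined), B (basal), N (null)."""
    has_luminal_ck = ck8_18_positive or ck19_positive
    if has_luminal_ck:
        return "C" if ck5_positive else "L"
    return "B" if ck5_positive else "N"
```

The first function makes explicit why the 4-protein signature cannot separate combined luminal/basal from null phenotype cell lines: it does not use the luminal CKs, so both end up in B or N depending only on CK5 and EGFR.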

Fig. 2

Global gene expression and intrinsic subtypes of breast cancer cell lines. a Pearson correlation plot based on the subset of 5,490 log2GM < −2 and >2 differentially expressed probe sets. The Pearson correlation coefficient algorithm positions samples according to their overall similarity in gene expression, where red indicates high overall similarity and blue indicates low similarity. b Dendrogram of hierarchical clustering based on the intrinsic gene set. See Fig. 1 for color coding of intrinsic subtypes

Mutation analysis of breast cancer cell lines

In an ongoing effort to characterize the breast cancer cell lines, we have performed mutation analyses of 27 well-known cancer genes (results of 16 genes are first reported here; all mutations are detailed in Supplementary Table S3). Oncogenic mutations were defined as those mutations that predicted a premature termination or a significant deletion/insertion of the encoded proteins, splice site mutations and missense mutations that previously had been identified in clinical cancers or had been associated with hereditary cancer syndromes. For oncogenes that reportedly are amplified in clinical cancers, we considered chromosomal gains with concurrent transcript and/or protein overexpression also as oncogenic mutations or amplifications. Together, we have identified 146 oncogenic mutations among 24 cancer genes (Fig. 1). Oncogenic mutations in tumor suppressor genes included 30 sizeable deletions, 26 nonsense mutations and small deletions/insertions causing premature terminations, 7 splice site mutations and 29 missense mutations, totaling 92 mutations. In concordance with the recessive nature of tumor suppressor gene mutations, 87 (95%) of these mutations were biallelic. Oncogenic mutations in oncogenes included 29 amplifications and 25 missense mutations, totaling 54 mutations. Three tumor suppressor genes had transcriptional silencing by promoter hypermethylation, together in 20 cell lines. The 146 oncogenic mutations were identified among 40 of the 41 breast cancer cell lines, resulting in an average of 3.6 mutations per cell line and ranging from none to seven mutations, with each extreme observed in a single cell line (Fig. 1).
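The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Oncogenic mutations in tumor suppressor genes, by type (counts from the text)
tumor_suppressor = {
    "sizeable deletions": 30,
    "premature terminations": 26,
    "splice site mutations": 7,
    "missense mutations": 29,
}
# Oncogenic mutations in oncogenes, by type (counts from the text)
oncogene = {"amplifications": 29, "missense mutations": 25}

ts_total = sum(tumor_suppressor.values())        # 92
oncogene_total = sum(oncogene.values())          # 54
all_mutations = ts_total + oncogene_total        # 146

biallelic_percent = round(100 * 87 / ts_total)   # 95
mean_per_cell_line = round(all_mutations / 41, 1)  # 3.6
```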

For most cancer genes, we also determined transcript and protein expression patterns. As anticipated, transcript expression of mutant genes with premature terminations typically was low or absent, whereas missense mutations resulted in relatively normal transcript expression levels. Protein expression patterns mostly followed transcript expression patterns and/or were in concordance with those reported for clinical cancers. For example, p53 mutants with premature terminations typically had low transcript and protein expression, whereas p53 missense mutants had relatively normal transcript expression but high protein expression (Table 1). For some mutant genes, we observed expression modulation of other proteins in the same tumor suppressor pathway. In the RB pathway, for example, most RB1 mutants had high p16 protein expression and very low Cyclin D1 protein expression whereas Cyclin D1 mutants had low p16 protein expression (Table 2; Fig. 3). Similar observations had been made by Gauthier et al. [24], who have associated deregulation of RB signaling with basal-like breast cancers. Together, the transcript and protein expression patterns of mutant cancer genes thus further confirmed the functional significance of the identified oncogenic gene mutations.

Table 1 Mutation analysis of p53 pathway genes
Table 2 Mutation analysis of RB pathway genes
Fig. 3

Protein expression of RB pathway genes by western blotting

Gene mutation profiles among breast cancer cell lines associate with breast cancer subtypes

Compilation of the mutation data revealed two prevalent gene mutation patterns among the breast cancer cell lines (Fig. 1). The first pattern involved frequent mutations among the cell lines in genes from the same tumor suppressor pathway. These included the p53 pathway in 90% of the cell lines (p53, CHEK2, HDM2, c-MYC and p14ARF; Table 1), the RB pathway in 64% (RB1, p16, Cyclin D1 and CDK4; Table 2) and the PI3K pathway in 56% (PTEN, PIK3CA, AKT1, AKT2 and AKT3; [12]). Oncogenic mutations within the RB pathway were essentially mutually exclusive among cell lines. Similar to other reports [25–27], PIK3CA mutations were biallelic in two cell lines and monoallelic concurrent with AKT1 amplification or monoallelic PTEN mutation in two other cell lines [12]. Somewhat surprising was the frequent concurrence of p53 or CHEK2 mutations with mutations in p14ARF, c-MYC or HDM2. It may be of interest that all p53/p14ARF and p53/c-MYC double mutants had p53 missense mutations, suggesting some residual activity of these p53 mutants (Table 1; Supplementary Table S3). Whatever the functional significance of these double mutations [28], the frequent mutational abrogation of the p53, RB and PI3K pathways among the cell lines suggests that their functionality is of major importance in breast carcinogenesis.

Most notably, the second gene mutation pattern involved oncogenic mutations that clustered in a particular subset of breast cancer cell lines (Fig. 1). These included a “luminal mutation profile” involving E-cadherin and MAP2K4 gene mutations and amplifications of Cyclin D1, ERBB2 and HDM2, and a “basal mutation profile” involving BRCA1, RB1, RAS and BRAF gene mutations and deletions of p16 and p14ARF. Specifically, the cohort of 25 luminal-type breast cancer cell lines included 35 of 36 mutations in luminal mutation profile genes (Fisher’s exact P < 0.0001). The cohort of 15 basal-type breast cancer cell lines included 33 of 39 mutations in basal mutation profile genes (P < 0.0001). The existence of specific gene mutation profiles for each of the two major subtypes of breast cancer cell lines suggests a dichotomy in the genetic basis of human breast cancer.
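For readers unfamiliar with Fisher's exact test, a minimal two-sided implementation is sketched below using only the Python standard library. How the counts were tabulated for the reported P values is not stated in the text, so the 2 × 2 layout in the usage lines is an assumption for illustration: rows are luminal versus basal mutation profile genes, columns are luminal-type versus basal-type cell lines, and every mutation found outside its matching cohort is placed in the other cohort (which ignores the one cell line that belongs to neither type).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Assumed layout (see lead-in): luminal profile 35 vs 1, basal profile 6 vs 33
p_value = fisher_exact_two_sided(35, 1, 6, 33)
```

Under this assumed layout the resulting P value lies far below the 0.0001 threshold quoted in the text; a different tabulation of the same counts would give a different exact value.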

Discussion

Expression analyses suggest two major types of luminal and basal breast cancer cell lines

Classification of the breast cancer cell lines was remarkably consistent, whether they were classified by classical breast cancer parameters (ER, PR and ERBB2), histological criteria (CK protein expression or the 4-protein signature), or gene expression profiling (intrinsic subtypes; Fig. 1). Classification of the ERBB2+ subtype is ambiguous in breast cancer cell lines as well as clinical breast cancers due to differences between ERBB2 protein overexpression, gene amplification and expression of genes from the intrinsic gene set. Our mutation analyses strongly suggested that the basal-like or combined luminal/basal and normal-like or null subtypes of cell lines represent two ends of a spectrum of basal-type breast cancer (Fig. 1). This was supported by the protein expression profiling data, specifically expression of E-cadherin, P-cadherin, N-cadherin, Vimentin and EGFR. Most notable, a similar interrelationship has been proposed for clinical breast cancers [29] and accordingly, Neve et al. [22] had designated these two subtypes of breast cancer cell lines as basal A and basal B. Our molecular characterization of breast cancer cell lines is concordant with the concept that two major types of luminal and basal breast cancers exist that may be subdivided further by protein expression and/or intrinsic gene expression profiling.

Accumulation of many mutations in well-known cancer genes in breast cancer cell lines

We have identified 146 oncogenic mutations among 41 breast cancer cell lines. This mutation frequency was much higher than anticipated. For example, the mutation frequencies of p53 and E-cadherin were about twice as high as reported for clinical breast cancers [30, 31]. This high mutation frequency likely reflects a higher rate of mutation detection in cell lines. Importantly, the presence of non-neoplastic cells in clinical breast cancer samples essentially precludes identification of homozygous deletions, whereas these represent 20% of the mutations that we have identified. The Sanger Institute also has screened known cancer genes in breast cancer cell lines, with 15 genes and 23 cell lines overlapping with our study (www.sanger.ac.uk as per February 2009). Sanger failed to identify 23 (35%) of 65 mutations that we had detected and we failed to identify one mutation (that we subsequently confirmed). The reason for this discrepancy is unclear, but we expect that our mutation screens may involve more detailed primer design, PCR optimization and data analysis. Sample swapping appears unlikely since seven cell lines with discrepant results also carried signature mutations identified by both Sanger and us (Supplementary Table S4). Our results for the other two cell lines have been confirmed by literature reports [25, 32–35]. Overall, 14 of 23 discrepant mutations had been confirmed in literature reports, also rendering incidental sample swapping unlikely [25, 32–38]. We consider it unlikely that a significant proportion of mutations has been acquired during in vitro propagation of the cell lines, although there is no hard evidence to support this notion nor, for that matter, to reject it.
Most important, the majority of the identified gene alterations appeared to be of functional relevance as they had previously been reported in clinical cancer samples, in the germline of patients with a hereditary cancer syndrome, or were sizeable deletions or intragenic mutations that predicted a premature protein termination (and thus are less likely to be recurrent). Silent mutations or missense mutations of unclear significance represented <10% of all identified deviations from the reference coding gene sequences (5 and 7 mutations, respectively). A more likely explanation for the high mutation frequency among cell lines is a bias for breast cancers that are more easily propagated in vitro, such as p53 mutant breast cancers and likely also those of the null or normal-like subtype—both being cancers that are known to have a more aggressive clinical course. This might, however, not necessarily be disadvantageous as it allows one to study the more clinically needy breast cancers.

Gene mutation profiles provide a genetic basis for luminal-type and basal-type breast cancers

We have identified subtype-specific gene mutation profiles among the breast cancer cell lines. Of the mutations in luminal mutation profile genes, 97% had been identified among the luminal cell lines. The DU4475 cell line was the single non-luminal cell line that had a mutation in a luminal mutation profile gene: MAP2K4. Basal-type breast cancer cell lines carried 85% of the mutations in basal mutation profile genes. DU4475 again was atypical with mutations in RB1 and BRAF. The MDA-MB-435s cell line was the only basal-type cell line that did not express EGFR proteins. MDA-MB-435s was recently shown to be genetically identical to the M14 melanoma cell line from the NCI-60 cell line panel (several MDA-MB-435s vials vs. a single M14 vial; [39]). We have obtained M14 cells from several laboratories, including a vial of passage-10 cells from its originator, and MDA-MB-435s was obtained from ATCC at passage-239. Indeed, microsatellite analysis confirmed that the two cell lines were identical. It thus remains unclear which is the correct origin of the cell lines, although recent evidence suggests that they may be of breast origin [40]. Be that as it may, it is not likely that the other basal-type breast cancer cell lines are melanocytic. Four of the basal-type cell lines carry BRCA1 mutations and one carries the CHEK2 1100delC mutation, both well-known breast cancer susceptibility genes [41]. In fact, the consistent histological and intrinsic breast cancer classification of the basal-type cell lines strongly suggests that a melanoma-like gene mutation profile (that includes mutations in p16, RB1 and RAS pathway genes [42]) is characteristic of basal-type breast cancers. Comparable gene mutation profiles among basal-type breast cancers and melanomas, and likely also pancreatic cancers [43], might be conceivable given the typically aggressive clinical course of these tumor types and their frequent inherent therapy resistance.
In this respect, distinct gene mutation profiles among breast cancers could very well explain the significant heterogeneity in clinical outcome of breast cancer patients.

It is of interest that recent large-scale sequencing efforts have identified a similar number of somatic oncogenic mutations among breast cancers and colorectal cancers, but that the spectrum of mutated pathways was far more diverse among breast cancers (an average of 14 and 15 mutations per tumor among 108 and 38 pathways for breast cancer and colorectal cancer, respectively; [44, 45]). These differences in mutation spectrum suggested genetic heterogeneity among breast cancers, and are thus in line with our finding of two distinct gene mutation profiles. After all, the existence of two gene mutation profiles among breast cancers implies more mutant genes when considering breast cancer at large. Similarly, mutations in subtype-specific cancer genes will then be less prevalent. Our study has focused on mutations in well-known cancer genes and the identified mutations, therefore, are more likely to represent drivers of breast carcinogenesis. Although this approach does not draw the full genomic landscape of breast cancer, it did allow us to decipher specific patterns of gene mutations among the two major breast cancer types. Of course, breast cancer cell lines are not clinical breast cancers. As a proof of principle, we have shown that E-cadherin mutation and E-cadherin promoter methylation indeed also associate with distinct pathological types of clinical breast cancers (Hollestelle et al., submitted for publication). It also is significant that mutations in three other genes follow patterns reported for clinical breast cancers, including the exclusive presence of Cyclin D1 amplifications in ER-positive breast cancers, RB1 mutations in ER-negative breast cancers and the concurrent mutation of BRCA1 and p53 in ER-negative breast cancers [24, 46–49]. If the evidence is to be extended to other genes from the mutation profiles, we may at last have begun to elucidate the genetic basis for the inherent biological and clinical heterogeneity among human breast cancers.