Introduction

Invasive breast cancer is composed of several distinct histologic subtypes. Ductal carcinoma is most commonly diagnosed, representing 60–80% of tumors, whereas lobular carcinoma represents up to 15% of cases [1,2,3]. Previous clinical research has shown that, compared to ductal carcinoma, lobular carcinoma tends to be diagnosed in women at older ages, presents as larger, lower grade tumors, is more frequently diagnosed at higher stage, and is associated with a higher rate of bilateral disease [1, 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]. While studies have shown that lobular tumors are often Luminal A intrinsic subtype, it is unclear whether the observed associations between histologic subtype and tumor characteristics are similar in direction and magnitude when restricted to Luminal A tumors. As genomic tests become more widely utilized in clinical settings, it will be important to understand the relationship between histology and molecular subtype and whether both impact clinical outcomes.

A recent analysis from The Cancer Genome Atlas project (TCGA) found lobular tumors to be predominantly Luminal A intrinsic subtype and distinct from ductal tumors, which show a greater diversity of intrinsic subtypes [33]. However, TCGA samples are not racially diverse and do not span early stage of disease. Therefore, we examined the relationships among histologic subtype, PAM50 intrinsic subtype, and a validated TP53 gene signature among participants in the Carolina Breast Cancer Study (1993–2013), a population-based sample with larger numbers of black women, and placed these results in the context of the same associations estimated in TCGA.

Methods

Study populations

The present analysis includes 4,359 cases of invasive breast cancer from the Carolina Breast Cancer Study (CBCS), Phases 1–3 (1993–2013). The CBCS is a population-based study among women from North Carolina [34], designed to oversample younger women (aged < 50 years at diagnosis) and black women [35]. Initiated in 1993, the CBCS recruited participants from 24 (Phase 1–2) and 44 (Phase 3) of the 100 North Carolina counties using rapid case ascertainment via the North Carolina Central Cancer Registry. After giving written informed consent, CBCS breast cancer cases were enrolled under an Institutional Review Board protocol approved at the University of North Carolina. CBCS eligibility criteria included being female, a first diagnosis of invasive or in situ (Phase 2 only) breast cancer, aged 20–74 years at diagnosis, and residence in specified counties. Only invasive cases were included in the current analysis. Participants provided consent to access tumor tissue blocks/slides and medical records from treatment centers.

The TCGA study population has been described previously [33, 36]. A total of 808 tumors from women with newly diagnosed invasive breast cancer were used in this analysis. Cases were enrolled at numerous medical institutions and provided informed consent for access to tumor tissue specimens, which were used for histologic subtype classification, RNA extraction, RNA sequencing, and other molecular assays. Age at diagnosis, race, tumor size, lymph node status, and stage of disease were abstracted from the medical records. TCGA data are publicly available [33, 36].

Histologic subtype

Eligible CBCS cases had invasive tumor tissue available for centralized pathology review by the study pathologist. Single histologic subtype tumors were ≥ 80% representative of a single histology and mixed tumors contained ≥ 20% of a second histologic subtype in a tumor of another dominant (< 80%) histologic subtype. The following histologic subtypes were included in the main analysis: ductal (n = 2,856) and lobular (n = 326). In select analyses, we also considered: mixed ductal/lobular (n = 473) and an ‘other’ category (n = 704) including mixed ductal/non-lobular (n = 285), mucinous (n = 89), mixed ductal/metaplastic (n = 63), metaplastic (n = 44), DCIS w/focal invasion (n = 44), undifferentiated high grade (n = 29), tubular (n = 23), micropapillary (n = 21), papillary (n = 19), medullary (n = 18), pleomorphic lobular (n = 17), anaplastic (n = 14), apocrine (n = 11), cribriform (n = 9), neuroendocrine (n = 3), other (n = 15). Cases with unknown (n = 99) or missing (n = 376) histologic subtype were excluded.
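The 80%/20% rule above can be illustrated with a minimal sketch. The function name and the input format (a mapping from histology to percent of tumor) are hypothetical and chosen only for illustration; this is not the study pathologist's procedure.

```python
def classify_histology(components):
    """Label a tumor as a single or mixed histologic subtype.

    components: dict mapping histology name -> percent of the tumor
    (hypothetical input format, assumed to sum to roughly 100).
    """
    # Rank component histologies from most to least represented.
    ranked = sorted(components.items(), key=lambda kv: kv[1], reverse=True)
    dominant, dominant_pct = ranked[0]
    # Single-subtype tumors are >= 80% representative of one histology.
    if dominant_pct >= 80:
        return dominant
    # Otherwise the dominant histology is < 80%; this sketch simply names
    # the two largest components and does not handle three-way mixtures.
    second = ranked[1][0]
    return f"mixed {dominant}/{second}"
```

For example, a tumor that is 90% ductal would be labeled ductal, whereas a 60% ductal and 40% lobular tumor would be labeled mixed ductal/lobular.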

In the TCGA, histologic subtype was available for all 808 women and was determined using clinical diagnostic criteria for histology applied by an expert pathologist committee [33]. A consensus ruling on histologic subtype was reached using the pathology reports and pathologist committee classification.

IHC-based 3-marker subtypes

For CBCS 1–2, estrogen receptor (ER) and progesterone receptor (PR) status was abstracted from medical records for approximately 80% of cases. The remaining cases with available tumor tissue had whole slide immunohistochemistry (IHC) staining performed at UNC on tumor tissue samples. The percent positivity for ER and PR was determined by a study pathologist using contemporaneous clinical definitions. Human epidermal growth factor receptor 2 (HER2) IHC staining was performed at UNC; positivity was defined as membrane/membrane plus cytoplasmic staining classified as weak or greater intensity in ≥ 10% of tumor cells [37, 38].

In CBCS3, 98% of cases had ER, PR, and HER2 data abstracted from the medical records, serving as the primary data source to determine IHC subtype for CBCS3. For the remaining 2% of cases without medical record data on ER, PR, and HER2, IHC staining was performed at UNC. For these 2% of cases, positivity cut points of ≥ 10% were used for ER and PR. HER2 positivity was defined as 3+ staining intensity; negative was defined as 0/1+, and equivocal cases with 2+ staining were excluded [39]. As Allott et al. [39] have described, for the 2% of CBCS3 cases without medical record ER, PR, or HER2 data in this analysis, multiple tissue microarray (TMA) cores per case were stained for ER, PR, and HER2, and a core-to-case collapsing method was applied to classify the case as positive/negative for each marker. Percent positivity and staining intensity were determined by automated algorithms (Aperio Technologies).

In TCGA, ER and PR status were provided from Tissue Source Sites using contemporaneous positivity cut points. HER2 data was available for a majority of cases, but where unavailable was supplemented by TCGA using HER2 copy number rather than FISH data [33, 36].

Across CBCS and TCGA, ER, PR, and HER2 status was used to create 3-marker IHC-based subtypes defined as: Luminal A (ER+ or PR+ [any hormone receptor positive; HR+], HER2−), Luminal B (ER+ or PR+ [HR+], HER2+), Triple Negative (TN) (ER−, PR−, HER2−), and HER2+ (ER−, PR−, HER2+).
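The 3-marker definitions above amount to a small decision rule, sketched here for illustration. The function name and boolean inputs are hypothetical; marker positivity is assumed to have been determined already using the cut points described above.

```python
def ihc_subtype(er_positive, pr_positive, her2_positive):
    """Map ER, PR, and HER2 status (booleans) to a 3-marker IHC-based subtype."""
    # Any hormone receptor positive (ER+ or PR+) counts as HR+.
    hr_positive = er_positive or pr_positive
    if hr_positive:
        # HR+/HER2- is Luminal A; HR+/HER2+ is Luminal B.
        return "Luminal B" if her2_positive else "Luminal A"
    # HR-/HER2+ is HER2+; HR-/HER2- is Triple Negative.
    return "HER2+" if her2_positive else "Triple Negative"
```

Under this rule, a tumor that is ER−, PR+, and HER2− is classified as Luminal A because either hormone receptor suffices for HR+ status.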

RNA-based intrinsic subtypes

For CBCS3, RNA counting (Nanostring) assays were carried out on a randomly sampled subset of available FFPE tumor tissue cores (n = 1,122) [39, 40]. RNA was isolated from two 1.0-mm cores from the same FFPE block using the Qiagen RNeasy FFPE kit (cat# 73504). Samples lacking sufficient quality data (n = 101) or cases with > 1 tumor block (n = 8) were excluded. RNA-based intrinsic subtype was determined using the PAM50 gene signature described by Parker et al. [40]. Each tumor was categorized into one of five intrinsic subtypes (Luminal A, Luminal B, HER2, Basal-like, Normal-like) based on the highest Pearson correlation with a subtype-defined centroid.
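The nearest-centroid assignment described above can be sketched as follows. The centroid values and the four-gene vectors are toy numbers invented for illustration; they are not the published PAM50 centroids, which span 50 genes, and any normalization steps used in the study are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def assign_subtype(expression, centroids):
    """Return the subtype whose centroid correlates most highly with the tumor."""
    return max(centroids, key=lambda name: pearson(expression, centroids[name]))


# Toy centroids over four hypothetical genes (illustrative values only).
TOY_CENTROIDS = {
    "Luminal A": [1.0, 0.5, -1.0, -0.5],
    "Basal-like": [-1.0, -0.5, 1.0, 0.5],
}
```

A tumor whose expression vector tracks the toy Luminal A centroid, such as `[0.9, 0.4, -1.1, -0.3]`, would be assigned to Luminal A by `assign_subtype`.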

For TCGA, RNA was extracted from flash frozen tumor samples as previously described [33, 36]. Data were accessed via https://tcgadata.nci.nih.gov/docs/publications/brca_2015/. PAM50 intrinsic subtype was determined using RNA gene expression data from microarrays or RNA sequencing data and categorized into one of the five intrinsic subtypes using an algorithm similar to that applied in CBCS3 [40, 41].

TP53 status

TP53 status was determined by IHC and RNA expression for CBCS3 cases as described elsewhere [42]. Briefly, IHC TP53 status was available for CBCS3 cases via IHC staining performed at the UNC Translational Pathology Laboratory. CBCS3 TMA construction, with one to four 1-mm cores per participant, has been previously described [39]. A control TMA containing TP53 positive and negative breast tissue/cell lines was included in every staining run along with a negative control (no primary antibody).

Details of the TP53 scoring algorithm have been published [42, 43]. Briefly, TP53 staining was measured with the Aperio Nuclearv9 algorithm by quantifying tumor cellularity and was combined with the Genie Histology Pattern Recognition tool to classify the number of tumor and normal epithelial cells per core, allowing for tumor cell enrichment. To determine the average percent positivity, a method of core-to-case collapsing [39] and a percent-positivity weighting method [42, 43] were employed. Weighted percent TP53 positivity was negative/wild-type (< 10%) or positive/mutant (≥ 10%).
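As a rough illustration of collapsing core-level measurements to a case-level call, the sketch below weights each core's percent positivity by its tumor cell count and applies the 10% cut point. The weighting scheme is an assumption made for this example; the published method [42, 43] may weight cores differently, and the function name and input format are hypothetical.

```python
def tp53_ihc_status(cores, cutoff=10.0):
    """Collapse core-level TP53 staining to a single case-level status.

    cores: list of (percent_positive, tumor_cell_count) tuples, one per
    TMA core (hypothetical input format).
    Returns (weighted_percent, status), or None if no tumor cells were counted.
    """
    total_cells = sum(count for _, count in cores)
    if total_cells == 0:
        return None
    # Assumed weighting: each core contributes in proportion to its tumor cells.
    weighted = sum(pct * count for pct, count in cores) / total_cells
    # Cut point from the text: < 10% negative/wild-type, >= 10% positive/mutant.
    status = "positive/mutant" if weighted >= cutoff else "negative/wild-type"
    return weighted, status
```

With two cores of 20% positivity in 100 tumor cells and 0% positivity in 300 tumor cells, the weighted positivity is 5% and the case is called negative/wild-type.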

RNA-based TP53 status was determined for CBCS3 cases by detection of pathway deficiencies. In addition to the PAM50 genes, the Nanostring probe set contained 52 genes for a previously validated TP53 signature [41], which is independent of intrinsic subtype and classifies tumors as TP53 mutant-like or wild-type-like based on a similarity-to-centroid approach (Pearson coefficient) [41]. RNA-based TP53 status was also available in TCGA, where status (mutant-like or wild-type-like) was assigned with the same TP53 gene signature using RNA gene expression data from microarrays, in the same manner described above for CBCS3 [40, 41].

Statistical analysis

Generalized linear models were used to estimate relative frequency differences (RFDs) and 95% confidence intervals (95% CIs) as the measure of association between histologic subtype and age, race, tumor characteristics, IHC subtype, intrinsic subtype, and TP53 status [44]. In CBCS1-3, unweighted sample size counts are presented alongside weighted percentages to account for the CBCS sampling design. The following variables were studied in association with histologic subtype in CBCS and TCGA: age (≤ 50, > 50), race [CBCS: self-reported black, non-black (> 98% white, 2% other; referred to as white); TCGA: black and white (other races were excluded in race-specific analyses due to low sample sizes)], combined tumor grade (CBCS1/3 only) (low-intermediate, high), AJCC stage of disease (I/II, III/IV), lymph node status (positive, negative), tumor size (≤ 2 cm, > 2 cm), ER (negative/positive), PR (negative/positive), HER2 (negative/positive), 3-marker IHC-based subtype, PAM50 intrinsic breast cancer subtype (excluding Normal-like subtype), and TP53 status (IHC: negative/wild-type, positive/mutant; RNA: mutant-like, wild-type-like). Sample percentages and generalized linear model analyses were performed in SAS v9.4 (SAS Institute, Cary, NC). Graphs were constructed using GraphPad Prism v7.02 for Windows (GraphPad Software, La Jolla, CA). p values were produced for a two-sided test with an alpha of 0.05 for statistical significance.
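For a single binary characteristic compared between two histologic groups, an RFD is a difference in proportions. The sketch below computes an unweighted RFD with a Wald 95% CI. It is a simplified stand-in and not the analysis described above: it ignores the CBCS sampling weights, the function name is hypothetical, and the counts in the example are invented.

```python
from math import sqrt


def rfd_with_ci(events_lobular, n_lobular, events_ductal, n_ductal, z=1.96):
    """Unweighted relative frequency difference (lobular minus ductal), in percent.

    Returns (rfd, lower, upper) using a Wald interval for a difference
    in two independent proportions.
    """
    p_lobular = events_lobular / n_lobular
    p_ductal = events_ductal / n_ductal
    rfd = p_lobular - p_ductal
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_lobular * (1 - p_lobular) / n_lobular
              + p_ductal * (1 - p_ductal) / n_ductal)
    return 100 * rfd, 100 * (rfd - z * se), 100 * (rfd + z * se)


# Invented counts for illustration only (not study data).
rfd, lower, upper = rfd_with_ci(50, 100, 25, 100)
print(f"RFD: {rfd:.1f}%, 95% CI [{lower:.1f}, {upper:.1f}]")
```

A positive RFD indicates that the characteristic is more frequent among lobular tumors than among ductal tumors, and a negative RFD indicates the reverse.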

Results

Lobular characteristics in CBCS and TCGA

IHC-based and RNA-based intrinsic subtype distributions by histologic subtype are presented in Fig. 1. Histologic distributions for CBCS3 and TCGA were similar, but TCGA had a slightly higher percentage of lobular tumors (CBCS: 9%; TCGA: 15%) and a lower proportion of mixed tumors (CBCS: 16%; TCGA: 12%).

Fig. 1

a Histologic subtype and IHC subtype distributions in CBCS1-3. b Histologic and PAM50 intrinsic subtype distributions in CBCS3. c Histologic subtype and IHC subtype distributions in TCGA. d Histologic and PAM50 intrinsic subtype distributions in TCGA

Compared to ductal tumors, lobular tumors displayed consistent differences in patient and tumor characteristics in CBCS and TCGA (Table 1). Associations between histology and age and race were weak, with black women and women ≤ 50 years of age less likely to be diagnosed with lobular disease in both studies. Compared to ductal tumors, lobular tumors tended to be larger (CBCS lobular RFD [> 2 cm vs. ≤ 2 cm]: 14.0%, 95% CI [10.7, 17.4]), lower grade (CBCS lobular RFD [high vs. low-intermediate]: − 43.4%, 95% CI [− 45.2, − 41.6]), and higher stage (CBCS lobular RFD [III/IV vs. I/II]: 12.6%, 95% CI [9.7, 15.4]). In CBCS and TCGA, lobular tumors were less frequently ER−, PR−, and HER2+. The magnitude and direction of RFDs for the associations between patient and tumor characteristics and histologic subtypes were similar in CBCS and TCGA (Table 1).

Table 1 Relative frequency differences (RFD) and 95% confidence intervals (95% CI) for the associations between age, race, and tumor characteristics for lobular compared to ductal tumors in the Carolina Breast Cancer Study Phases 1–3 and The Cancer Genome Atlas

Based on a TCGA report of fewer TP53 DNA mutations among lobular tumors relative to ductal tumors [33], we assessed the distribution of TP53 status by histologic subtype in CBCS (Table 2). We evaluated TP53 mutant status by IHC (CBCS3) and by RNA (CBCS3 and TCGA). Compared to ductal tumors, lobular tumors were less likely to have TP53 mutant status (CBCS3: IHC mutant vs. wild-type RFD: − 21.0%, 95% CI [− 24.4, − 17.6]; RNA mutant-like vs. wild-type-like RFD: − 34.5%, 95% CI [− 39.5, − 29.4]), with a slightly larger magnitude association for lobular tumors in TCGA (RNA mutant-like vs. wild-type-like RFD: − 41.8%, 95% CI [− 50.9, − 32.8]) (results not shown).

Table 2 Relative frequency differences (RFD) and 95% confidence intervals (95% CI) for the associations between patient and tumor characteristics for lobular compared to ductal tumors, overall and among Luminal A intrinsic subtype tumors in the Carolina Breast Cancer Study Phase 3

Molecular subtype by RNA and IHC

We observed few differences in molecular subtype distributions within histologic subtypes between CBCS and TCGA. Supplemental Fig. 1 displays the distributions of Luminal A and Triple Negative (TN)/Basal-like tumors by lobular and ductal histologic subtype and by study. Lobular tumors were predominantly Luminal A and proportions were similar by molecular subtyping method and study (CBCS IHC [Luminal A: HR+/HER2−]: 89%, RNA: 84%; TCGA IHC [Luminal A: HR+/HER2−]: 86%, RNA: 92%). Ductal tumors displayed more diversity in molecular subtype than lobular tumors. Proportions of ductal tumors with Luminal A subtype were similar between studies, but varied by technical method, with lower percentages of ductal tumors classified as Luminal A by RNA than by IHC (Luminal A: HR+/HER2−) in both CBCS3 and TCGA (CBCS IHC: 58%, RNA: 39%; TCGA IHC: 55%, RNA: 42%). Ductal tumors from CBCS had higher proportions of TN/Basal-like subtype than TCGA by IHC and RNA (CBCS IHC: 26%, RNA: 27%; TCGA IHC: 20%, RNA: 23%).

We observed more pronounced differences in intrinsic subtype distribution according to technical method. The RFD and 95% CI for Luminal A compared to Non-Luminal A subtypes (IHC: Luminal B [HR+/HER2+], TN, and HER2+; RNA, PAM50: Luminal B, Basal-like, HER2-enriched) among lobular compared to ductal tumors are presented in Supplemental Fig. 2. Lobular tumors were more likely to be classified as Luminal A (IHC subtype: HR+/HER2−), with similar magnitude RFDs in CBCS1-3 and TCGA (CBCS: RFD [LumA vs. Non-LumA]: 30.9%, 95% CI [28.6, 33.2]; TCGA: RFD: 30.8%, 95% CI [21.1, 40.4]). The same association held for Luminal A RNA-based subtype, but the magnitude of effect was stronger (CBCS: RFD [LumA vs. Non-LumA]: 44.9%, 95% CI [39.6, 50.1]; TCGA: RFD: 50.5%, 95% CI [43.9, 57.1]). Thus, the association between histology and Luminal A subtype was attenuated when subtype was defined by 3-marker IHC rather than by RNA.

Histologic associations among Luminal A tumors

Analyses were performed among Luminal A tumors to determine if the observed associations between histologic subtype and tumor characteristics, race, and age persisted after accounting for intrinsic subtype (Table 2). When restricted to PAM50 Luminal A subtype, differences persisted for race and age with black and younger women less likely to be diagnosed with lobular as compared to ductal disease, and these estimates were similar to those observed among tumors of all PAM50 subtypes. Similarly, in RNA analyses restricted to Luminal A subtype, lobular tumors remained larger in size, less likely to be high grade, and were diagnosed at higher stages of disease than ductal tumors. Associations for tumor characteristics and histologic subtype were similar in magnitude and direction by 3-marker IHC-based Luminal A subtype in CBCS1-3 (results not shown). Ductal and lobular tumors displayed similar differences in TP53 mutant status when determined by IHC and RNA.

Discussion

Characteristics of lobular tumors were quantitatively different from those of ductal tumors in CBCS and TCGA. In both studies, lobular tumors were significantly more likely to be Luminal A, both IHC- and RNA-defined, and had lower frequencies of TP53 pathway defects than ductal tumors. As previously reported, we found lobular disease to be more common among older and white women [4, 6, 10, 11, 45], and more likely to be low-intermediate tumor grade, larger tumor size, and diagnosed at later stage of disease [1, 4, 6, 9,10,11, 14, 16, 17, 19, 22,23,24, 33, 37, 46,47,48,49,50]. However, we show that the associations between lobular subtype and patient/tumor characteristics, with the exception of TP53 status, persisted when analyses were restricted to Luminal A tumors, suggesting that histology contributes to these observed associations even within a single, dominant molecular subtype.

We observed associations that were quantitatively similar between two studies (CBCS and TCGA), but found that associations varied somewhat by technical method. In our study, the RFD for Luminal A subtype was 31% (comparing lobular to ductal) by IHC, but was 45% and 51% by RNA (CBCS3 and TCGA, respectively). In a sample of 75 lobular tumors from the I-SPY trial, Lips et al. (2012) determined PAM50 subtypes and reported an RFD of 44% for Luminal A subtype among lobular compared to ductal tumors, similar to our findings [50]. Other studies using IHC have reported RFDs for Luminal A subtype among lobular compared to ductal tumors ranging from 6 to 39%, reflecting tremendous molecular variability in the tumors studied [14, 16, 19, 22, 37, 47,48,49, 51, 52]. Proportions of Luminal A ductal tumors were also different between IHC subtype and RNA subtype data, suggesting RNA subtyping may be important for understanding associations between intrinsic subtype and histology.

In our study, we observed higher proportions of TP53 mutant-like tumors (RNA) than TP53 accumulation (IHC) in each histologic subtype. Compared to ductal tumors, lobular tumors were less frequently TP53 mutant by IHC, as previously reported [4, 10, 11], and less frequently TP53 mutant-like by RNA in CBCS3 and TCGA. Nearly 50% of CBCS3 ductal tumors were TP53 mutant-like by RNA, whereas only 13% of lobular tumors showed the same molecular profile. This difference was also pronounced in TCGA, where the RFD for TP53 mutant-like status among lobular (versus ductal) tumors exceeded 40% in absolute magnitude (results not shown). However, when we restricted our analyses to Luminal A subtype tumors, we observed very low frequencies of either ductal or lobular tumors that were TP53 mutant by IHC or mutant-like by RNA, suggesting that TP53 status may be largely associated with molecular, rather than histologic, subtype.

Our study should be interpreted in light of some limitations. We used centralized pathology review to classify tumors as a pure histologic subtype if ≥ 80% of a dominant histologic subtype was present. Classification rules for histologic subtypes vary and a uniform definition has not been established. In TCGA, an expert panel sought consensus for lobular versus ductal carcinoma and all discrepancies were adjudicated. These differences in classification could explain the differences in histologic subtype frequencies between the two studies [33]. However, the overall direction of each association was very similar between the two studies for all patient and tumor characteristics examined. Due to low sample sizes of lobular tumors with non-Luminal A subtypes, we were unable to examine lobular associations with tumor and patient characteristics among molecular subtypes other than Luminal A. Likewise, we were unable to study molecular characteristics of the rare histologic subtypes diagnosed in less than 2% of cases in CBCS. Rare histologic and molecular subtype combinations will likely require large, consortia-based investigations.

To conclude, patterns of association between lobular tumors and patient and tumor characteristics were similar when restricted to Luminal A intrinsic subtype, suggesting histology reflects some robust biological differences. Future research may leverage the molecular differences between lobular and ductal tumors to improve classification and prognostication.