Introduction

In recent decades, mammography has become a widely implemented screening tool for breast cancer (BC) among women aged 40 and older. Meta-analyses of randomized clinical trials indicate that routine mammographic screening reduces BC mortality by 30 % among women 50–69 years of age [1–4], although the magnitude of this mortality benefit remains controversial [5]. Under current practice, the effectiveness of population-based screening mammography is compromised by at least two important factors: the absence of rationally targeted screening, and mammography’s limited sensitivity and specificity [6–8]. Deployment of this imperfect screening tool based almost exclusively on age and sex results in millions of healthy women being frequently imaged, many women without cancer needlessly sent for breast biopsies, and other women with cancer falsely declared cancer free. Novel clinical methods are needed to enhance the selection of women who initially undergo screening, and to improve the accuracy of imaging among those who do [9–11]. Blood-based biomarkers with high sensitivity and specificity could be used in conjunction with radiographic approaches to increase early detection of BC while reducing false positives and unnecessary clinical follow-up [12].

Many past studies have sought to identify circulating biomarkers for BC, and multiple components of blood have been mined for diagnostic potential: autoantibodies, proteins, miRNAs, cell-free DNA, and circulating tumor cells [13–20]. Despite advances in molecular classification of BC sub-types and in our understanding of disease pathogenesis, no clinically useful blood-based biomarkers for early diagnosis have yet been identified. Recent studies based on mass spectrometry analysis of pre-clinical blood specimens have suggested that plasma proteome alterations may be detectable prior to BC diagnosis [19].

Breast cancer represents a heterogeneous disease, with multiple sub-types defined by distinct histological and molecular characteristics, epidemiologic risk profiles, and clinical features [21–24]. Infiltrating ductal carcinoma (70–80 % of invasive lesions) and infiltrating lobular carcinoma (5–10 %) are the two most common histological types of invasive BC. In contrast to ductal BC, the rarer lobular form is associated with older age, larger and better-differentiated tumors with a dispersed growth pattern, ER positivity, absence of E-cadherin, and increased early (≤10 years) survival [25]. When sub-types are defined by joint ER/PR status and histology, the most common sub-type in the U.S. is ER+/PR+ invasive ductal carcinoma (IDC). While 5-year survival rates for ER+/PR+ IDC are higher than those observed for the less common but more lethal triple-negative (ER−/PR−/HER2−) BC, ER+/PR+ cancer remains a significant cause of morbidity and mortality among women overall, particularly when diagnosed at late stages [26]. Given the molecular heterogeneity of BC, it seems plausible that plasma proteomic profiles associated with disease onset, if they exist, may exhibit sub-type-specific patterns [12]. We employed a custom-designed antibody-array platform [27] to interrogate pre-diagnostic plasma from 121 BC cases (all ER+/PR+ IDC) and 121 matched controls enrolled in the Women’s Health Initiative (WHI) observational study, with the aim of identifying protein biomarkers that are differentially abundant in the plasma of cases relative to controls.

Materials and methods

Study design

We conducted a nested case–control study of ER+/PR+ IDC within the WHI observational study, a prospective cohort of 93,676 post-menopausal women enrolled between 1993 and 1998 in the U.S. [28, 29]. Our study included 121 ER+/PR+ ductal BC cases and 121 controls without a prior history of any type of cancer, individually matched 1:1 to cases on age at enrollment (±3 years), race/ethnicity (white, non-white), body mass index (BMI, ≤25, 25–29, or 30+), hormone replacement therapy use (estrogen only, estrogen plus progestin, or non-user), and year of blood draw. Cases with an available study blood specimen drawn within 12.5 months prior to their BC diagnosis were eligible for inclusion. ER, PR, and HER2 status were extracted from medical records and centrally adjudicated by WHI staff. The 121 matched pairs were randomly assigned to a training set (n = 60 pairs) and a testing set (n = 61 pairs). The use of human samples was approved by the Institutional Review Board of the Fred Hutchinson Cancer Research Center.

Laboratory methods

Plasma samples were interrogated using a customized antibody array populated with 3290 full-length antibodies to 2036 distinct targets, encompassing several classes of proteins involved in diverse signaling pathways. Of the 3290 antibodies, 292 served as standards or technical controls: 288 standards targeting CA19-9 (n = 96), CEACAM1 (n = 48), CSF3 (n = 48), CXCL12 (n = 48), or IL1β (n = 48), and four technical controls targeting Cy3 (n = 2) or Cy3/Cy5 (n = 2). These standards and controls comprised replicates of two independent antibodies for CA19-9 and replicates of a single antibody for each of the other targets. Protocols for array fabrication, sample treatment and labeling, incubation of plasma with arrays, and array scanning have been previously described [30–32]. Briefly, for this study, 200 μg of plasma protein from each case and each control was labeled with Cy5 and incubated on its own array for 90 min together with 200 μg of Cy3-labeled reference plasma protein (a common pool of plasma from seven women aged 45–72 years, used as the reference for all samples) in a total volume of 80 μl (kept from drying using LifterSlips, Fisher Scientific, Pittsburgh, PA). Samples from a given matched pair (case and control) were incubated on the same day. After washing, slides were scanned in a GenePix 4000B microarray scanner and data extracted using GenePix Pro 6.0 software (Molecular Devices, Sunnyvale, CA, USA).

Statistical analyses

Raw GenePix data files were initially processed using the R package limma [33, 34]. For each antibody spot, the log2 ratio of sample signal (red channel) to reference signal (green channel), the M value, was calculated as $M = \log_2(R_c/G_c)$, where $R_c$ and $G_c$ are the background-corrected red and green intensities (normexp background correction method). After exclusion of technical control spots (n = 4 antibodies) and all spots with GenePix flags ≤ −50, antibody features on each array were filtered based on the coefficient of variation (CV) of M values across triplicate spots. Technical sources of variation were removed using normalization procedures developed for two-color microarrays: within-array print-tip loess normalization and between-array quantile normalization of the reference channel. Following normalization, triplicate spots were summarized by their median. Antibody features with missing M values in > 30 % of the 121 matched pairs (after CV filtering) were excluded from the analysis (n = 819 of 3286), leaving 2467 features. All statistical analyses were performed using M values. Intra-array reproducibility was evaluated by calculating the CV across triplicate spots of each antibody feature (Figure S1). Inter-array reproducibility was examined by calculating the Pearson pair-wise correlation coefficient (ρ) between arrays probed with an identical plasma sample (Figure S2).
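The flag filtering, CV filtering, median summarization, and missingness filter described above can be sketched in Python as follows. This is a minimal illustration, not the limma pipeline used in the study: it assumes background-corrected intensities are already available as arrays of shape (arrays, features, 3 triplicate spots), it omits the print-tip loess and quantile normalization steps, and the function names and the CV cut-off `cv_max` are hypothetical, since the threshold used is not stated.

```python
import warnings
import numpy as np

def summarize_triplicates(red, green, flags=None, cv_max=0.5):
    """Median M value per antibody feature and array, after flag and CV filtering.

    red, green: background-corrected intensities, shape (n_arrays, n_features, 3).
    flags: optional GenePix flags of the same shape; spots flagged <= -50 are dropped.
    cv_max: hypothetical CV cut-off (illustrative value only).
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        m = np.log2(red / green)                  # M = log2(Rc / Gc) for every spot
    m[~np.isfinite(m)] = np.nan                   # non-positive intensities -> missing
    if flags is not None:
        m[flags <= -50] = np.nan                  # drop spots flagged by GenePix
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN or single-spot features
        mean = np.nanmean(m, axis=2)
        sd = np.nanstd(m, axis=2, ddof=1)
        cv = sd / np.abs(mean)                    # CV of M across the triplicate spots
        summary = np.nanmedian(m, axis=2)         # median of the triplicate spots
    summary[~(cv <= cv_max)] = np.nan             # high or undefined CV -> missing
    return summary

def keep_features(m_case, m_ctrl, max_missing=0.30):
    """Mask of features missing in at most 30 % of matched pairs.

    m_case, m_ctrl: (n_pairs, n_features) summarized M values. A pair is counted
    as missing if either member is missing (an assumption of this sketch).
    """
    pair_missing = np.isnan(m_case) | np.isnan(m_ctrl)
    return pair_missing.mean(axis=0) <= max_missing
```

In this sketch a feature whose triplicate M values disagree strongly on a given array is set to missing for that array only; the pair-level missingness filter then decides whether the feature is retained for the study as a whole.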

Paired t tests were conducted for each antibody feature, comparing the M values of the 121 BC cases with those of their 121 matched controls. Antibodies for which the mean within-pair difference in M values differed significantly from zero (P < 0.05) in both the training and testing sets were selected as candidate markers. Using the pooled dataset (training and testing sets combined), Bonferroni correction for multiple comparisons was applied across the 2467 antibody features retained in the main analysis, which exclude the four technical control antibodies (α = 0.05/2467 = 2.03 × 10⁻⁵), and Benjamini–Hochberg false discovery rate q values were generated. Performance metrics (AUC, sensitivity, and specificity) were calculated using covariate-adjusted marker values.
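The screening rule above (nominal significance in both sets, then Bonferroni and Benjamini–Hochberg adjustment on the pooled pairs) can be sketched as below. This is an illustrative Python version, not the authors' Stata/R code; the function names are hypothetical, rows are assumed to be matched pairs in the same order in the case and control matrices, and missing values are handled by simply dropping incomplete pairs per feature.

```python
import numpy as np
from scipy import stats

def paired_t(m_case, m_ctrl):
    """Paired t test per antibody feature; rows are matched pairs, columns features."""
    d = m_case - m_ctrl                          # within-pair differences in M
    n = np.sum(~np.isnan(d), axis=0)             # complete pairs per feature
    t = np.nanmean(d, axis=0) / (np.nanstd(d, axis=0, ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(np.abs(t), df=n - 1)      # two-sided P value
    return t, p

def bh_qvalues(p):
    """Benjamini-Hochberg q values (step-up adjusted P values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def screen(train_case, train_ctrl, test_case, test_ctrl, alpha=0.05):
    """Candidates: P < alpha in both sets; Bonferroni flags and q values from pooled pairs."""
    _, p_train = paired_t(train_case, train_ctrl)
    _, p_test = paired_t(test_case, test_ctrl)
    _, p_pool = paired_t(np.vstack([train_case, test_case]),
                         np.vstack([train_ctrl, test_ctrl]))
    candidate = (p_train < alpha) & (p_test < alpha)
    bonferroni = p_pool < alpha / p_pool.size    # e.g. 0.05 / 2467 ≈ 2.03e-5
    return candidate, bonferroni, bh_qvalues(p_pool)
```

With complete data, `paired_t` reproduces `scipy.stats.ttest_rel`; writing it out explicitly makes the per-feature handling of incomplete pairs visible.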

Multivariate classification models were constructed using regularized logistic regression with elastic net penalty (R package: glmnet) [35]. After exclusion of n = 7 subjects due to missing covariate values, 235 subjects were available for analysis. Missing M values for the 2467 antibody features included in the main analysis were imputed via the k-nearest neighbor method (R package: knn). Equal numbers of cases and controls were randomly allocated to a training set (75 %: n = 175), used for variable selection and model selection, or a testing set (25 %: n = 60), used for (preliminary) model validation. With the mixing parameter (α) set to 0.5, fivefold cross-validation was conducted within the training set only to select the optimal value of the penalty parameter (λ). A model was generated using the complete training set data and used to predict class values for subjects in the testing set. Monte Carlo cross-validation (MCCV) was conducted using 100 different training (and associated testing) sets randomly selected from the total study sample. Mean estimated area under the receiver-operating characteristic (ROC) curve (AUC) was calculated across the 100 testing sets. A composite average ROC curve was generated (R package: ROCR) [36].
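The Monte Carlo cross-validation loop above can be sketched as follows. The study used glmnet in R; this Python sketch instead fits the same kind of penalized objective (mean logistic loss plus $\lambda[\alpha\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2]$ with α = 0.5) by proximal gradient descent, a swapped-in optimizer rather than glmnet's coordinate descent. It assumes the marker matrix is already imputed and standardized; the λ grid, iteration count, use of AUC (rather than deviance) to choose λ in the inner fivefold cross-validation, and all function names are hypothetical.

```python
import numpy as np

def auc(y, score):
    """ROC AUC as the probability that a random case scores above a random control."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def fit_enet_logistic(X, y, lam, alpha=0.5, n_iter=500):
    """Elastic-net logistic regression by proximal gradient descent (unpenalized intercept)."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    Xa = np.column_stack([np.ones(n), X])
    # Step size = 1 / Lipschitz constant of the smooth part of the objective
    step = 1.0 / (0.25 * np.linalg.norm(Xa, 2) ** 2 / n + lam * (1 - alpha))
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-np.clip(b0 + X @ beta, -30, 30)))
        resid = prob - y
        b0 -= step * resid.mean()
        z = beta - step * (X.T @ resid / n + lam * (1 - alpha) * beta)
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * alpha, 0.0)  # L1 prox
    return b0, beta

def stratified_folds(y, k, rng):
    """Assign fold labels 0..k-1 separately within cases and within controls."""
    folds = np.empty(len(y), dtype=int)
    for cls in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def choose_lambda(X, y, lambdas, rng, k=5):
    """Pick the penalty with the highest mean validation AUC over k folds."""
    folds = stratified_folds(y, k, rng)
    cv_auc = []
    for lam in lambdas:
        fold_auc = []
        for f in range(k):
            tr, va = folds != f, folds == f
            b0, beta = fit_enet_logistic(X[tr], y[tr], lam)
            fold_auc.append(auc(y[va], b0 + X[va] @ beta))
        cv_auc.append(np.mean(fold_auc))
    return lambdas[int(np.argmax(cv_auc))]

def mccv(X, y, n_rounds=100, test_frac=0.25, lambdas=(0.01, 0.03, 0.1, 0.3), seed=0):
    """Mean and SD of test-set AUC over repeated random case/control-stratified splits."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_rounds):
        test = stratified_folds(y, int(round(1 / test_frac)), rng) == 0  # ~25 % held out
        lam = choose_lambda(X[~test], y[~test], lambdas, rng)  # tuned on training data only
        b0, beta = fit_enet_logistic(X[~test], y[~test], lam)
        aucs.append(auc(y[test], b0 + X[test] @ beta))
    return float(np.mean(aucs)), float(np.std(aucs))
```

Because λ is tuned inside each training split, the held-out AUCs are not used for model selection; they can still be optimistic if the markers themselves were chosen using the same samples.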

Gene set enrichment analyses (GSEA) were based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) gene sets, available from the Molecular Signatures Database (MSigDB). After exclusion of antibodies specific for carbohydrate or non-human protein targets, 2396 of the 2467 antibodies included in the main analysis were available for GSEA (corresponding to proteins encoded by 1693 unique genes). Of the 186 KEGG gene sets obtained from MSigDB, 128 contained at least five genes coding for proteins targeted by our arrays (Table S1A). Of the 1454 available GO gene sets, 822 contained at least five such genes (Table S1B). For each gene set, a Wilcoxon rank-sum test was conducted on the test statistics obtained from the paired t tests (comparing M values of matched cases and controls), contrasting antibodies within versus outside the set. A null distribution for the Wilcoxon test was generated from analysis of 1000 permuted datasets in which the case/control status of subjects in each matched pair was randomly maintained or reversed. The (one-sided) P value for each pathway was calculated as the fraction of all permuted datasets in which the sum of the ranks of the paired t test statistics for markers in the gene set was more extreme than the observed sum of ranks for these markers’ test statistics in the real dataset. All statistical analyses were conducted using Stata v13.1 (College Station, TX) or R v3.0.3.
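The pair-preserving permutation scheme can be sketched as follows: reversing case/control status within a pair negates that pair's difference for every antibody at once, so each permutation is a random sign flip of whole rows of the within-pair difference matrix. This Python sketch assumes complete (e.g., imputed) differences and treats "more extreme" as the upper tail, i.e. gene sets upregulated in cases; for downregulated sets the lower tail would be counted instead. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def paired_t_stats(d):
    """Paired t statistic per feature from within-pair differences d (n_pairs, n_features)."""
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(d.shape[0]))

def gene_set_rank_sum(t, in_set):
    """Wilcoxon rank-sum statistic: sum of ranks (1 = smallest t) of in-set features."""
    return rankdata(t)[in_set].sum()

def sign_flip_pvalue(d, in_set, n_perm=1000, seed=0):
    """One-sided (upper-tail) permutation P value for a gene set."""
    rng = np.random.default_rng(seed)
    observed = gene_set_rank_sum(paired_t_stats(d), in_set)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Randomly keep or reverse case/control status in each matched pair
        flips = rng.choice([-1.0, 1.0], size=(d.shape[0], 1))
        null[b] = gene_set_rank_sum(paired_t_stats(d * flips), in_set)
    return float(np.mean(null >= observed))
```

Permuting labels within pairs, rather than across all subjects, preserves the matching and the correlation structure among antibodies, which a feature-level permutation would ignore.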

Using data from The Cancer Genome Atlas (TCGA), we identified post-menopausal female subjects with Stage I/II ER+/PR+ infiltrating ductal breast carcinoma. Those with available tumor RNA expression data were selected (n = 130). Paired normal tissue had been profiled for n = 10 of these women. RNA expression data were also included from an additional 51 normal tissue specimens isolated from BC cases not satisfying our inclusion criteria. Technical details pertaining to RNA profiling have been published previously [37, 38]. Microarray data were loess normalized, and the ratio of the Cy5 channel (sample) to the Cy3 channel (reference) was log2-transformed to generate gene expression values, $E = \log_2(\mathrm{Cy5}/\mathrm{Cy3})$. Welch’s t tests were conducted to compare mean RNA expression levels in the n = 130 tumor samples and n = 61 normal samples. Marker fold changes were calculated as the ratio of geometric mean signals in tumor versus normal samples, $2^{\mu_{E,\mathrm{tumor}}}/2^{\mu_{E,\mathrm{normal}}}$, where $\mu_E$ denotes the mean of $E$ within each group.
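For a single gene, the tumor-versus-normal comparison reduces to a Welch t test on the log2 expression values plus the fold-change formula above. A minimal Python sketch (the function name is hypothetical; inputs are assumed to be the normalized $E$ values for one gene):

```python
import numpy as np
from scipy import stats

def tumor_vs_normal(e_tumor, e_normal):
    """Welch's t test and geometric-mean fold change for one gene's E = log2(Cy5/Cy3) values."""
    t, p = stats.ttest_ind(e_tumor, e_normal, equal_var=False)   # unequal variances
    fold_change = 2.0 ** (np.mean(e_tumor) - np.mean(e_normal))   # 2^mu_tumor / 2^mu_normal
    return t, p, fold_change
```

Since $2^{\mu_E}$ is the geometric mean of the untransformed ratios, the fold change is simply 2 raised to the difference in mean log2 expression.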

Results

Cases and controls in our training and testing sets were well balanced with respect to the matching covariates: age, race/ethnicity, BMI, and HRT use (Table 1). None of the observed differences in the distributions of these variables between cases and controls, in either the training or testing set, reached statistical significance when assessed by χ² tests (data not shown). The antibody-array profiling platform exhibited high levels of both intra-array and inter-array reproducibility (Figures S1/S2), consistent with past performance [30, 39]. When comparing cases to their matched controls, statistically significant (P < 0.05) mean differences in signal intensity were observed for 328 of 2467 antibodies (13.3 %) in the training set. Of these 328 candidates, 39 were validated (P < 0.05) in the testing set (Table S2). In the pooled analysis (training and testing sets combined), geometric mean within-pair fold changes of $R_c/G_c$ for the 39 identified candidates, comparing cases to matched controls, ranged in magnitude from 0.76 to 1.11 (Table 2). Four markers remained significant after Bonferroni correction for multiple comparisons (P < 0.05/2467 = 2.03 × 10⁻⁵): colony stimulating factor 2 (CSF2), transferrin receptor (TFRC), RING1 and YY1 binding protein (RYBP), and integrin β4 (ITGB4) (Figure S3). Western immunoblots showed that the printed array antibodies for these top candidates recognized antigens of the expected molecular weight, and that intensity changes across different plasma samples were consistent with array-based M values (Figure S4).
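Because $M = \log_2(R_c/G_c)$, each within-pair difference $M_\text{case} - M_\text{control}$ is the log2 of that pair's ratio of ratios, so the geometric mean fold change is 2 raised to the mean difference. A short illustrative Python sketch (the function name is hypothetical):

```python
import numpy as np

def geometric_mean_fold_change(m_case, m_ctrl):
    """Geometric mean over matched pairs of (Rc/Gc)_case / (Rc/Gc)_control, per feature.

    m_case, m_ctrl: (n_pairs, n_features) M values; missing pairs are ignored.
    """
    return 2.0 ** np.nanmean(m_case - m_ctrl, axis=0)
```

On this scale, the reported range of 0.76 to 1.11 corresponds to mean within-pair M differences of roughly −0.40 to 0.15 (log2 0.76 ≈ −0.40, log2 1.11 ≈ 0.15); values below 1 indicate lower levels in cases.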

Table 1 Subject characteristics of 121 matched case–control pairs randomly assigned to either a training set or testing set
Table 2 Top-ranked antibodies by paired t test, comparing cases versus matched controls (P < 0.05 in both training and testing sets)

Nineteen additional candidates were captured when the false discovery rate was set to q < 0.05. TFRC and BRCA2 were each identified as nominal hits by two different antibodies. AUCs based on covariate-adjusted marker values ranged from 0.58 (CTGF) to 0.71 (CSF2), with maximum sensitivity of 33 % at 95 % specificity (RYBP) and maximum specificity of 30 % at 95 % sensitivity (TFRC). Ten of the 39 candidates had a sensitivity of ≥20 % at 95 % specificity, and three had a specificity of ≥20 % at 95 % sensitivity. Among the top-performing markers, some performed better at high fixed sensitivity than at high fixed specificity, while others showed the opposite pattern, as illustrated by the asymmetrically shaped ROC curves for TFRC and RYBP, respectively (Fig. 1). Exploratory analyses suggested that several of the identified markers may exhibit stronger associations (lower P values and similar or larger fold changes) closer to the time of BC diagnosis: TFRC, ITGB4, SNX5, CD2, NTRK1, CSF3R, BRCA1, and RASGRF2, among others (Figure S5). Conclusive interpretation was limited by the reduced statistical power of these subset analyses.
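Sensitivity at a fixed specificity (and the converse) can be read off the empirical ROC curve by scanning thresholds. The Python sketch below is one common convention, assumed rather than taken from the study: a subject is called positive when its score is at or above the threshold, and higher scores are assumed to indicate cases, so a marker that is lower in cases (such as TFRC) would be negated first. The covariate-adjustment step is not shown, and the function names are hypothetical.

```python
import numpy as np

def sens_at_spec(y, score, target_spec=0.95):
    """Highest sensitivity among thresholds whose specificity is >= target_spec."""
    best = 0.0
    for t in np.unique(score):
        positive = score >= t
        if np.mean(~positive[y == 0]) >= target_spec:
            best = max(best, float(np.mean(positive[y == 1])))
    return best

def spec_at_sens(y, score, target_sens=0.95):
    """Highest specificity among thresholds whose sensitivity is >= target_sens."""
    best = 0.0
    for t in np.unique(score):
        positive = score >= t
        if np.mean(positive[y == 1]) >= target_sens:
            best = max(best, float(np.mean(~positive[y == 0])))
    return best
```

A marker with an asymmetric ROC curve can score well on one of these summaries and poorly on the other, which is why both are reported.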

Fig. 1

Receiver-operating characteristic (ROC) curves for TFRC (a) and RYBP (b), based on covariate-adjusted marker values in the pooled sample set

We next determined whether data from TCGA might provide further support for any of the 39 candidate antibodies (37 distinct proteins) identified. Since protein expression data were not available, we evaluated RNA profiles for the corresponding genes in 130 Stage I/II ER+/PR+ ductal breast tumors and 61 normal tissue samples (Table S3). Of the 37 transcripts assessed, 25 exhibited statistically significant differences in mean abundance between tumor and normal tissue samples, including three of our top five ranked candidates (TFRC, RYBP, ITGB4). Of these 25, 14 showed changes directionally concordant with the plasma protein alterations identified in our BC cases versus controls.

To assess the utility of combining multiple candidates into a composite marker panel, we conducted multivariate modeling using regularized logistic regression with elastic net penalty [40]. The mean AUC of models built using the 39 candidates listed in Table 2, assessed across 100 rounds of MCCV, was 0.75 (SD = 0.06), with estimated sensitivity of ~30 % at 95 % specificity (Fig. 2a). Parsimonious models built with the four markers reaching Bonferroni significance (CSF2, TFRC, RYBP, ITGB4) achieved comparable classification accuracy (Fig. 2b).

Fig. 2

Receiver-operating characteristic (ROC) curves for multivariate classifiers based on (a) all 39 candidate protein markers identified (Table 2) or (b) the top four candidate protein markers reaching Bonferroni significance (CSF2, TFRC, RYBP, ITGB4). Study participants were randomly divided into a training set (75 %: n = 175) and a testing set (25 %: n = 60). A multivariate classifier was derived in the training set using the indicated markers and evaluated in the testing set. After 100 rounds of Monte Carlo cross-validation (MCCV), a composite average ROC curve was generated (solid black) from the individual ROC curves (dotted lines) derived in each testing set. Mean area under the ROC curve (AUC) was calculated across all MCCV iterations. Estimated sensitivity at 95 % specificity is indicated by black circles.

Gene set enrichment analyses were conducted to assess whether plasma levels of proteins encoded by genes in defined KEGG/GO gene sets were significantly and concordantly altered in cases relative to matched controls. Twelve of the GO gene sets (and none of the KEGG sets) reached significance in both the training and testing sets (Table 3 and Figure S6). Two gene sets, microtubule cytoskeleton and response to hormone stimulus, remained significant in the pooled analysis when allowing for a false discovery rate of q < 0.25. Both appeared to be upregulated in cases versus controls.

Table 3 Top gene sets identified as differentially abundant in cases versus matched controls, using training, testing, or pooled sample sets

Discussion

This biomarker discovery study coupled the use of pre-clinical plasma specimens with a powerful antibody-array platform to identify candidate blood-based biomarkers for ER+/PR+ invasive ductal BC. Our arrays included >3000 antibodies directed against >2000 proteins, which encompass many cytokines, adipokines, growth factors, and other secreted or membrane proteins implicated in a range of biological pathways dysregulated in BC. The putative markers identified (37 distinct proteins) include proteins previously implicated in BC pathogenesis as well as novel candidates. Just over half of these candidates (20 of 37) are located either in the plasma membrane (n = 12) or the extracellular space (n = 8), with the remainder divided between the nucleus (n = 10) and the cytoplasm (n = 7) (Ingenuity Pathway Analysis). The two largest functional groups are membrane receptors (n = 9) and transcription regulators (n = 7), with kinases, peptidases, transporters, growth factors, and cytokines accounting for most of the remaining share.

Among the four candidates meeting Bonferroni significance, two (TFRC and ITGB4) have been the focus of multiple BC-related studies [41–46]. TFRC (transferrin receptor), which functions in cellular uptake of iron and is expressed on highly proliferating cells, was identified by two independent antibodies on our array. Recent studies have linked elevated TFRC protein expression in ER+ breast tumors to higher clinical grade, increased proliferative activity, and worse prognosis [41, 42]. Membrane-bound TFRC is proteolytically cleaved to generate a soluble form of the receptor, which circulates in the plasma [47]. Interestingly, we observed lower levels of TFRC protein in the pre-clinical plasma of BC cases relative to controls. It remains uncertain, however, whether reduced levels of circulating TFRC reflect decreased abundance of cellular TFRC, particularly in breast tissue. In contrast to TFRC, ITGB4 (β4 integrin) was elevated in the plasma of cases relative to controls. ITGB4 belongs to the integrin family of transmembrane receptors, which function in cell adhesion, migration, and proliferation [48]. β4 integrin has been linked to increased breast tumor size, nuclear grade, and BC progression, and is believed to function in part through activation of the MAPK and PI3K/Akt signaling pathways [44–46, 49]. The 37 candidates for ER+/PR+ ductal BC show minimal, if any, overlap with those identified in our studies of triple-negative (TN) BC. While our published report on TN BC [27] used an earlier-generation array with only limited antibody overlap with the current arrays, more recent side-by-side discovery studies of ER+/PR+ ductal and TN BC on the same platform point to distinct patterns of proteomic alterations in the pre-clinical plasma of women who develop these types of cancer (C. Li, unpublished observations). Such differences are compatible with our growing recognition of BC sub-types as disparate diseases on the epidemiologic, clinical, and molecular levels, but larger studies are required for confirmation.

The mechanisms by which pre-clinical BC may lead to changes in the circulating plasma remain largely unknown. Detected alterations, if not due to chance, could represent (i) biological signals originating in the evolving target tissue (breast) or (ii) systemic changes or responses associated with a nascent cancer. In the former scenario, dysregulated neoplastic physiology might yield a modified repertoire of secreted proteins [50], while elevated levels of necrosis/apoptosis could produce increased spillage of cellular components. Alternatively, enhanced systemic inflammation or altered immune profiles [51] could underlie observed changes in the plasma/serum. Future studies using inducible in vivo models of carcinogenesis and paired analysis of blood and tissue may provide further mechanistic insights [52].

Our TCGA analyses indicated that 14 of our top 37 plasma protein candidates exhibited significant, directionally concordant changes in RNA expression when comparing breast tumor versus normal tissue. It remains unknown why only a subset of the markers showed parallel changes in tissue RNA expression, and why some showed altered expression in the opposite direction. However, we note that RNA and protein expression levels of a given gene within a particular tissue may show only weak to moderate correlation [53], likely reflecting complex post-transcriptional and post-translational regulatory mechanisms. When considering plasma protein and tissue RNA levels, the rate of protein secretion or spillage into the bloodstream represents yet another potential layer of regulation.

If the identified marker candidates are externally validated, an important objective will be to explore their potential clinical utility. Under current practice, age and sex remain the sole factors used on a widespread basis in determining whether or not a screening mammogram is recommended. More refined methods of BC risk assessment that integrate epidemiologic factors with clinical parameters (e.g., plasma protein measurements) could tailor screening recommendations. Three scenarios for using blood-based biomarkers to guide mammography usage relate to identifying subgroups of higher-risk women who are suitable for additional imaging: women aged 50–69 who should be recalled early after a negative mammogram; healthy women aged 75–79 who may benefit from continued mammography; and women aged 50–69 who should be prioritized for imaging in low-resource settings (e.g., rural areas). In these scenarios, marker panels achieving even modest sensitivity (e.g., ~30 %) at high fixed levels of specificity (95 %), such as those derived in this study, would have the potential to capture a significant number of additional women who are most likely to benefit from mammography, without flooding the system with those least likely to need imaging. One important caveat with respect to our modeling results is that performance estimates were derived using the same set of samples used to identify the included biomarker predictor variables. Given the resulting likelihood of some level of optimism bias, studies in external study populations are essential.
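The trade-off behind "modest sensitivity at high specificity" can be made concrete with simple expected-count arithmetic. The Python sketch below computes, for a hypothetical cohort, how many women with BC a panel would flag, how many women without BC would also be flagged, and the resulting positive predictive value. The cohort size and the prevalence (0.5 % of women diagnosed within the relevant interval) are illustrative placeholders, not estimates from this study or from WHI, and the function name is hypothetical.

```python
def screening_yield(n_women, prevalence, sensitivity=0.30, specificity=0.95):
    """Expected true positives, false positives, and PPV for a marker panel.

    All inputs are hypothetical; prevalence is the fraction of screened women
    who have (or will be diagnosed with) BC in the interval of interest.
    """
    cases = n_women * prevalence
    true_pos = sensitivity * cases                          # women with BC who are flagged
    false_pos = (1.0 - specificity) * (n_women - cases)     # women without BC who are flagged
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv
```

For 10,000 women at the placeholder prevalence of 0.5 %, the panel would be expected to flag 15 of 50 cases together with about 497.5 women without BC, giving a PPV of about 2.9 %, roughly 5.9 times the baseline prevalence. At low prevalence this enrichment approaches sensitivity/(1 − specificity) = 0.30/0.05 = 6.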

Our antibody-array platform has certain limitations. First, it interrogates only a subset of the complete plasma proteome, dictated by the set of included antibodies. Our pathway analyses were somewhat limited by the absence of data on many proteins belonging to KEGG/GO gene sets. Second, relative to gold standard assays (sandwich ELISA), the antibody array does not incorporate enzymatic amplification or require multiple antibodies to bind a given target at distinct epitopes. Thus, technical sensitivity and specificity are unlikely to match the parameters of targeted assays. Nevertheless, the array platform has shown excellent concordance with ELISA for specific protein targets [27, 30, 31]. Third, some antibodies failed for a sizable number of study subjects, due to low-intensity or highly variable signals. The intended use of our array platform is for discovery purposes, however, and any potential downstream clinical assays would likely evaluate a small number of markers using ELISA.

Additional studies are needed to replicate our array-based proteomic profiling results on independent sample sets. Confirmed candidates would then be advanced as potential biomarkers for further validation using gold standard assays capable of absolute quantification, when feasible. The potential clinical utility of these results will require further investigation in larger external cohorts. Integration of blood-based biomarker data and clinical/epidemiologic characteristics may ultimately facilitate development of multivariate classification models for accurate BC risk stratification and early diagnosis.

Conflicts of interest

The authors disclose no potential conflicts of interest.