25.1 Introduction

Breast cancer is the most commonly diagnosed cancer in women and the second most common cause of cancer-related deaths in women. Recent studies have highlighted the clinical, histological and molecular heterogeneity in breast cancer. These differences can occur between patients with the same tumor type, between geographical regions of a tumor within the same patient, or over time within the same tumor of the same patient (Zardavas et al. 2015). Thus, understanding the comprehensive molecular landscape of the tumor is critical.

Comprehensive evaluation of hundreds to thousands of proteins simultaneously, termed proteomics, is now possible using cutting-edge, high-throughput technology. The original definition and goal of proteomics was to study “all proteins expressed by a genome, cell or tissue” (Wilkins et al. 1996). In practice, this is not always possible, and the words large-scale and global are often used as substitutes for comprehensive (Zhao et al. 2015; Guo et al. 2015).

Historically, the evaluation of proteins originated with the combination of two-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS). Today, the field of proteomics is still driven by the development of new technologies and improvement of existing technologies for sample processing, protein identification, and quantification to improve accuracy, scale, or throughput capabilities. In the current review, we focus on the major aspects of proteomics as applicable to breast cancer. The goal is to provide researchers with the basic tools to understand proteomic techniques and enhance the understanding of the molecular processes that are altered in breast cancer. This will enable the translation of these findings into the clinical field.

25.2 Types of Platforms for Multiplex Protein Profiling

The expression levels of multiple proteins can be measured by using different approaches (Fig. 25.1).

Fig. 25.1

Methods for multiplex protein profiling. The expression levels of multiple proteins can be measured by using different approaches. Barcoding-NanoString combines digital detection (NanoString’s nCounter) with antibody-DNA conjugates. RPPA is a high-throughput antibody-based technique that uses colorimetric or fluorescent assay intensity. High-throughput antibody-free techniques consist of LC-MS/MS, which measures label-free peak peptide intensities, or stable-isotope labeling by tagging the mass of a protein or peptide. Recent modifications include SRM and MRM. Reprinted from Gokmen-Polar and Badve (2014)

25.2.1 Proteomic Technologies

According to the original or practical definition, any study that evaluates >10 proteins simultaneously should be considered a proteomic study. Detection may include protein identification and/or protein quantification. The analysis may also include detection and quantification of the post-translational modifications of peptides/amino acids. Indeed, proteomics fundamentally answers the “who” and “what” questions for many proteins in one project. The “who” is the name of a protein or the post-translational modification of an amino acid. The “what” is the change in expression of a protein or post-translational modification (Fig. 25.2).

Fig. 25.2

Applications of proteomic technologies for breast cancer diagnosis, classification and assessment of risk of cancer as well as prediction of recurrence

Functional proteomics extends conventional proteomics in that it incorporates the examination of protein activation, protein–protein interactions and activated pathway analysis. Functional proteomics can be further divided into distinct subtypes based on the types of protein analyzed, such as exosomal proteins (exosome), secreted proteins (secretome), proteases (proteasome), kinases (kinome) and phosphorylated proteins (phosphoproteomics). Innovative MS approaches, such as 5-plex stable isotope labeling with amino acids in cell culture (SILAC), have already been applied to monitor phosphotyrosine signaling perturbations induced by a drug treatment in a single experiment.

The choice of technique depends on the goals of the study and on whether whole proteins or peptides are being detected. When the identity of the proteins is known and corresponding antibodies are available, array-based techniques may be used. However, these techniques require bait molecules, which increases the cost and raises issues related to reproducibility. In contrast, MS platforms offer low sample consumption, reduced variability, and high-throughput capacity, which are significant advantages for discovery, although they also have issues related to reproducibility.

25.2.2 Array-Based Technologies

Array analysis is based on binding between a bait molecule and an analyte, which are subsequently detected by a probe. The bait molecule can be an antibody, protein, peptide, drug, nucleic acid, cell, phage, etc. The analyte is a protein. The probe is a molecule with a signal-generating moiety, such as a labeled antibody. The intensity of the signal is proportional to the quantity of an analyte bound to the bait molecule. An image of the spot pattern is captured, analyzed, and interpreted (Liotta et al. 2003).

According to whether the analyte is captured from the solution phase or bound to the solid phase, protein microarrays fall into two major classes: forward-phase arrays and reverse-phase arrays. In forward-phase arrays, the analyte is captured from the solution phase, and the bait molecule, such as an antibody, is immobilized onto the solid support. An antibody microarray is a forward-phase array in which a number of antibodies are arrayed. The array is incubated with the test sample (containing the analyte) for analysis.

In the reverse-phase protein array (RPPA), the analyte is bound to the solid phase and detected by the probe (Liotta et al. 2003). After sample lysates are spotted onto an array, the array is incubated with a specific antibody that recognizes the protein of interest. The protein signal is amplified with a secondary antibody. The array is scanned and the resulting image is quantified and analyzed by array software (Charboneau et al. 2002). RPPAs have been extensively used in the TCGA analysis.

Tissue microarrays (TMAs) are also antibody-based reverse-phase arrays, but are named after the sample type. Tissue microarrays allow high-throughput molecular profiling of markers in cancer specimens and rapid validation, in a large number of tumor samples, of novel candidates identified from proteomic analyses. For further details on TMAs, see (Badve DAKO paper).

25.2.3 Mass Spectrometry (MS) Based Methods

MS is an analytical tool that generates spectra of the masses of proteins within a sample. It first ionizes compounds to generate charged molecules and then measures their mass-to-charge ratios. The apparatus acts as a high-accuracy ion scale that is mostly composed of an ionizing source, an analyzer [quadrupole or TOF (time of flight)], and one or more detectors, which record the mass-to-charge ratio of the ionized peptides (Domon and Aebersold 2006). The spectra are examined to determine the elemental composition of the sample and the masses of proteins and to depict the chemical structures of the proteins.
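The mass-to-charge ratio can be illustrated with a minimal sketch. It assumes positive-mode ionization by protonation, so that a molecule of neutral monoisotopic mass M carrying z protons is observed at m/z = (M + z × 1.007276) / z. The function name and the 1000.0 Da example mass are hypothetical and chosen only for illustration.

```python
PROTON_MASS = 1.007276  # Da; assumes each charge comes from one added proton


def mz(neutral_mass: float, charge: int) -> float:
    """Return m/z for a molecule of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical peptide of neutral mass 1000.0 Da at charge states 1+ to 3+
for z in (1, 2, 3):
    print(f"z = {z}: m/z = {mz(1000.0, z):.4f}")
```

The same peptide therefore appears at several m/z values depending on its charge state, which is one reason spectra need interpretation before masses can be assigned.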

The commonly used separation methods for whole-protein (top-down) analysis include classic gel-based methods, high-performance liquid chromatography (HPLC), and MudPIT (multidimensional protein identification technology). Ionization techniques include electrospray ionization, surface-enhanced laser desorption/ionization (SELDI), and matrix-assisted laser desorption/ionization (MALDI). The resulting data can be automatically submitted to a database for peptide mass fingerprinting. Alternatively, tandem MS (MS/MS) may be performed to obtain peptide sequence. Electrospray methods are being adapted for rapid diagnostic purposes such as margin assessment during surgery; see Ifa and Eberlin (2016) for details about methodologies.

The bottom-up (or shotgun) methods involve tryptic digestion. This provides more information per protein, as peptides are easier to ionize than proteins. A peptide ion provides useful information, including its intensity at each time point in the MS/MS spectrum. Using this information, different label-free methods have been developed, including spectral counting, ion intensity, MS/MS fragment ion intensity, and a combination of spectral counting and ion intensity measurements. The principle of spectral counting is very simple: the number of mass spectra identified for a protein is used as a measure of the protein’s abundance (Lundgren et al. 2010). It must be noted that the MS signal does not necessarily correlate with the abundance of the protein, because the ionization efficiency of proteins and peptides is variable.
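The principle of spectral counting can be sketched in a few lines. The input format (one spectrum-to-protein pair per identified MS/MS spectrum), the function names, and the toy data are assumptions made for illustration; dividing by the run total is one simple normalization among several in use and is not taken from the cited work.

```python
from collections import Counter


def spectral_counts(identifications):
    """Count identified MS/MS spectra per protein.

    `identifications` is an iterable of (spectrum_id, protein_id) pairs,
    a hypothetical simplified format with one pair per identified spectrum.
    """
    return Counter(protein for _spectrum, protein in identifications)


def normalized_counts(counts):
    """Express each protein's count as a fraction of all spectra in the run."""
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


# Toy run: three spectra map to protein P1 and one to protein P2
run = [("s1", "P1"), ("s2", "P1"), ("s3", "P1"), ("s4", "P2")]
counts = spectral_counts(run)
print(counts)
print(normalized_counts(counts))
```

Because longer or more easily ionized proteins yield more identifiable spectra, such counts are best compared for the same protein across samples rather than between different proteins.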

25.2.3.1 Label-Based and Label-Free MS Methods

Peptide-centric proteomic approaches are broadly divided into isotope- and isobaric-label-based technologies (ICAT and iTRAQ, respectively) and label-free MS-based proteomics. ICAT and iTRAQ methods have the potential for quantitative protein profiling of clinical samples, plasma and/or serum as well as tissues (Gromov et al. 2014). The ICAT platform has been used in conjunction with laser capture microdissection (LCM) in breast cancer (Zang et al. 2004). The iTRAQ platform allows simultaneous assessment of differential abundance of proteins between several samples (up to 8) (Gromov et al. 2014). iTRAQ is still a discovery tool and the results need to be confirmed by other methods; SRM has been used for this purpose (Muraoka et al. 2012). The stable isotope labeling by amino acids in cell culture (SILAC) strategy is specifically tailored for detecting phosphoproteins. For example, it has been used by Tzouros et al. (2013) to identify 318 unique phosphopeptides belonging to 215 proteins from an erlotinib-treated breast cancer cell line model.

Label-free MS approaches allow screening of proteomes on a global scale by quantitative measurement of peptide abundance, using peptide ion peak intensities or spectral counting without additional labeling of peptides. It is important to emphasize that the fold change of an individual peptide may often differ from the fold change of other peptides from the same protein. To detect and remove outlier peptides, multiple filters have been used to improve quantitation (Lai et al. 2011).
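One way to handle discordant peptides can be sketched as follows. This is a generic robust filter (median and median absolute deviation on log2 ratios), not the filters described by Lai et al. (2011); the function name, the cutoff `k`, and the example ratios are assumptions made for illustration.

```python
import math
from statistics import median


def protein_log2_fold_change(peptide_ratios, k=3.0):
    """Summarize peptide ratios for one protein after dropping outlier peptides.

    peptide_ratios: intensity ratios (condition B / condition A), one per
    peptide assigned to the same protein. Peptides whose log2 ratio lies more
    than k median absolute deviations (MAD) from the median are discarded.
    Returns (median log2 ratio of the kept peptides, list of kept log2 ratios).
    """
    logs = [math.log2(r) for r in peptide_ratios]
    center = median(logs)
    mad = median(abs(x - center) for x in logs)
    if mad == 0:
        # No spread around the median: nothing sensible to filter on
        return center, logs
    kept = [x for x in logs if abs(x - center) <= k * mad]
    return median(kept), kept


# Four concordant peptides (about 2-fold up) and one discordant peptide
fold_change, kept = protein_log2_fold_change([2.0, 2.1, 1.9, 2.2, 0.25])
print(f"log2 fold change = {fold_change:.3f} from {len(kept)} peptides")
```

Working on the log scale keeps up- and down-regulation symmetric, and the median makes the protein-level estimate less sensitive to any single peptide.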

25.2.3.2 Selected and Multiple Reaction Monitoring (SRM and MRM)

Traditional label-free quantification methods quantify hundreds to thousands of proteins in a mixture. In contrast, selected reaction monitoring (SRM), or multiple reaction monitoring (MRM), is a targeted protein quantification method. SRM/MRM is not a new mass spectrometry technique, but its application in proteomics is emerging as a complement to untargeted shotgun methods and is particularly useful in absolute quantification. When isotopically labeled peptides are used as internal standards, SRM/MRM is able to absolutely quantify proteins (Chahrour et al. 2015). These methods have found commercial use, and diagnostic and prognostic panels are available for clinical use. More specifically, tests for HER2 quantification as well as “comprehensive profiling” of tumors are offered by Nantomics.
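The arithmetic behind absolute quantification with a labeled internal standard can be sketched as below. The sketch assumes single-point calibration and an identical detector response for the endogenous (light) and isotopically labeled (heavy) peptide; the function names, transition peak areas, and spiked amount are hypothetical.

```python
def summed_area(transition_areas):
    """Sum the peak areas of the monitored transitions for one peptide form."""
    return sum(transition_areas)


def absolute_amount(light_area, heavy_area, heavy_spiked_fmol):
    """Estimate the endogenous peptide amount (fmol) from the light/heavy area ratio.

    Assumes the heavy standard was spiked at a known amount and behaves
    identically to the light peptide during separation and ionization.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    return light_area / heavy_area * heavy_spiked_fmol


# Hypothetical peak areas for three transitions of each peptide form
light = summed_area([1200.0, 800.0, 400.0])
heavy = summed_area([2400.0, 1600.0, 800.0])
print(absolute_amount(light, heavy, heavy_spiked_fmol=50.0), "fmol")
```

In practice a multi-point calibration curve is often preferred over a single ratio, since it also checks that the measurement falls within the linear range.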

25.3 Protein–Protein Interaction (PPI) Profiling

More than 80 % of proteins do not operate alone but in complexes (Berggard et al. 2007), so it is important to identify the interacting partners of proteins in order to deduce protein function (Phizicky et al. 2003). PPIs can modify the kinetic properties of enzymes, act as mechanisms for substrate channeling, construct a new binding site for small effector molecules, inactivate or suppress a protein, change the specificity of the protein for the substrate, or serve as an upstream or downstream regulator of function (Phizicky et al. 2003). A number of in vitro, in vivo, and in silico methods are available to analyze PPIs (Rao et al. 2014). The in vitro methods include tandem affinity purification, affinity chromatography, co-immunoprecipitation, protein arrays, protein fragment complementation, phage display, X-ray crystallography, and NMR spectroscopy. In vivo methods include yeast two-hybrid systems. A detailed discussion of these is beyond the scope of this chapter, but a brief mention of fluorescence resonance energy transfer (FRET) will be made here.

FRET-based methods have found their way into clinical practice because of the specificity of the reaction and the relative ease of analysis. The HERmark assay for HER2 is based on this technology. Briefly, it is a proximity assay that detects the binding of HER2 with its binding partners. Several studies have suggested that monitoring this interaction might be a better method for assessing HER2 activity (Duchnowska et al. 2012, 2014, 2015; Lipton et al. 2010).

25.3.1 Issues Related to Sample Preparation

The 22 most abundant blood-derived proteins constitute approximately 99 % of the total plasma protein mass. This makes it necessary to deplete these from clinical body fluid specimens in order to identify changes in less abundant proteins. Immunodepletion is a commonly employed technique (Prieto et al. 2014) and has been used to discover glycoproteins in breast cancer serum as biomarkers. This method has led to the identification of several biomarker candidates including thrombospondin-1 and 5, alpha-1B-glycoprotein, serum amyloid P-component, and tenascin-X (Zeng et al. 2011). These methods can also be applied to fresh frozen tissues.

In contrast to immunodepletion, the pull-down technique selectively enriches a particular protein species, together with the natural binding partners of the captured protein, from a complex protein solution. It is particularly useful for confirming protein–protein interactions predicted by other research techniques or for screening for unknown protein–protein interactions.

Archival FFPE tissues require pretreatment to negate the effect of formalin fixation and processing. Detergent-based methods are commonly used to negate the effects of fixation; commercial kits such as Liquid Tissue® are also available for these purposes.

25.4 Applications

Excellent reviews summarizing the data from proteomics studies in relation to breast cancer have been published (Gromov et al. 2014; Lam et al. 2014; Zeidan et al. 2015). These reviews detail the methodology used for discovery, the type of samples and the technology used in the validation (if any) of the results. We shall highlight/summarize some of the critical studies below.

25.4.1 Biomarker for Breast Cancer Risk

The identification of biomarkers for the early detection of breast cancer has a major impact on reducing breast cancer mortality, because it allows the cancer to be removed early, when it is most treatable. Because they can be monitored with minimal invasiveness, plasma biomarkers have additional value in early detection. Low-abundance proteins in plasma collected from patients with stage I breast cancer or benign breast lesions have been enriched and analyzed using a proteomic approach, resulting in the identification of 397 proteins. Of these, 23 could be validated in an independent set of samples (Meng et al. 2011). Bohm et al. (2011) used an antibody microarray with 23 antibodies immobilized on nitrocellulose slides to determine the levels of acute phase proteins, interleukins and complement factors in the sera of 101 study participants (49 women with primary breast cancer and 52 healthy age-matched controls). Six proteins were found at significantly different levels in breast cancer patients compared to healthy subjects. Garrisi et al. (2013) analyzed 292 serum samples (100 from healthy people, 100 from sporadic breast cancer patients, and 92 from familial breast cancer patients) to identify significantly differentially expressed peptides.

In a tissue based approach, Chung et al. identified ubiquitin and S100P as differentially expressed in 82 breast cancer and 82 adjacent unaffected tissue samples (Chung et al. 2013). They confirmed the differential expression in an independent cohort of 89 patients. Proteomics of breast cancer-associated adipose tissue from freshly isolated tumors enabled the identification of paracrine secretion of oncostatin M by cancer-associated adipose tissue (Lapeire et al. 2014). Oncostatin M is known to phosphorylate STAT3 and induce transcription of several STAT3-dependent genes. Selective inhibition of oncostatin M by neutralizing antibody and Jak family kinases by tofacitinib inhibited STAT3 signaling, peritumoral angiogenesis, and cellular scattering (Lapeire et al. 2014).

Martinez-Lozano Sinues (2015) performed breath analysis in a cohort of 14 breast cancer patients and 11 healthy volunteers using secondary electrospray ionization-mass spectrometry (SESI-MS) to detect a cancer-related volatile profile. Supervised analysis of the breath data yielded a support vector machine (SVM) model that included 8 features, corresponding to m/z 106, 126, 147, 78, 148, 52, 128 and 315, and was able to discriminate exhaled breath of breast cancer patients from that of healthy individuals with sensitivity and specificity above 0.9. Zhu et al. (2015) used electrospray ionization-linear ion trap quadrupole mass spectrometry (ESI-LTQ-MS) and liquid chromatography/electrospray ionization tandem mass spectrometry (LC/ESI-MS/MS) to determine the structure of glycosphingolipids (α1,2 fucosidase and fucosyltransferases) in human breast cancer tissue. They identified the molecular ion at m/z 1184 as fucosyl-lactoceramide (Fuc-LacCer), which was specific to breast cancer.
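For readers less familiar with the sensitivity and specificity figures quoted for classifiers such as the breath-analysis model, the following sketch shows how the two values are computed from true and predicted labels. The labels are toy data invented for illustration and do not reproduce results from any of the cited studies; the function name is likewise hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = cancer and 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers called cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy called healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy called cancer
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 10 patients (9 detected) and 10 controls (8 correctly cleared)
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 8 + [1] * 2
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens}, specificity = {spec}")
```

With cohorts of a few dozen participants, a single misclassified sample shifts these estimates by several percentage points, so such figures are best read alongside the sample size.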

25.4.2 Biomarker for Classification

Brozkova et al. (2008) used SELDI-TOF analysis of tumor tissue lysates to reproduce the DNA-based intrinsic classification of breast cancers. Tissue proteomic approaches have been used to determine prognosis in ER+, HER2+ and TNBC patients (Lam et al. 2014). In a number of these studies, validation has been performed using Western blots or IHC. However, the endpoints have been correlation with histology or IHC; the clinical significance of these classifiers has not always been analyzed. Similarly, RPPA data have been shown to be consistent with HER2 status by IHC in a number of studies (Wulfkuhle et al. 2012). In the TCGA breast cancer cohort, RPPA analysis identified two novel protein expression-defined subgroups within the luminal tumors, possibly produced by stromal/microenvironmental factors. Seven clusters were identified in this TCGA analysis (basal, HER2, Luminal A, Luminal A/B, X, reactive I and reactive II) (Cancer Genome Atlas 2012). Gujral et al. (2013) used RPPA to analyze 56 breast cancers and matched normal tissues using 71 signaling proteins. Using unsupervised hierarchical clustering, they were able to identify 12 clusters, each composed of important signaling pathways that could be used for drug targeting.

A number of proteomics studies have correlated protein expression patterns with tumor stage (Villanueva et al. 2006; Li et al. 2002, 2005; Laronga et al. 2003). These studies have identified a C-terminal truncated fragment of complement C3, FPA, fibrinogen, ITIH4, apoA-IV, bradykinin, factor XIIIa and transthyretin as being associated with stage (Gromov et al. 2014). Sonntag et al. (2014) have reported the use of a proteomic signature composed of Caveolin-1, NDKA, RPS6, and Ki67 for prognostication and have resolved grade 2 patients into 2 subsets depending on their similarity to grade 1 or grade 3 tumors.

Recent years have seen the application of whole protein analysis in breast cancer, including for margin assessment (Eberlin et al. 2011). Calligaris et al. (2014) used desorption electrospray ionization mass spectrometry imaging (DESI-MSI) for identifying and differentiating tumor from normal breast tissue. Several fatty acids, including oleic acid, were more abundant in the cancerous tissue than in normal tissues. Tumor margins identified using the spatial distributions and varying intensities of different lipids were consistent with the margins obtained by histology. The authors suggest the use of this method for the rapid intraoperative detection of residual cancer tissue during breast-conserving surgery.

25.4.3 Biomarker for Prognostics

uPA/PAI-1 is a well-validated marker that has high levels of evidence for clinical use in breast cancer (Duffy et al. 1988a, b). It is also one of the few markers included in the ASCO biomarker guidelines, based on ELISA data confirmed using clinical trial materials (Harbeck et al. 2013).

Quiescin sulfhydryl oxidase 1 (QSOX1) has been documented to be useful in predicting relapse in Luminal B tumors (Katchman et al. 2013). Ferritin light chain levels have been correlated with node-negative status (Descotes et al. 2012), and DCN and HSP90B1 levels with increased likelihood of metastases and poor overall survival (Cawthorn et al. 2012). He et al. (2013) used a 2D-LC coupled with HPLC-CHIP MS/MS approach to analyze samples from LN+ ER/PR− HER2+ (n = 50) and LN− ER/PR+ HER2− (n = 50) breast cancer patients. Of the 118 proteins differentially expressed, they were able to confirm an immune-related protein, serum soluble CD14 (sCD14), as a biomarker. A high level of serum sCD14 at primary surgery was confirmed in an independent cohort of 183 breast cancer patients (90 LN+ ER/PR− HER2+ and 93 LN− ER/PR+ HER2−) to be associated with a significantly lower risk of relapse within 3 years. Naba et al. (2014) analyzed the extracellular matrix of human mammary carcinoma xenografts and showed that primary tumors of differing metastatic potential differ in extracellular matrix composition. They confirmed that the mRNA levels of the identified targets (SNED1 and LTBP3) had prognostic relevance using an online Affymetrix microarray database.

Gonzalez-Angulo’s group has profiled a large number of tumors with 146 antibodies (RPPA) to identify 6 clusters of breast tumors using a 10-protein panel (Hennessy et al. 2010; Gonzalez-Angulo et al. 2011, 2013; Sohn et al. 2013). These 10 proteins (ER, PR, BCL2, GATA3, CCNB1, CCNE1, EGFR, HER2, HER2p1248 and EIG121) were shown to be useful in predicting relapse-free survival (RFS) in patients who underwent neoadjuvant chemotherapy.

25.4.4 Biomarker for Treatment Response Prediction

Majidzadeh and Gharechahi (2013) used plasma proteome signatures of 9 proteins to define a group of patients likely to have or develop tamoxifen resistance. The MD Anderson group has also reported that a panel of 3 proteins (CHK1pS345, Caveolin-1 and RAB25) could predict RFS after neoadjuvant systemic therapy. Yang et al. (2012) used mass spectrometry to analyze needle biopsies of tumors taken from patients prior to neoadjuvant (doxorubicin-based) chemotherapy. Among 298 differentially expressed proteins (>1.5-fold), FKBP4 and S100A9 were validated by IHC as useful for predicting resistance to therapy.

25.5 Challenges to Proteomics

There are major advantages to the use of proteins as biomarkers for disease, as they are the workhorses within the cellular environment. However, there are several limitations. Proteins, unlike DNA or RNA, cannot be amplified. Approximately 500,000 to 1,000,000 proteins are synthesized from the 35,000 genes in the human genome through processes of alternative splicing and post-translational modification. This makes identification of the structure critical. Most of the high-throughput techniques are based on digested peptides and not intact proteins. Deciphering the identity of the protein can thus be challenging. Proteins/peptides having a mass between 4000 and 10,000 Da are difficult to identify. In addition, in most tissue/blood samples a small number of proteins account for the vast majority of the protein detected. For example, approximately 20 proteins constitute more than 98 % of the proteins identified in serum/plasma (Omenn et al. 2005; Anderson et al. 2004; Anderson and Anderson 2002). Detection of low-abundance proteins is a major challenge that requires depletion of major species or enrichment of rare proteins by a variety of methods, including fractionation. Protein expression can be transient in nature. This, in addition to pre-analytical handling of the specimens, can introduce significant reproducibility issues. The assays themselves are also not very reproducible, and there can be significant variability between experiments, resulting in descriptive studies.

The cost of proteomics studies is still fairly high, resulting in studies that include a low number of samples. It is not always clear whether the differences noted in the studies are due to the analytical system, the low sample size, or tumor heterogeneity. The specimens used are often “samples of convenience” and lack detailed annotations. Comparisons are often performed using surrogate variables such as histology or IHC rather than patient outcomes such as overall and disease-free survival. In addition, in most studies the differences are quantitative; the proteins/peptides are not exclusive to the disease state.

25.6 Conclusions

Proteomics has the capacity to help clinicians or scientists answer clinically and biologically relevant questions. These may involve the use of whole-protein or peptide-based analyses of cells, cell fractions or body fluids, aided as necessary by fractionation and pull-down techniques. There is enormous scope for the use of these as biomarkers for early detection, diagnostics, classification, treatment response prediction, and prognostics, and for understanding mechanisms involved in cell proliferation, motility and survival. RPPA has been very successfully employed in multi-institutional studies such as the TCGA. Techniques that help elucidate protein–protein interactions are critical for defining molecular pathways; some of these have also been put to clinical use. However, there is a significant need for development of new technologies and improvement of existing technologies for sample processing, protein identification, and quantification to improve accuracy, scale, or throughput capabilities. These developments would lead to accurate identification of proteins and their isoforms and would make quantification more precise. The cost of analyses remains high and is, in many cases, prohibitive for large-scale experimentation.