Proteins are the essential mediators of cellular function: Their biological activities and interactions catalyze biochemical reactions and thereby facilitate physiological and pathological processes. Characterizing the functional state of proteins and measuring changes in protein abundance reveal fundamental insights into these processes and ultimately deepen our understanding of the underlying biochemistry and molecular biology. While gene expression analysis by real-time PCR or via transcriptome sequencing provides valuable insights into biological pathways, it can only infer protein abundance information. There is a growing consensus that the correlation between mRNA and protein levels is in general modest (Gygi et al. 1999b; Schwanhäusser et al. 2011; Skelly et al. 2013; Lundberg et al. 2010). The protein phenotype appears to be buffered against transcriptional variation (Fu et al. 2009). Correlations between transcripts and proteins depend on cellular location and biological function (Conrads et al. 2005) and are controlled by tissue-specific post-transcriptional regulation (Franks et al. 2017). Therefore, direct measurements of proteins are preferable since they more accurately reflect cellular status and provide insights into the molecular mechanisms that underlie physiological and pathological processes. Mass spectrometry-based proteomics has emerged as the method of choice for the identification, characterization, and quantification of proteins (Picotti et al. 2013; Aebersold and Mann 2016). Protein identification and characterization are critical for detecting alternatively spliced proteins, proteolytic processing, and post-translational modifications that alter the composition and functional status of proteins at the post-transcriptional level.
It is estimated that the diversity of the roughly 20,300 protein-coding genes is increased to over 500,000 proteoforms by alternative splicing and post-translational modifications (phosphorylation, glycosylation, proteolytic truncations) (Smith et al. 2013).

The ability to identify proteins at a large scale has been primarily driven by the advances in mass spectrometric instrumentation, informatic workflows, and separation of complex protein mixtures. Liquid chromatography (LC) and two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) are two of the most commonly applied separation techniques prior to mass spectrometric analysis. Quantitative proteomics has developed into an indispensable tool for cancer research to analyze disease-related tissues and body fluids in order to identify proteins, protein post-translational modifications, or protein complexes that can be used to detect the disease early, prognose disease outcome, and monitor response to therapeutic intervention and for the elucidation of molecular mechanisms for the development of novel therapeutics. Oncoproteomics has been extensively reviewed, from proteomic studies of tumor tissue and cancer cell lines to profiling of plasma and other body fluids for cancer biomarkers (Huang et al. 2017; Belczacka et al. 2018; Tan et al. 2012; Cantor et al. 2015; Veenstra 2013; Faria et al. 2017). Here, we highlight the most promising quantitative proteomics approaches in the context of studying cancer signaling pathways.

4.1 Differential Analysis by 2D-PAGE

In 2D-PAGE, proteins are initially resolved by isoelectric focusing followed by a separation based on molecular mass. After protein staining, specialized image analysis software is used to identify differentially expressed protein spots. Spots of interest are excised, and proteins are in-gel digested with exogenous proteases (e.g., trypsin). The resulting peptides are recovered and their molecular masses measured by mass spectrometry. In the early stages of proteomics, subsequent peptide identification was performed by peptide mass fingerprinting (PMF) using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), in which peptide masses were matched against theoretically predicted peptide masses in a database of candidate proteins (Monteoliva and Albar 2004). Nowadays, with the ubiquity of LC-MS/MS instrumentation, protein identification is typically performed in higher throughput and with higher accuracy through the matching of peptide fragmentation data obtained from tandem mass spectrometry experiments (MS/MS) to theoretically predicted peptide fragment masses. 2D-PAGE is a classical proteomics workflow that provides a straightforward visual and quantitative comparison of differences in protein composition. It has been extensively employed in cancer research, including for the detection of tumor-associated proteins in colorectal cancer tissue samples (Wang et al. 2007; Stulík et al. 1999; Xing et al. 2006) and in combination with laser capture microdissection (LCM) (Shi et al. 2011). However, the application of 2D-PAGE has been declining in recent years due to its limitations in throughput, reproducibility, sensitivity, and dynamic range, as well as its laborious nature (Belczacka et al. 2018). The detection limits of the most commonly used stains range from 500 ng/mm² (colloidal Coomassie Brilliant Blue) to 0.1 ng/mm² (silver stain and fluorescent dyes).
Some of these drawbacks can be overcome by the usage of narrow-range IPG strips to increase the resolving power in the initial isoelectric focusing dimension. To increase reproducibility and improve comparative analyses, difference gel electrophoresis (DIGE) was developed, a form of multiplexed 2D-PAGE where up to three different protein samples are fluorescently labeled prior to gel separation. DIGE-based differential proteomics analysis has been successfully used in the discovery phase of cancer biomarker studies (colorectal, prostate cancer) when the proximal tissue samples are being analyzed at greater depth before validation of potential markers by ELISA in serum (Hamelin et al. 2011; Pang et al. 2010).

A particular strength of 2D-PAGE is the ability to detect and visualize proteoforms, the different molecular structures that the protein products of a single gene can assume due to genetic variations, alternatively spliced RNA transcripts, and post-translational modifications (PTMs) (Smith et al. 2013). PTMs including proteolytic processing, deamidation, glycosylation, acetylation, alkylation, cysteine oxidation, tyrosine nitration, and phosphorylation regulate many cellular signaling pathways. PTMs alter the molecular mass and/or the isoelectric point (pI) of the protein. For example, different phosphorylation states of a protein are observable as horizontal spot trains in a 2D gel, whereas glycosylation can alter both the pI and the molecular weight of proteins, resulting in clusters shifted both horizontally and vertically (Löster and Kannicht 2008). ProMoST (http://proteomics.mcw.edu/promost.html) is a webtool that can be used to calculate gel shifts introduced by PTMs to facilitate more detailed analyses (Halligan et al. 2004).

4.2 Top-Down Proteomics

Ideally, intact proteins would be analyzed directly by mass spectrometry without the need for proteolytic digestion; protein identification, in turn, would be achieved by MS/MS fragmentation of the whole protein. As a liquid-phase alternative to 2D-PAGE, “top-down” proteomics has made substantial progress in the last decade, and it is now feasible to measure over 3000 proteoforms using a four-dimensional LC separation system that is integrated with high-resolution electrospray mass spectrometry analysis (Tran et al. 2011). However, similar to 2D-PAGE, the high complexity and dynamic range of protein concentrations encountered in proteome research currently limit the applicability of the top-down approach to large-scale discovery analyses. Nonetheless, “native mass spectrometry” experiments, in which biological analytes are ionized by electrospray from nondenaturing solvents to preserve noncovalent interactions in the gas phase, have been used to analyze specific macromolecular assemblies including protein-protein and protein-ligand complexes (Hernández and Robinson 2007; Zhou et al. 2011; Leney and Heck 2017).

4.3 Bottom-Up (Shotgun) Proteomics

Instead of top-down intact protein analysis, “bottom-up” proteomics has been the more practical approach and has been widely adopted in the field (Aebersold and Mann 2016). In the bottom-up strategy, peptides are generated by enzymatic digestion of proteins with sequence-specific, exogenous proteases such as trypsin. The resulting peptides are separated by reversed-phase liquid chromatography (LC) and injected into the hyphenated tandem mass spectrometer. Peptides are isolated in the gas phase and subjected to fragmentation, thereby generating tandem mass spectra (MS/MS; MS2). Collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) are two of the most commonly used fragmentation techniques to generate sequence information for peptide identification. Electron-transfer dissociation (ETD) and electron-capture dissociation (ECD) can be useful alternative strategies for the identification of larger and post-translationally modified peptides, because post-translational modifications (PTMs) such as phosphorylation and glycosylation are labile and readily lost during collisional peptide backbone fragmentation (Mikesh et al. 2006; Syka et al. 2004; Zubarev et al. 2000). The resulting MS/MS fragmentation data are submitted to database search engines (e.g., MASCOT (Perkins et al. 1999), SEQUEST (Eng et al. 1994), X!Tandem (Craig and Beavis 2004), MyriMatch (Tabb et al. 2007), and OMSSA (Geer et al. 2004)) for protein/peptide identification. These search engines match and score the empirically acquired spectra against theoretically predicted fragmentation patterns of peptides derived from in silico digestions of proteins stored in protein sequence databases (Nesvizhskii 2010; Eng et al. 2011). Alternatively, MS/MS spectra can be matched via correlation analysis to previously observed and identified spectra using spectral library search engines such as SpectraST (Lam et al. 2007), X!Hunter (Craig et al. 2006), and BiblioSpec (Frewen et al. 2006).
Though spectral library searching is typically considered to be a more sensitive approach than sequence database searching, its adoption in the field has been fairly limited (Deutsch et al. 2018). PeptideAtlas (Desiere et al. 2004), the Global Proteome Machine Database (Craig et al. 2006), and the MassIVE Knowledge Base (Wang et al. 2018) are efforts to leverage the large number of peptide identifications contained in public proteomics datasets to create spectral library resources that can support future proteomics experiments.
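The in silico digestion step that underlies sequence database searching can be illustrated with a minimal Python sketch. It assumes the common simplified trypsin rule (cleavage C-terminal to Lys or Arg unless followed by Pro), no missed cleavages, and unmodified peptides; the function names are hypothetical and do not correspond to any of the search engines cited above, which additionally score fragment ion spectra rather than precursor masses alone.

```python
# Monoisotopic amino acid residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-terminus)
PROTON = 1.007276   # mass of the charge carrier


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide, charge):
    """m/z of the protonated peptide at the given charge state."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def match_precursor(observed_mz, charge, peptides, tol_ppm=10.0):
    """Return candidate peptides whose theoretical m/z lies within tolerance."""
    hits = []
    for pep in peptides:
        theoretical = precursor_mz(pep, charge)
        if abs(observed_mz - theoretical) / theoretical * 1e6 <= tol_ppm:
            hits.append(pep)
    return hits
```

In this sketch, candidate peptides from a digested sequence are filtered by precursor mass only; a search engine would go on to compare the predicted fragment ions of each candidate with the acquired MS/MS spectrum.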

Peptide sequences can also be derived from MS/MS fragmentation data by de novo sequencing approaches using algorithms including PEAKS (Ma et al. 2003), PepNovo (Frank and Pevzner 2005), Novor (Ma 2015), and Lutefisk (Taylor and Johnson 2001) that do not rely on reference databases (Allmer 2011). De novo sequencing frameworks designed for top-down proteomics can be advantageous in the analysis of high-resolution bottom-up MS/MS datasets (Vyatkina et al. 2017).

Combining the results of multiple search engines with tools such as iProphet (Shteynberg et al. 2011) can improve the confidence of peptide-spectrum matches and increase the overall number of distinct peptides and proteins identified since each search engine has its own specific strengths which can be complementary to others (Shteynberg et al. 2013).

Currently, there are two major data acquisition strategies used in bottom-up proteomics. The preferred method for proteome discovery is data-dependent acquisition (DDA), which aims to maximize the number of protein and peptide identifications per experiment to achieve comprehensive proteome coverage. Hallmarks of this approach include the 1-h yeast proteome (Hebert et al. 2013b) and draft maps of the human proteome with coverages of up to 92% of the protein-coding sequences (Wilhelm et al. 2014; Kim et al. 2014). To achieve this level of proteome coverage, additional fractionation techniques (strong anion exchange; off-gel electrophoresis) were employed to distribute sample complexity across additional data acquisitions. Applied to single cell lines (e.g., HeLa human cervical carcinoma), deep proteomics analysis can identify 10,255 proteoforms stemming from 9205 genes (Nagaraj et al. 2011). Proteomics analyses of a panel of 11 commonly studied cell lines (Geiger et al. 2012) and the NCI-60 panel of 59 cancer cell lines (Gholami et al. 2013) suggest that roughly 10,000 proteins is the average proteome coverage of a human cell line. A more recent study showed that adding an off-line high pH peptide fractionation step prior to low pH LC-MS/MS analysis can deepen the protein coverage even further to over 12,000 proteins for HeLa cells (Bekker-Jensen et al. 2017). A key strength of the described DDA methods is that no a priori knowledge about the identity of the expected proteins is required; therefore, unanticipated proteins and PTMs can be discovered, potentially providing new biological understanding.

Data-independent acquisition (DIA), also referred to as SWATH (Sequential Windowed data-independent Acquisition of the Total High-resolution Mass Spectra), is a more recently developed methodology that aims to obtain complete fragment ion coverage across samples (Ludwig et al. 2018). In DDA experiments, a full precursor ion spectrum of all co-eluting peptides is acquired at the MS1 level, after which as many precursor peptides as possible are isolated and fragmented, and their MS2 spectra acquired, within the instrument cycle time. In DIA experiments, by contrast, predetermined windows of m/z values are sequentially isolated for fragmentation (Gillet et al. 2012). In each instrument cycle, the entire precursor ion m/z range is fragmented, resulting in highly multiplexed fragment ion spectra. Precursor-fragment ion relationships can be reconstructed with bioinformatic tools such as DIA-Umpire (Tsou et al. 2015, 2016) to create “pseudo”-spectra that are conventionally searched against protein databases to create internal spectral libraries that contain peptide identifications. These internal spectral libraries or external spectral libraries built from DDA data are then used to perform targeted extraction (Röst et al. 2014). A key advantage of the DIA approach is its unbiased nature: All precursor and all fragment ions are acquired all the time without losing low-abundance ions; the identities of quantified peptides do not need to be specified a priori, which is ideal when the data are acquired over the course of a multi-year study. DIA measurements comprise an archival record of the sample content that can be re-interrogated when new proteins, proteoforms, or post-translational modification sites of interest emerge.
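The sequential isolation scheme described above can be sketched in a few lines of Python. The m/z range and window width used in the usage note are illustrative assumptions, not instrument recommendations, and real acquisition methods often use overlapping or variable-width windows that this simplified sketch does not model.

```python
def build_windows(mz_start, mz_end, width):
    """Return contiguous (lower, upper) isolation windows covering the range."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        windows.append((lower, min(lower + width, mz_end)))
        lower += width
    return windows


def window_for(precursor_mz, windows):
    """Index of the window in which a precursor is co-fragmented, or None."""
    for idx, (lower, upper) in enumerate(windows):
        if lower <= precursor_mz < upper:
            return idx
    return None
```

For example, `build_windows(400.0, 1200.0, 25.0)` yields 32 windows, and a precursor at m/z 512.3 falls into the window spanning 500 to 525. Every precursor within that window is fragmented together in each cycle, which is why the resulting MS2 spectra are multiplexed.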

4.4 Relative Quantitation in Bottom-Up Proteomics

In bottom-up proteomics, quantitation is achieved by either label-free or stable isotope labeling methods (Bantscheff et al. 2012). Stable isotope-based methods are the gold standard for quantification; however, they require metabolic labeling or an additional chemical labeling step during sample preparation. Label-free approaches are simpler and more economical, providing relative quantitation for an unlimited number of samples (including clinical specimens), and can be based on either DDA or DIA datasets (Nahnsen et al. 2013). State-of-the-art mass spectrometers provide the high mass resolution and high mass accuracy that are required for the accurate extraction of ion chromatograms (XICs; elution profiles) of precursor ions at the MS1 level, which are used to determine peptide quantities. In the past, when bottom-up proteomics was mostly performed on low-resolution ion trap instruments, the number of identified MS/MS spectra for a given peptide (spectral counts) was used as a surrogate measurement for peptide abundance (Ishihama et al. 2005). While the spectral count approach has been used to create one of the drafts of the human proteome (Kim et al. 2014), XIC-based approaches are now the most commonly employed label-free methodology due to their superior sensitivity. By aligning the retention times of XIC areas and propagating MS/MS-based peptide identifications across data acquisitions (“matching between runs”), the overall number of detectable peptides between samples can be boosted, which leads to more comprehensive comparative analyses (Bateman et al. 2013). Numerous academic and commercial proteomics data analysis packages including PEAKS (Ma et al. 2003) and Scaffold (Searle 2010) offer label-free quantitative workflows in addition to their identification pipelines (Nahnsen et al. 2013; Mueller et al. 2008).
Particularly noteworthy is the continuously expanding proteomics software tool suite under the MaxQuant umbrella, which is freely available and has become one of the most widely used proteomics data analysis platforms. MaxQuant incorporates the peptide database search engine Andromeda (Cox et al. 2011) and the MaxLFQ workflow for label-free quantitation (Cox et al. 2014) and also supports other MS1- and MS2-level (isobaric) labeling approaches (Tyanova et al. 2016).
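The XIC-based label-free measurement can be illustrated with a minimal sketch, assuming centroided MS1 scans represented as plain Python tuples and a fixed ppm tolerance. The data layout and function names are hypothetical; the packages named above additionally perform peak detection, retention time alignment, and normalization, none of which is modeled here.

```python
def extract_xic(scans, target_mz, tol_ppm=10.0):
    """Build an extracted ion chromatogram for one precursor m/z.

    scans: list of (retention_time, [(mz, intensity), ...]) for MS1 spectra.
    Returns a list of (retention_time, summed_intensity) points.
    """
    tol = target_mz * tol_ppm / 1e6  # tolerance converted from ppm to m/z units
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, intensity))
    return xic


def xic_area(xic):
    """Integrate the elution profile with the trapezoidal rule."""
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(xic, xic[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area
```

The ratio of two such areas for the same peptide in two runs gives a relative abundance estimate, provided that both runs were acquired and processed under comparable conditions.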

In contrast to the stochastic precursor ion selection in DDA, DIA systematically parallelizes the fragmentation of all detectable ions, thereby minimizing selection bias, which in turn results in improved dynamic range and sensitivity. Specific peptides can be identified and quantified by applying targeted extraction of either MS1 precursor or MS2 fragment ion intensities using spectral library-based OpenSWATH (Röst et al. 2014), Skyline (Maclean et al. 2010), or commercial software (PeakView SWATH 2.0, SCIEX; Spectronaut, Biognosys). The performance of these “peptide-centric” query tools in terms of identification precision, robustness, and specificity has been benchmarked against reference datasets and compared to the “data-centric” DIA-Umpire approach (Tsou et al. 2015) that does not rely on existing assay libraries (Navarro et al. 2016). Targeted extraction relies on the generation of sample-specific assay libraries that contain precursor and fragment ion m/z values, normalized retention times, and relative ion intensities of targeted peptides. Retention times are typically normalized using a set of reference peptides (Escher et al. 2012). DIA studies often rely on sample-specific libraries that are acquired on the same instrument in DDA mode prior to the DIA analysis (Gillet et al. 2012; Röst et al. 2014; Hüttenhain et al. 2013). Alternatively, repositories of assay libraries for human proteins have been created that are optimized for specific MS instruments. These resources contribute to simplified and reproducible targeted SWATH/DIA analysis across laboratories (Rosenberger et al. 2014). A multi-laboratory evaluation study across 11 sites demonstrated that SWATH acquisitions are capable of reproducibly detecting and quantifying a large-scale protein set (Collins et al. 2017).
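Retention time normalization with reference peptides can be sketched as a linear calibration. The example assumes that each reference peptide has a known value on a normalized retention time scale and that the relationship to the observed retention time is approximately linear over the gradient; the function names and any values passed to them are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_normalizer(observed_rt, reference_scale):
    """Return a function mapping observed retention times to the normalized scale."""
    slope, intercept = fit_line(observed_rt, reference_scale)
    return lambda rt: slope * rt + intercept
```

Once the calibration has been fitted from the reference peptides in a run, the retention times of all other peptides in that run can be placed on the same normalized scale, so that an assay library built on one chromatographic setup can be applied to data acquired on another.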

4.5 Multiplexed Quantitation Using Stable Isotope Labeling Methods

The analysis of cancer signaling networks requires the ability to quantify proteins across multiple conditions so that temporal dynamics can be captured. A broad variety of chemical and metabolic stable isotope labeling methods have been developed that allow for multiplexing (Gevaert et al. 2008). Stable isotope labeling strategies can provide relative and absolute quantitation; however, in contrast to label-free approaches, the specifics of the labeling reactions can limit the number of samples that can be interrogated. Isotope-coded affinity tags (ICAT) were among the first stable isotope chemical labeling reagents to be widely adopted in proteomics (Gygi et al. 1999a). ICAT reagents comprise a reactive group specific toward cysteinyl residues, a stable isotope label (heavy/light), and a biotin affinity tag for selective enrichment to reduce sample complexity. ICAT allows for a duplex analysis that compares protein levels across two biological states. The exclusive reliance of ICAT on cysteine-containing peptides limits its general applicability as a quantitation approach, and it has been mostly replaced by a new generation of isobaric labeling strategies based on N-hydroxysuccinimide (NHS) chemistry. The TMT (tandem mass tag) (Thompson et al. 2003) and iTRAQ (isobaric tags for relative and absolute quantitation) (Ross et al. 2004) labels share isobaric stable isotope moieties as design features, which render differentially labeled samples “silent,” that is, indistinguishable during chromatographic separation and in precursor MS1 acquisition. Only upon MS/MS fragmentation are the low molecular weight reporter ions released, and their relative ion abundances are used for quantitation. Currently, there are up to eight reporter ions available for iTRAQ (Choe et al. 2007) and up to ten for TMT (Erickson et al. 2017), each allowing for multiplexed analysis in single LC-MS/MS experiments.
For projects entailing larger sample numbers, one of the isotope channels is typically used for a control reference mixture.

The dynamic range of isobaric multiplex quantitation methodologies can be limited by isotopic contamination, background interference, low signal-to-noise ratio, and ratio compression (Ow et al. 2009; Karp et al. 2010). Applying an additional isolation and fragmentation event (MS3 scan) (Ting et al. 2011) and gas-phase purification through proton transfer ion-ion reactions (Wenger et al. 2011) have been shown to eliminate interferences. Co-isolating and co-fragmenting multiple MS2 fragments (MultiNotch MS3) can boost sensitivity and improve the dynamic range of the isobaric tagging approach (McAlister et al. 2014).
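Ratio compression and reference-channel normalization can both be illustrated with a deliberately simplified model. The sketch assumes that co-isolated background peptides contribute an equal reporter ion signal to every channel, which is an idealization; the channel names and intensities are illustrative only.

```python
def observed_ratio(target_a, target_b, interference):
    """Reporter ion ratio when each channel carries the same background signal."""
    return (target_a + interference) / (target_b + interference)


def relative_to_reference(reporters, reference_channel):
    """Express each reporter ion intensity relative to the reference channel."""
    reference = reporters[reference_channel]
    return {channel: value / reference for channel, value in reporters.items()}
```

With target signals of 1000 and 100 and no interference, `observed_ratio` returns the true tenfold difference; adding a background of 100 to each channel reduces the measured ratio to 5.5. This is the compression toward 1:1 that MS3-based acquisition is intended to reduce. Dividing by a common reference channel, as in `relative_to_reference`, is what allows values from separate multiplexed experiments to be compared.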

Dimethyl labeling using different isotopomers of formaldehyde provides a more economical triplex stable isotope quantitation method at the peptide level (Boersema et al. 2008). Chemical isotope labels are typically introduced late in the sample preparation process, which makes these labeling strategies broadly applicable; however, at the same time, they are more susceptible to variability introduced during processing.

SILAC (stable isotope labeling by amino acids in cell culture) is a metabolic labeling alternative to chemical isotope tags (Mann 2006). SILAC relies on the metabolic incorporation of essential amino acids that carry stable isotope nuclei (e.g., Arg or Lys labeled with ¹³C or ¹⁵N). SILAC labeling is insensitive to variability introduced at the sample processing and analysis stage, since differentially labeled samples are combined before processing and all subsequent sample handling affects them equally. SILAC and ¹⁵N metabolic labeling have been used for comparative proteomics analysis in cell culture systems (Ong et al. 2002; Everley et al. 2004, 2006) and model organisms including yeast (de Godoy et al. 2008), C. elegans and D. melanogaster (Sury et al. 2010), and rodents (Kruger et al. 2008; Wu et al. 2004). Full incorporation into an entire organism requires feeding more than one generation exclusively with the stable isotope-labeled essential amino acid lysine. A comprehensive analysis employing triple SILAC-based proteomics (using Arg0, Lys0; Arg6-L-¹³C₆ and Lys4-L-²H₄; Arg10-L-¹³C₆¹⁵N₄ and Lys8-L-¹³C₆¹⁵N₂), RNA-seq-based transcriptomic profiling, and antibody-based confocal microscopy revealed that three functionally different human cancer cell lines shared expression levels for more than half of their expressed genes, while close to 20% were substantially altered (Lundberg et al. 2010).
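The mass spacing between SILAC channels follows directly from the isotope composition of the labeled amino acids. The sketch below assumes the triple-labeling scheme mentioned above (Lys4/Arg6 and Lys8/Arg10), complete label incorporation, and standard isotope mass differences; the channel names "medium" and "heavy" and the function names are labels chosen for this illustration.

```python
# Mass difference between the heavy and the light isotope, in Da.
DELTA_13C = 1.003355
DELTA_15N = 0.997035
DELTA_2H = 1.006277

# Mass added per labeled residue in each channel (light residues add nothing).
LABEL_SHIFT = {
    "medium": {"K": 4 * DELTA_2H, "R": 6 * DELTA_13C},
    "heavy": {"K": 6 * DELTA_13C + 2 * DELTA_15N,
              "R": 6 * DELTA_13C + 4 * DELTA_15N},
}


def silac_shift(peptide, channel):
    """Total mass shift of a peptide relative to its light form."""
    shifts = LABEL_SHIFT[channel]
    return sum(shifts.get(aa, 0.0) for aa in peptide)


def mz_spacing(peptide, channel, charge):
    """Spacing between light and labeled precursor signals on the m/z axis."""
    return silac_shift(peptide, channel) / charge
```

A tryptic peptide ending in lysine is therefore shifted by about 8.014 Da in the heavy channel, which appears as a spacing of about 4.007 in m/z for a doubly charged precursor. Peptides with missed cleavages carry more than one labeled residue and shift accordingly.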

In the super-SILAC method, lysates from multiple SILAC-labeled cancer cell lines are combined to serve as internal, isotopically labeled peptide standards to measure fold change ratios between human tumor proteomes (Geiger et al. 2010). By combining SILAC and TMT labeling in the same experiment, a strategy termed “hyperplexing,” it is possible to extend the number of samples that can be quantified in the same LC-MS run (Dephoure and Gygi 2012).

The advent of mass spectrometers capable of ultra-high mass resolution (>200,000) made it possible to resolve the small mass differences (milliDaltons) introduced by the differences in the neutron-binding energetics of isotopes such as ²H (+1.0062 Da), ¹³C (+1.0034 Da), and ¹⁵N (+0.997 Da). The neutron encoding (NeuCode) method (available as amine-reactive labels and SILAC reagents) takes advantage of the ability to embed these mass defect-based neutron signatures into isotopologues. At standard resolution, these isotopologues are concealed during MS1 and MS/MS analysis and therefore do not increase spectral complexity (Hebert et al. 2013a). The multiplexed quantitative information is only revealed in high-resolution scans. NeuCode is applicable to DDA (Overmyer et al. 2018) and DIA approaches (Minogue et al. 2015) as well as targeted proteomics (Potts et al. 2016) and top-down applications (Rhoads et al. 2014; Shortreed et al. 2016).
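The resolution requirement can be estimated with a short calculation. As an assumed example, consider two lysine isotopologues that both add eight neutrons, one labeled with ¹³C₆¹⁵N₂ and one with ²H₈; this pair is chosen here for illustration. The estimate uses the simple definition R = (m/z) / Δ(m/z), which gives only a lower bound, since baseline separation for quantitation generally requires a higher resolving power.

```python
# Mass difference between the heavy and the light isotope, in Da.
DELTA_13C = 1.003355
DELTA_15N = 0.997035
DELTA_2H = 1.006277


def isotopologue_spacing():
    """Mass difference between 2H8- and 13C6 15N2-labeled lysine, in Da."""
    return 8 * DELTA_2H - (6 * DELTA_13C + 2 * DELTA_15N)


def required_resolving_power(mz, delta_mass, charge):
    """Lower-bound resolving power needed to distinguish two species."""
    delta_mz = delta_mass / charge  # spacing shrinks with charge state
    return mz / delta_mz
```

Under these assumptions the two isotopologues differ by roughly 36 mDa, and a doubly charged precursor at m/z 600 carrying one labeled lysine would need a resolving power above 33,000 by this definition. The requirement grows with m/z and charge state, which is consistent with the need for the ultra-high resolution scans described above.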

4.6 Quantitation by Targeted Proteomics

Targeted proteomics provides accurate and quantitative measurements of protein abundances and thereby enables hypothesis-driven research using mass spectrometry (Picotti et al. 2013). In contrast to DDA- and DIA-based proteomics analyses, the identities of the proteins of interest are known a priori in targeted proteomics experiments. For any given protein, peptides are selected that are “proteotypic,” meaning that each peptide has a unique sequence, is readily detected by MS, and has been repeatedly and consistently identified in previous studies (Mallick et al. 2007). By selectively subjecting these proteotypic peptides to precursor ion isolation and continuous fragmentation, characteristic fragment (product) ion abundances for the most intense transitions can be recorded over the chromatographic elution profile, and this information is then used to estimate relative protein abundances. These types of experiments are typically performed on triple quadrupole instruments operating in multiple reaction monitoring (MRM) mode, which is also referred to as selected reaction monitoring (SRM). To increase specificity, typically multiple product ions are measured. Absolute protein abundances can be determined by using spike-in, isotopically labeled reference peptides (Gerber et al. 2003) or mTRAQ chemically labeled standards (Desouza et al. 2008) or in label-free format when anchor proteins are used to create a quantitation model (Ludwig et al. 2011). An efficient method to define custom MRM assay conditions in high-throughput format is through the usage of crude synthetic peptide libraries (Picotti et al. 2010). To achieve proteome-wide coverage for absolute protein quantification, an in vitro protein expression system has been used to synthesize over 18,000 recombinant proteins from full-length human cDNA libraries, which were then digested and labeled with mTRAQ (Matsumoto et al. 2017). 
Alternatively, ProteomeTools is a brute-force project to create a resource comprising the comprehensive LC-MS analysis of over 1.4 million synthetic peptides that cover tryptic and non-tryptic peptides representative of the canonical human proteome, as well as additional peptides covering splicing variants, post-translational modifications, and other sequences representing interesting biology such as disease-associated mutations (Zolg et al. 2017).
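Absolute quantitation with spiked-in, isotopically labeled reference peptides reduces to a ratio calculation, sketched below. The example assumes that the integrated areas of matching transitions are available for the endogenous (light) and the reference (heavy) peptide and that both respond identically in the mass spectrometer; the function names and units are hypothetical.

```python
def transition_ratios(light_areas, heavy_areas):
    """Light-to-heavy ratio for each monitored transition."""
    return [light / heavy for light, heavy in zip(light_areas, heavy_areas)]


def endogenous_amount(light_areas, heavy_areas, spiked_amount):
    """Estimate the endogenous peptide amount from a known heavy spike-in.

    Areas are summed over all transitions; the result has the same unit
    as spiked_amount (for example fmol).
    """
    ratio = sum(light_areas) / sum(heavy_areas)
    return ratio * spiked_amount
```

Comparing the per-transition ratios is a useful consistency check: if one transition deviates markedly from the others, an interfering signal in that transition is a likely explanation, which is one reason why multiple product ions are monitored per peptide.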

Compared to shotgun proteomics approaches, MRM assays provide higher sensitivity, specificity, and a broad dynamic range. Once established, individual MRM assays can be multiplexed at the peptide level (Picotti and Aebersold 2012). Measurements have been shown to be highly reproducible across laboratory sites (Addona et al. 2009). SRMAtlas (www.srmatlas.org) and PASSEL (www.peptideatlas.org/passel) both host freely accessible proteome-wide assay libraries along with empirical performance data that facilitate the design of targeted MRM assays (Farrah et al. 2012; Kusebauch et al. 2014, 2016). MRM assays for 1157 cancer-associated proteins have been developed, of which 182 were detected in depleted plasma and 408 in urine across a cohort of cancer patients and healthy controls using a label-free MRM strategy (Hüttenhain et al. 2012).

By combining peptide immunoaffinity enrichment with stable isotope-labeled standards and MRM-MS, it is possible to create automated, multiplexed assays with sufficient sensitivity to quantify low-abundance target proteins in plasma as an alternative to traditional enzyme-linked immunosorbent assay (ELISA)-based testing (Whiteaker et al. 2010).

The advent of high-resolution/accurate mass (HRAM) instrumentation has enabled the development of the parallel reaction monitoring method (PRM), in which the monitoring of a single product ion in an MRM assay is substituted with the parallel detection of all target product ions in a high-resolution MS/MS analysis (Peterson et al. 2012; Bourmaud et al. 2016). While MRM and PRM provide the best quantitation performance, both are throughput limited in terms of how many proteins can be quantified in a single MS experiment. SWATH/DIA provides a compelling alternative for reproducible quantitation in which a targeted data analysis strategy is employed to extract specific fragment ion abundances out of the comprehensive fragment ion map provided by the DIA dataset. Similar to MRM/PRM, reference libraries containing SWATH assay conditions can be built (Schubert et al. 2015) and shared via repositories (Rosenberger et al. 2014). SWATH/DIA assays have been shown to perform well across multiple laboratory sites (Collins et al. 2017). Additional throughput can be achieved for targeted proteomics assays when multiplexing is extended to the sample level by utilizing isobaric labels. In the TOMAHAQ method, synthetic TMT0-labeled spiked-in peptides trigger the MultiNotch MS3 acquisition of co-eluting TMT10-labeled endogenous peptides, which allowed for the quantitation of 69 target proteins across 180 cancer cell lines within 48 h (Erickson et al. 2017). The setup and data analysis for this approach have been simplified by the recent development of the TomahaqCompanion tool (Rose et al. 2018).

By carefully selecting protein targets based on their involvement in particular biochemical pathways, it is possible to quantitatively investigate the response of cellular systems to external stimulation (Matsumoto and Nakayama 2018). The multiplex MRM approach has been used to study the protein expression in major metabolic energy pathways of breast cancer cells in response to hypoxia, glucose deprivation, and estradiol stimulation (Drabovich et al. 2012; Murphy and Pinto 2010). Leveraging their in vitro proteome-assisted MRM assay library (iMPAQT) that covers over 18,000 proteins, Matsumoto et al. were able to explore the global impact of oncogenic transformation on fibroblasts (2017). Alternatively, by integrating detailed information about biological processes on the basis of literature evidence and computational predictions, it is possible to carefully select protein quantitation targets that can serve as sentinels or proxies for system responses (Soste et al. 2014).

4.7 Characterization of Post-translational Modifications

With continued improvements in mass accuracy, resolution, and sensitivity of mass spectrometry instruments, proteomic expression analyses feature deeper proteome and higher protein sequence coverages that enable more exhaustive characterizations of post-translational modifications (PTMs). PTMs including phosphorylation, glycosylation, and ubiquitination are important modulators of protein function. For example, most proteolytic enzymes are activated from their inactive precursor (zymogen) state by proteolytic cleavage (Klein et al. 2017). Many phosphorylations lead to protein conformational changes that modulate protein activity, e.g., protein binding. Ubiquitination marks proteins for degradation. Glycosylation often regulates protein function and enzymatic activities, alters protein-protein interactions, and changes the subcellular localization of numerous proteins. In mass spectrometric analyses, most PTMs lead to characteristic mass shifts in MS1 spectra, and their location on specific amino acid residues can be determined by fragmentation analysis. However, the combinatorial nature of post-translational modifications creates a heterogeneity that constitutes a formidable analytical challenge, as illustrated by the vast structural diversity that can be generated via oligomerization and branching of glycans (complex carbohydrates) (Laine 1994). Hundreds of kinds of protein modifications (biological and artificial) have been reported in the Unimod (Creasy and Cottrell 2004) and RESID (Garavelli 2004) databases. The most actively studied post-translational modifications include phosphorylation, methylation, ubiquitination, acetylation, and O-GlcNAcylation (Doll and Burlingame 2015). Together, over 260,000 PTM sites have been identified in the human proteome so far (Doll and Burlingame 2015).
Comprehensive information on empirically observed in vivo and in vitro post-translational modifications can be found in online bioinformatic resources including PhosphoSitePlus (PSP) (www.phosphosite.org), iPTMnet (https://research.bioinformatics.udel.edu/iptmnet/), and Phospho.ELM (http://phospho.elm.eu.org/) along with additional tools useful for PTM analysis (Hornbeck et al. 2012; Huang et al. 2018; Dinkel et al. 2011).
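
The characteristic MS1 mass shifts mentioned above can be illustrated with a minimal Python sketch. The monoisotopic mass deltas are standard values for the listed modifications; the dictionary and function names are hypothetical and chosen only for this illustration. The sketch assumes a single modification per peptide and shows how the observed m/z shift shrinks with increasing precursor charge.

```python
# Monoisotopic mass shifts in daltons for a few common PTMs.
# The keys and the function name are illustrative, not taken from any library.
PTM_MASS_SHIFT_DA = {
    "phospho": 79.96633,   # HPO3 on Ser/Thr/Tyr
    "acetyl": 42.01057,    # C2H2O
    "methyl": 14.01565,    # CH2
    "glygly": 114.04293,   # ubiquitin remnant left on Lys after tryptic digestion
    "hexnac": 203.07937,   # single HexNAc, e.g. O-GlcNAc
}


def precursor_mz_shift(ptm: str, charge: int) -> float:
    """Return the m/z shift of a precursor ion carrying one copy of `ptm`."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return PTM_MASS_SHIFT_DA[ptm] / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(precursor_mz_shift("phospho", z), 4))
```

For a doubly charged peptide, a single phosphorylation therefore appears as a shift of roughly 39.98 m/z units rather than 79.97.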

4.8 Phosphorylation

Protein phosphorylation is one of the central means by which cells transiently modulate protein function as exemplified by signal transduction pathways. The localization, the extent of phosphorylation, and the site-specific occupancy or stoichiometry are important determinants of protein functional modulation. Phosphorylation states are mediated by a network of kinases that phosphorylate serine, threonine, and tyrosine residues and phosphatases that remove the phosphate groups. Deregulated kinase activities have been associated with the ability of cancer cells to circumvent physiological constraints on cell proliferation. Kinases (e.g., the serine/threonine kinase mammalian target of rapamycin (mTOR)) have emerged as one of the most heavily pursued classes of drug targets in oncology (Dowling et al. 2010). With 518 genes identified in the human genome, protein kinases are one of the largest protein families in eukaryotes (Manning et al. 2002). It is estimated that a typical eukaryotic cell harbors between 700,000 and 1,000,000 potential phosphorylation sites (Ubersax and Ferrell 2007; Boersema et al. 2010). Analysis of 50,000 phosphopeptides in HeLa S3 cancer cells revealed that at least three-quarters of the 11,000 identified proteins were phosphorylated (Sharma et al. 2014). Interestingly, the 150 most abundant phosphopeptides accounted for 20% of the cumulative phosphopeptide signal (Sharma et al. 2014). Phosphoproteomics analysis of nine mouse tissues (12,000 proteins; ~36,000 phosphorylation sites) revealed that most phosphoproteins are widely expressed but display tissue-specific phosphorylation to adapt to tissue function (Huttlin et al. 2010).

Phosphotyrosine accounts for only 1% of phosphorylations, owing to its primarily regulatory rather than structural role in proteins and a short half-life due to the presence of highly active phosphotyrosine phosphatases (Sharma et al. 2014). Many phosphoproteins such as transcription factors and protein kinases have low copy numbers. Because many regulatory protein phosphorylations additionally occur at substoichiometric levels, enrichment strategies are necessary to comprehensively profile protein phosphorylations (Macek et al. 2009). Enrichment can be performed at the phosphoprotein level prior to digestion using immobilized metal affinity chromatography (IMAC) (Collins et al. 2005) or after digestion using phosphopeptide enrichment by metal oxide affinity chromatography (e.g., using titanium dioxide (TiO2)) or IMAC. In the case of phosphotyrosine, immunoaffinity purification using phosphotyrosine-specific antibodies is preferred (Boersema et al. 2010; Kettenbach and Gerber 2011; Rush et al. 2005; Breitkopf and Asara 2012). Mass spectrometric characterization of phosphopeptides is challenging due to their overall low abundance, susceptibility to ion suppression, and limited fragmentation patterns (Dreier et al. 2018). Phosphopeptide-selective mass spectrometric detection methods include precursor ion scanning for the diagnostic PO3− fragment ion and neutral loss scanning for the loss of H3PO4, both of which result from the lability of the O-phosphate bond in collision-induced dissociation (Le Blanc et al. 2003; Carr et al. 2005). Compared to pSer and pThr, phosphorylations of tyrosine (pTyr) are relatively stable and remain attached to MS/MS fragments, which facilitates their analysis. Also, pTyr yields characteristic immonium ions that can be used as an alternative means to identify phosphorylation sites (Steen et al. 2003). 
In ion trap instruments, detection of neutral losses can be used to trigger the acquisition of MS3 spectra in which the neutral loss precursor ion undergoes an additional round of isolation and fragmentation to yield better fragmentation coverage (Gruhler et al. 2005). Peptide fragmentation by ETD or ECD yields more extensive peptide backbone cleavages without shedding the labile phosphate groups first which, in turn, also facilitates phosphopeptide identification (Chi et al. 2007; Stensballe et al. 2000). Large-scale, quantitative phosphoproteomics has been used to define the downstream signaling networks of mTOR, identifying Grb10 as a potential mTORC1-regulated tumor suppressor (Hsu et al. 2011; Yu et al. 2011). The dynamic nature of the phosphoproteome mandates the acquisition of temporal profiles of the in vivo phosphoproteome to capture the cellular response upon stimulation (Olsen et al. 2006). By streamlining conventional multi-step phosphoproteomics workflows into a simplified parallel 96-well plate format protocol, sufficient sample throughput is now achievable to perform global profiling of phosphorylation in a time-resolved fashion (Humphrey et al. 2015). The NCI Clinical Proteomics Tumor Analysis Consortium (CPTAC) recently provided an optimized, highly reproducible workflow for proteome/phosphoproteome analysis that utilizes TMT-10 for multiplexed quantitation of over 10,000 proteins in a breast cancer xenograft model (Mertins et al. 2018).
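
The neutral-loss-triggered MS3 logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the decision logic of any vendor's acquisition software: the function names, the 0.5 Da tolerance, and the absence of an intensity threshold are all assumptions made for the example. The sketch computes where the precursor would appear after losing H3PO4 (monoisotopic mass 97.97690 Da) at a given charge state and checks whether an MS/MS peak falls within tolerance of that position.

```python
from typing import Iterable

H3PO4_DA = 97.97690  # monoisotopic mass of phosphoric acid


def neutral_loss_mz(precursor_mz: float, charge: int) -> float:
    """Expected m/z of the precursor after neutral loss of H3PO4."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # A neutral loss leaves the charge unchanged, so the shift scales with 1/z.
    return precursor_mz - H3PO4_DA / charge


def has_phospho_neutral_loss(
    precursor_mz: float,
    charge: int,
    fragment_mzs: Iterable[float],
    tol_da: float = 0.5,  # hypothetical tolerance for a low-resolution ion trap
) -> bool:
    """Return True if any fragment peak matches the H3PO4 neutral-loss position."""
    target = neutral_loss_mz(precursor_mz, charge)
    return any(abs(mz - target) <= tol_da for mz in fragment_mzs)


if __name__ == "__main__":
    peaks = [300.1, 651.0, 900.4]
    print(has_phospho_neutral_loss(700.0, 2, peaks))
```

In practice, a trigger would typically also require the neutral-loss peak to be among the most intense ions in the MS/MS spectrum; that criterion is omitted here for brevity.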

An inherent challenge in large-scale phosphoproteomics analyses is the fact that changes in phosphoprotein expression levels can interfere with the interpretation of site-specific phosphorylation stoichiometries (Wu et al. 2011). Measuring the degree of phosphorylation requires the quantification of the cognate phosphorylated and non-phosphorylated peptides. This can be accomplished by splitting samples into two and forcing dephosphorylation in one fraction by phosphatase treatment and leaving the other fraction untreated. After differential stable isotope labeling, the two fractions are combined, and the degree of phosphorylation can be estimated by comparing the intensities of the differentially labeled unphosphorylated peptides (Zhang et al. 2002; Hegeman et al. 2004). Alternatively, spike-ins of synthetic isotopologues of the phosphorylated/non-phosphorylated peptides in conjunction with targeted mass spectrometry (MRM or PRM) can be used for absolute quantification of site-specific phosphorylation stoichiometry (Dekker et al. 2018; Jin et al. 2010). By normalizing for total phosphoprotein amount using multiple unmodified peptides, it is possible to estimate the degree of phosphorylation by calculating the ratios of phosphorylated/unphosphorylated peptide intensities for phosphoproteins of interest without stable isotope labeling (Steen et al. 2005). For large-scale phosphoproteomics studies that rely on phosphopeptide enrichment, parallel proteomics analyses can provide the necessary information on total phosphoprotein abundances to determine phosphorylation site stoichiometries (Wu et al. 2011; Olsen et al. 2010). Typical signaling pathway analysis is performed by collapsing discrete site measurements to the protein level. 
The curated PTMsigDB database aims to leverage site-specific post-translational modification information to capture signaling events more accurately as demonstrated in the phosphoproteome analysis of PI3K-inhibited breast cancer cells (Krug et al. 2018).
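
The phosphatase-based estimate of phosphorylation stoichiometry described above reduces to a simple ratio, sketched here in Python. The sketch assumes complete dephosphorylation in the treated fraction, equal mixing of the two labeled fractions, and a peptide with a single phosphorylation site; the function and argument names are hypothetical. Under these assumptions, the unphosphorylated peptide signal in the treated fraction represents the total peptide pool, whereas the signal in the untreated fraction represents only the unmodified portion.

```python
def occupancy_from_phosphatase_split(
    untreated_intensity: float,
    treated_intensity: float,
) -> float:
    """Estimate site occupancy from unphosphorylated-peptide intensities.

    untreated_intensity: signal of the unphosphorylated peptide in the
        untreated fraction (unmodified portion only).
    treated_intensity: signal of the same peptide after phosphatase
        treatment (unmodified plus formerly phosphorylated portion).
    """
    if treated_intensity <= 0:
        raise ValueError("treated_intensity must be positive")
    unmodified_fraction = untreated_intensity / treated_intensity
    # Clamp to [0, 1] because measurement noise can push the ratio above 1.
    return min(1.0, max(0.0, 1.0 - unmodified_fraction))


if __name__ == "__main__":
    print(occupancy_from_phosphatase_split(2.5e5, 1.0e6))
```

With an untreated signal of 2.5e5 and a treated signal of 1.0e6, one quarter of the peptide pool is unmodified, which corresponds to an estimated occupancy of 0.75.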

4.9 Ubiquitination

The ubiquitin-proteasome pathway controls the degradation of 80–90% of intracellular proteins. Ubiquitination is a process by which one or multiple ubiquitin monomers are covalently attached to the amino group at the protein N-terminus or at lysine side chains of substrate proteins, thereby forming branched proteins. Eukaryotic ubiquitin consists of 76 amino acids and is evolutionarily conserved. Ubiquitination is catalyzed by a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3), which confers substrate specificity. Deubiquitinating enzymes can reverse the ubiquitin conjugation, creating a steady state with poly-ubiquitinated proteins (n > 4) targeted for degradation by the 26S proteasome. As an important regulator of cell proliferation, differentiation, and survival, alterations of the ubiquitin ligase pathways have been linked to cancer (Ding et al. 2014; Mani and Gelmann 2005). Characterization of ubiquitination sites by mass spectrometry is commonly performed after antibody enrichment of peptides containing the Lys-GlyGly remnant that is formed during tryptic digestion of ubiquitinated proteins (Xu et al. 2010). More recently, an immunoaffinity strategy based on the recognition of the C-terminal 13 amino acids of ubiquitin has allowed for the identification of over 63,000 unique ubiquitination sites, including N-terminal ubiquitination, across 9200 proteins in two human cell lines (Akimov et al. 2018).

4.10 Proteogenomics

In an effort to elucidate how somatic gene mutations impact the cancer proteome and the post-translational modification landscape, CPTAC used quantitative MS and phosphoproteomics to characterize hundreds of ovarian, breast, and colon/rectal tumors whose genome and transcriptome were previously defined by The Cancer Genome Atlas (TCGA) (Mertins et al. 2016; Zhang et al. 2014, 2016). Integrating genomic and proteomics/phosphoproteomics measurements made it possible to explore the effect of copy number alterations on protein abundance and to test whether transcriptome-derived subtypes are reflected in protein expression patterns. Proteogenomics promises to deepen our understanding of cancer biology and identify alterations in cancer signaling pathways and potential therapeutic targets with higher levels of confidence. The human Cancer Proteome Variation database (CanProVar) provides a bridge between genomic and proteomics data by compiling protein sequence alterations in different types of cancers (Zhang et al. 2017; Li et al. 2010) along with extensive annotation, which can be used for the detection of variant peptides in shotgun and targeted proteomics experiments (Li et al. 2011).
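
How a protein sequence alteration translates into a detectable variant peptide can be sketched with a simple in silico tryptic digest in Python. This is an illustrative toy, not the CanProVar workflow: the function names, the minimum peptide length of six residues, and the short example sequences are assumptions made for this sketch. It applies the common simplified trypsin rule (cleavage C-terminal to Lys or Arg unless followed by Pro, no missed cleavages) and reports peptides that occur only in the variant sequence.

```python
from typing import List


def tryptic_peptides(sequence: str, min_length: int = 6) -> List[str]:
    """Digest a protein sequence with a simplified trypsin rule.

    Cleaves after K or R unless the next residue is P; no missed cleavages.
    Peptides shorter than `min_length` are discarded.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


def variant_specific_peptides(
    reference: str, variant: str, min_length: int = 6
) -> List[str]:
    """Return tryptic peptides present in the variant but not the reference."""
    reference_set = set(tryptic_peptides(reference, min_length))
    return [p for p in tryptic_peptides(variant, min_length) if p not in reference_set]


if __name__ == "__main__":
    # Short illustrative sequences differing by one Gly-to-Asp substitution.
    reference = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYR"
    variant = "MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYR"
    print(variant_specific_peptides(reference, variant))
```

The variant-specific peptides obtained this way could then serve as candidate entries in a customized search database or as targets for MRM/PRM assays.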

4.11 Ultrasensitive Proteomics via Cellular Pre-fractionation

Given the microheterogeneity of the cancer microenvironment, it can be advantageous to analyze specific cell types individually in order to more accurately reveal their biochemical potentials. Cellular populations can be specifically purified by antibody-based methods such as fluorescence-activated cell sorting, CyTOF mass cytometry, or immunomagnetic separation. CyTOF mass cytometry uses rare earth metals as unique antibody reporters that are monitored by inductively coupled plasma mass spectrometry (ICP-MS) in multiplex format to reveal marker expression in individual cells (Bandura et al. 2009). ICP-MS offers an extraordinary level of sensitivity which enables the detection of metal-labeled antibodies at levels corresponding to single cells. Alternatively, cellular subpopulations can be dissected from tissue using laser capture microdissection (LCM) prior to MS-based proteomics analysis (Altelaar and Heck 2012). In-depth LC-MS analysis of approximately 3000 LCM-derived tumor cells can yield the identification of 1000–2000 proteins (Umar et al. 2007; Wiśniewski et al. 2011), a number that can be boosted to over 4000 protein identifications from microdissected cells from formalin-fixed and paraffin-embedded human tissue specimens with the incorporation of additional off-line fractionation steps (Wiśniewski et al. 2011).

4.12 Imaging Mass Spectrometry

MALDI and secondary ion mass spectrometry (SIMS) imaging mass spectrometry (IMS) combine the parallel molecular detection by mass spectrometry with microscopic imaging to visualize the spatial distribution of proteins and metabolites (Cornett et al. 2007; Schwamborn and Caprioli 2010). MALDI-IMS yields 2D molecular maps that provide the localization and relative abundance of thousands of analytes in thin tissue sections with typical pixel size in the range of 50–200 μm in an untargeted manner (McDonnell and Heeren 2007; Schober et al. 2012). The discovery nature of MALDI imaging can be complemented by imaging mass cytometry, which utilizes the multiplexing capability of CyTOF mass cytometry for the targeted multiplexed localization of up to 32 proteins with subcellular resolution. This approach was pioneered to characterize tumor cell subpopulations and highlight the heterogeneity of human breast cancer microenvironments (Giesen et al. 2014).

4.13 Outlook

The field of mass spectrometry-based proteomics continues to rapidly evolve and mature. Each new generation of mass spectrometers pushes the limits of performance in terms of resolving power, mass accuracy, and sensitivity. Many of these improvements continue to trickle down into mainstream instrumentation available to the average user. How do these technological innovations impact the field? Ultra-high resolution opens the window to investigate the fine structure of isotopologues. This advancement has already led to the development of novel stable isotope labeling strategies that take advantage of mass defect-based neutron encoding for multiplexed quantitation (Hebert et al. 2013a). The resolved isotopologue structures could also be harnessed by a next generation of informatic pipelines that capitalize on the encoded elemental composition information in an effort to improve peptide/protein identification rates.

In terms of sensitivity, one promising approach entails a switch from serial to parallel accumulation of MS precursors, followed by their sequential release and fragmentation based on ion mobility. The speed and sensitivity of MS/MS experiments can be increased by parallel accumulation and serial fragmentation (PASEF), which is employed on trapped ion mobility spectrometry (TIMS) mass spectrometers (Meier et al. 2015). Other opportunities exist to increase sensitivity by improving and better integrating sample preparation and data acquisition workflows (Specht and Slavov 2018). Increased sensitivity will open up the transformative potential of single-cell proteomics, in which the contribution of each cell type to complex microenvironments such as cancers can be determined.

Integration with other omics approaches and resolving the spatial distribution of proteins are key aspects to reveal protein function and elucidate their role in physiology and pathology. The Human Protein Atlas project (www.proteinatlas.org) is a pioneering resource to study spatial proteomics across the major tissues and organs of the human body (Uhlén et al. 2015) and at the subcellular level (Thul et al. 2017) based on immunohistochemistry and complemented by RNA sequencing and mass spectrometry. The Human Pathology Atlas companion extends this groundbreaking system-level analysis to the transcriptome of the 17 major cancer types (Uhlén et al. 2017).

Finally, live monitoring of data acquisition will provide opportunities to fine-tune workflows in real time so that qualitative and quantitative performance can be optimized. The MaxQuant.Live framework is a first example of how real-time monitoring can be used for on-the-fly recalibration of mass and retention times which increases the efficiency of LC-MS experiments (Wichmann et al. 2018). In the future, further integration of entire workflows from automated sample preparation, data measurements, and data analysis will make the development of adaptive and smart data acquisitions a reality.