1.1 Introduction

Proteomics is the large-scale study of the protein complement of a cell, tissue, or organism, also known as the proteome. Proteomics relies primarily on mass spectrometry (MS) [1–8]. MS can be used to investigate a large variety of chemical and biological molecules, including products of chemical synthesis or degradation, biological molecules such as proteins, nucleic acids, lipids, or glycans, and various natural compounds of either large or small molecular mass. Depending on the type of molecule being analyzed, MS has several areas of focus, such as small-molecule MS, large-molecule MS, and biological MS (when the molecules investigated are biomolecules). Within biological MS, there are further subfields, such as proteomics, lipidomics, glycomics, and metabolomics. The focus of proteomics is the analysis of proteins and protein derivatives (such as glycoproteins), peptides, posttranslational modifications (PTMs) within proteins, and protein–protein interactions (PPIs).

The standard workflow in a proteomics experiment starts with sample fractionation, that is, the separation of proteins prior to their analysis by MS [9–17]. This can be done by one or more biochemical fractionation methods. For example, a one-dimensional separation can be achieved by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), whereas a two-dimensional separation can be performed by two-dimensional electrophoresis or by affinity purification followed by SDS-PAGE. Biochemical fractionation is followed by enzymatic digestion (usually with trypsin), peptide extraction, peptide fractionation by HPLC, and MS analysis [1]. Data analysis leads to the identification of one or more proteins; further investigation or re-investigation of the results can extract additional information from the same MS experiment, such as PTMs and the interaction partners of some proteins (PPIs) [18–26]. A schematic of a proteomics workflow is shown in Fig. 1.1 and a schematic of a proteomics experiment is shown in Fig. 1.2a.

Fig. 1.1

General proteomic experiment workflow schematic. Reprinted and adapted with permission from the Australian Journal of Chemistry CSIRO Publishing http://www.publish.csiro.au/?paper=CH13137 [15]

Fig. 1.2

General proteomics experiment. (a) Proteomics experiment workflow schematic. (b) Proteomics and applications schematic. (c) Mass spectrometer schematic. Reprinted and adapted with permission from the Oxidative Stress: Diagnostics, Prevention, and Therapy, S. Andreescu and M. Hepel, Editors. 2011, American Chemical Society: Washington, D.C [16]

Proteomic analysis can be performed on samples from various sources (supracellular, subcellular, intracellular, or extracellular) and at different levels: peptides (peptidomics), proteins (regular proteomics), PTMs (“PTM-omics”), or protein complexes (interactomics). Proteomics can also be classified as classical or functional proteomics, when protein samples from two different conditions (for example, normal and cancer) are compared, or as targeted proteomics, when the analysis focuses on a particular sub-proteome, as in phosphoproteomics or glycoproteomics. Finally, proteomics can be classified according to the protein complement analyzed in a set of samples: whole proteomes (i.e., all proteins) or sub-proteomes (e.g., only nuclear or mitochondrial proteins). A schematic of this classification is shown in Fig. 1.2b.

Proteomic analysis can also be qualitative, as in protein identification, or quantitative, as in the determination of protein amounts by quantitative proteomics. These analyses are usually performed using a mass spectrometer, the “workhorse” of a proteomics experiment. A mass spectrometer has three main components: an ionization source, a mass analyzer, and a detector (Fig. 1.2c). There are primarily two types of ionization sources on mass spectrometers used in proteomics: matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI). The corresponding techniques are named MALDI mass spectrometry (MALDI-MS) and electrospray ionization mass spectrometry (ESI-MS). Here, we describe a proteomics experiment, specifically how proteins and peptides are analyzed by MS, and the type of information that can be obtained from such an experiment.

1.2 Biochemical Fractionation

The first step in a proteomics experiment is biochemical fractionation, in which proteins are separated from each other based on their physicochemical properties. The choice of fractionation method depends on the goal of the experiment, and this step is perhaps the most important one in a proteomics experiment: a good sample fractionation usually leads to a good experimental outcome. A proteomics experiment can still be performed without biochemical fractionation, for example, when the full proteome of a cell is analyzed at once. However, without biochemical fractionation, the sample reaching the mass spectrometer is more complex, and the results may be suboptimal.

The physicochemical properties of proteins (or other compounds of interest) used for biochemical fractionation include, among others, molecular mass, isoelectric point, charge at various pH values, and affinity for other compounds. These properties are exploited by fractionation techniques such as electrophoresis, centrifugation, and chromatography; the types of chromatography include affinity chromatography, ion exchange chromatography, and size-exclusion chromatography.

To give one example, proteins can be separated by electrophoresis, usually SDS-PAGE, in which they are reduced and denatured and then separated according to their molecular mass. If the reduction step is omitted, the disulfide bridges within a protein or between proteins remain intact, which provides an additional fractionation principle: low-molecular-mass subunits linked by disulfide bridges (such as haptoglobin subunits) stay together and, under nonreducing SDS-PAGE conditions, migrate as a high-molecular-mass heterotetramer. In a different variant of PAGE that omits the detergent (SDS), proteins can be separated under native conditions. Therefore, simply by adding one reagent (SDS) or two (SDS and a reducing agent such as dithiothreitol, DTT), the separation of the same proteins may have a totally different outcome. A variant of SDS-PAGE is tricine-PAGE [27, 28], which uses a similar separation principle but offers its best resolution for low-molecular-weight (Mw) proteins and peptides (2–20 kDa), a range in which SDS-PAGE has poor resolution. Therefore, SDS-PAGE and tricine-PAGE complement each other.
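
Separation by molecular mass in SDS-PAGE is commonly turned into an apparent-mass estimate with a calibration curve: over the resolving range of a gel, log10 of the molecular mass of marker proteins is approximately linear in their relative mobility (Rf). The following Python sketch fits that line by ordinary least squares; the function names and the marker ladder in the usage block are hypothetical, and real gels deviate from linearity near the top and bottom of the range.

```python
import math


def fit_log_mw(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(Mw) = intercept + slope * Rf.

    markers: (relative mobility Rf, molecular mass in kDa) for each standard band.
    Returns (intercept, slope); the slope is negative because larger proteins migrate less.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_mw(rf: float, intercept: float, slope: float) -> float:
    """Apparent molecular mass (kDa) of a band with relative mobility rf."""
    return 10 ** (intercept + slope * rf)


if __name__ == "__main__":
    # Hypothetical marker ladder: (Rf, kDa); real values come from the gel being analyzed.
    ladder = [(0.15, 116.0), (0.30, 66.2), (0.50, 45.0), (0.70, 25.0), (0.85, 14.4)]
    b0, b1 = fit_log_mw(ladder)
    print(f"band at Rf 0.60 is roughly {estimate_mw(0.60, b0, b1):.1f} kDa")
```

Interpolate only within the range of the markers. Under nonreducing conditions, the apparent mass of a disulfide-linked complex reflects the whole complex rather than its subunits.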

Other types of electrophoresis are blue native PAGE (BN-PAGE), colorless native PAGE (CN-PAGE), and detergent-free PAGE (native PAGE), all of which are native electrophoresis methods [1, 4, 6, 18–22, 29–34]. BN-PAGE separates protein complexes using the external charge induced by the bound Coomassie dye; because the complexes acquire a similar charge-to-mass ratio, they separate according to their molecular weight. If Coomassie dye is not used, no external charge is induced, and the complexes separate not according to their molecular weight but according to their intrinsic charge. This variant of BN-PAGE is named CN-PAGE. CN-PAGE is particularly useful when two protein complexes with identical mass must be separated from each other.

In addition to these fractionation techniques, hyphenated (combined) techniques may be used. The classical example, still used in some proteomics labs, is two-dimensional electrophoresis (2D-PAGE), which separates proteins by isoelectric focusing in the first dimension and by SDS-PAGE in the second [3, 7, 35–45]. A variant of 2D-PAGE, differential gel electrophoresis (DIGE), is a powerful method for gel-based proteomics. Other fractionation methods, such as pre-coated chips, centrifugal filters, and magnetic beads, are also possible [46, 47].
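
Because isoelectric focusing separates proteins by isoelectric point (pI), a theoretical pI is often computed from the sequence to predict where a protein should focus. The sketch below sums Henderson–Hasselbalch charge contributions of the ionizable groups and finds the zero-charge pH by bisection. The pKa values are one illustrative textbook set (an assumption, not taken from this chapter); other sets shift the predicted pI by a few tenths of a unit, and the function names are hypothetical.

```python
# Illustrative pKa values (one textbook set, assumed); other sets give somewhat different pI values.
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide or protein at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))   # free N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))  # free C-terminal carboxyl
    for residue in sequence:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero.

    The net charge decreases monotonically with pH, so bisection on [0, 14] converges.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for seq in ("KKKK", "DDDD", "ACDKEFGRHIK"):
        print(seq, round(isoelectric_point(seq), 2))
```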

1.3 Mass Spectrometry

A mass spectrometer has three main parts: an ion source, a mass analyzer, and a detector. The sample is first ionized, and the ions produced by the MALDI or ESI source are separated in the mass analyzer according to their mass-to-charge ratio (m/z). The ions are then detected by the detector. The end product is a mass spectrum, a plot of ion abundance versus m/z.

Ionization sources. Ionization of peptides depends on the electrical potential at the ion source and on the pH at which the peptides are analyzed. At low pH, peptides are protonated at basic sites such as the side chains of Arg, Lys, and His and the N-terminal amino group, while at high pH, peptides are deprotonated at acidic sites such as the side chains of Asp and Glu and the C-terminal carboxyl group. When the electrical potential at the ion source is positive, ionization is in positive ion mode; conversely, when the electrical potential is negative, ionization is in negative ion mode. Therefore, there are two types of ionization: positive ionization, when peptides are analyzed at low pH and Arg, Lys, and His are protonated, and negative ionization, when peptides are analyzed at high pH and Asp and Glu are deprotonated. In this chapter, we focus only on positive ionization, because it is one of the most widely used ionization modes for analyzing peptides and proteins. In addition, the most widely used enzyme in proteomics is trypsin, which conveniently cleaves at the C-terminus of Arg and Lys. Upon electrospray ionization, the resulting peptides are typically at least doubly charged (one proton at the N-terminal amino group and one at the C-terminal Arg or Lys) and produce a y product ion series upon collision-induced fragmentation (described later).
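
The trypsin specificity described above can be sketched as an in-silico digestion. This minimal Python version cleaves after every K or R; the optional rule of not cleaving before proline is a widely used convention assumed here, not something stated in this chapter. The helper protonation_sites (a hypothetical name) simply counts the N-terminal amine plus K, R, and H, a crude upper bound on the number of protons a peptide can carry.

```python
def tryptic_digest(sequence: str, proline_rule: bool = True) -> list[str]:
    """Split a protein sequence after every K or R (no missed cleavages).

    With proline_rule=True, no cleavage is made when the next residue is P
    (a common convention in search software, assumed here).
    """
    peptides: list[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue not in "KR":
            continue
        if proline_rule and sequence[i + 1:i + 2] == "P":
            continue
        peptides.append(sequence[start:i + 1])
        start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide that does not end in K/R
    return peptides


def protonation_sites(peptide: str) -> int:
    """Crude count of basic sites: the N-terminal amine plus K, R, and H side chains."""
    return 1 + sum(peptide.count(aa) for aa in "KRH")


if __name__ == "__main__":
    for pep in tryptic_digest("ACDKEFGRHIK"):
        print(pep, protonation_sites(pep))
```

Every tryptic peptide except the protein's C-terminal one ends in K or R, so each has at least two basic sites, consistent with the doubly charged ions typically seen in ESI.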

In addition to ESI and MALDI, there are several other ionization methods, such as chemical ionization (CI), electron ionization (EI), and atmospheric pressure chemical ionization (APCI) [48, 49]. EI is used for the analysis of organic compounds and can be applied to volatile compounds with a mass below 1,000 Da. EI provides good structural information derived from fragmentation; however, molecular mass determination is rather poor (weak or absent M+ ions) [50]. Chemical ionization is the opposite: it is very good for determining the molecular mass of molecules but provides less structural information, because it produces less fragmentation than EI. Therefore, CI and EI complement each other. In CI experiments, ionized species are formed when the gaseous analyte molecules collide with primary ions generated from a reagent gas in the source [51]. A variant of CI, negative CI, is used only for volatile analytes with a mass below 1,000 Da [52, 53]. Another ionization technique, APCI, is an alternative for the analysis of compounds that do not ionize well by ESI. APCI generally forms only singly charged ions and is usually applied to compounds with a molecular weight below 1,500 Da [54].

Mass analyzers. There are three main types of mass analyzers used in proteomics experiments: trapping-type instruments (quadrupole ion trap, QIT; linear ion trap, LIT; Fourier transform ion cyclotron resonance, FT-ICR; and Orbitrap), quadrupole (Q) instruments, and time-of-flight (TOF) instruments.

Trapping-type instruments first accumulate ions and then measure their masses. In ion trap analyzers, an electrostatic gate pulses to inject ions into the trap, where they are captured in three-dimensional space. Ion trap-based analyzers are relatively inexpensive, sensitive, and robust, and they have been used extensively in proteomic analysis. However, their mass accuracy for both precursor and product ions is limited, a problem partially overcome by FT-ICR instruments. Unfortunately, FT-ICR instruments are not often used in proteomics research, because peptides do not fragment well in them and the instruments are expensive [55, 56].

In quadrupole mass analyzers, ions continuously enter the analyzer and are separated based on their trajectories in the electric field applied to two pairs of parallel cylindrical rods. Opposite rods are electrically connected, and the combination of DC and radio-frequency potentials between the two pairs allows only ions within a narrow m/z range to follow stable trajectories and reach the detector, while other ions collide with the rods. These instruments provide good reproducibility at low cost, but their resolution and accuracy are limited [49, 57].

Instruments with TOF mass analyzers are popular for sample analysis in proteomics because of their high resolution, relatively low cost, speed of measurement, and high mass accuracy [49, 57]. In TOF mass analyzers, ions are accelerated by a known electric field and then travel from the ion source to the detector. The instrument measures the time it takes ions with different m/z values to travel from the ion source to the detector; because ions with the same charge acquire the same kinetic energy, lighter ions arrive first, and the flight time is proportional to the square root of m/z.
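
The TOF principle follows from energy conservation: an ion of charge ze accelerated through a potential U gains kinetic energy zeU = mv²/2, so it drifts through a field-free length L in t = L·sqrt(m/(2zeU)). The sketch below evaluates this relation for an idealized linear TOF and inverts it; the 20 kV voltage and 1 m drift length in the usage example are assumed values, and real instruments are calibrated empirically because of initial energy spread, extraction delays, and reflectron geometry.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def flight_time(mz: float, accel_voltage: float, drift_length: float) -> float:
    """Flight time (s) of an ion with the given m/z (Da per elementary charge) in an ideal linear TOF.

    z*e*U = (1/2) m v^2  ->  v = sqrt(2 z e U / m)  ->  t = L * sqrt((m / (z e)) / (2 U)).
    """
    mass_per_charge = mz * ATOMIC_MASS_UNIT / ELEMENTARY_CHARGE  # kg per coulomb
    return drift_length * math.sqrt(mass_per_charge / (2.0 * accel_voltage))


def mz_from_time(t: float, accel_voltage: float, drift_length: float) -> float:
    """Inverse of flight_time: m/(ze) = 2 U (t / L)^2, converted back to Da per charge."""
    mass_per_charge = 2.0 * accel_voltage * (t / drift_length) ** 2
    return mass_per_charge * ELEMENTARY_CHARGE / ATOMIC_MASS_UNIT


if __name__ == "__main__":
    for mz in (500.0, 1000.0, 2000.0, 4000.0):
        t_us = flight_time(mz, accel_voltage=20_000.0, drift_length=1.0) * 1e6
        print(f"m/z {mz:>6.0f}: {t_us:.2f} microseconds")
```

Quadrupling m/z only doubles the flight time, which is why resolving neighboring peaks becomes harder at high m/z.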

Mass spectrometers can have a single analyzer or a combination of analyzers, usually two or three within one instrument, thus taking advantage of the strengths of all combined analyzers. Examples of such instruments are Q-Trap, QQQ, Q-TOF, TOF–TOF, and QQ-LIT; these instruments are also called hybrid mass spectrometers and are highly sensitive and have high resolution [1, 57–59].

MS detectors. The MS detectors are usually electron multipliers, photodiode arrays, microchannel plates, or image current detectors.

1.4 MALDI-TOF MS

MALDI-TOF MS, or MALDI-MS (Fig. 1.3a), is mostly used for determining the mass of a peptide or protein and for identifying a protein by peptide mass fingerprinting. In MALDI-MS, the peptide mixture is co-crystallized under acidic conditions with a UV-absorbing matrix (for example, 2,5-dihydroxybenzoic acid, sinapinic acid, or α-cyano-4-hydroxycinnamic acid) and spotted on a plate. A laser beam (usually from a nitrogen laser, 337 nm) then ionizes the matrix and peptides, which desorb and are accelerated by an electric field. The matrix molecules transfer protons to the peptides, which become ionized, fly through the TOF tube, and are recorded by the detector as a mass spectrum. Charged peptides fly through the mass analyzer as ions according to their mass-to-charge ratio, m/z = (M + zH)/z, where M is the neutral mass of the peptide, z is its charge, and H is the mass of the added proton (1.007276 u; the hydrogen-atom mass, 1.007825 u, is sometimes used as an approximation). In MALDI-MS, peptides are almost always singly charged (z = 1), so the formula reduces to m/z = M + H, approximately M + 1. Therefore, peptides are mostly detected as singly charged [M+H]+ peaks (Fig. 1.3b).
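
The m/z formula above can be evaluated directly from a peptide sequence. This sketch sums standard monoisotopic residue masses, adds one water for the free termini, and adds z proton masses; it uses the proton mass (1.007276 Da) rather than the hydrogen-atom mass, a difference of about 0.0005 Da per charge. The sequence SAMPLER and the function names are hypothetical, and cysteines are treated as unmodified.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O completing the free N- and C-termini
PROTON = 1.007276   # mass of one added proton


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass M of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int = 1) -> float:
    """m/z of the [M + zH]z+ ion; charge=1 gives the [M+H]+ peak typical of MALDI."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    print(f"M = {peptide_mass('SAMPLER'):.4f}, [M+H]+ = {mz('SAMPLER'):.4f}")
```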

Fig. 1.3

MALDI-TOF MS. (a) Principle of a MALDI-TOF mass spectrometer. The instrument has an ion source, a mass analyzer, and a detector, at which the mass spectrum is recorded. The mass analyzer is a TOF and can be used in linear mode or reflectron mode. (b) A MALDI-MS spectrum primarily contains singly charged peaks; one example is shown enlarged to reveal the peak's charge state (singly charged, +1). (c) Protein identification by MALDI-MS and peptide mass fingerprinting (PMF). A protein is digested into peptides using trypsin, and the peptide mixture is analyzed by MALDI-MS to collect a spectrum. A similar experiment is performed in silico (a theoretical experiment on a computer), in which all proteins in a database are cleaved. During the database search, the best match between the theoretical and the experimental spectra leads to identification of the protein. Reprinted and adapted with permission from the Oxidative Stress: Diagnostics, Prevention, and Therapy, S. Andreescu and M. Hepel, Editors. 2011, American Chemical Society: Washington, D.C [16]

In a MALDI-MS spectrum, each peak typically corresponds to one peptide, and many peaks correspond to many peptides, either from one protein or from several proteins. A database search of the MALDI-MS spectrum usually identifies that protein or those proteins through a process named peptide mass fingerprinting (Fig. 1.3c).
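
Peptide mass fingerprinting can be illustrated with a deliberately simple scorer: digest every database protein in silico, compute the [M+H]+ of each tryptic peptide, and count how many observed MALDI peaks fall within a mass tolerance of a theoretical peptide. Real search engines use probabilistic scores rather than raw match counts, so this is only a sketch of the matching idea; the 0.2 Da tolerance, the function names, and the two-protein database in the test are hypothetical.

```python
# Monoisotopic residue masses (Da); cysteine is treated as unmodified in this sketch.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR" and sequence[i + 1:i + 2] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Singly charged [M+H]+ m/z of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_rank(observed_mh: list[float], database: dict[str, str],
             tolerance: float = 0.2) -> list[tuple[str, int]]:
    """Rank proteins by how many observed peaks match one of their theoretical tryptic peptides."""
    ranking = []
    for name, sequence in database.items():
        theoretical = [mh_plus(p) for p in tryptic_digest(sequence)]
        matched = sum(
            1 for peak in observed_mh
            if any(abs(peak - t) <= tolerance for t in theoretical)
        )
        ranking.append((name, matched))
    ranking.sort(key=lambda item: item[1], reverse=True)  # best-matching protein first
    return ranking
```

With only a handful of peaks, random matches become likely in a large database, which is one reason real scoring schemes weight matches by how often such masses occur by chance.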

1.5 ESI-MS

In contrast to MALDI-MS, in which peptides are ionized with the help of a matrix in the solid phase, in ESI-MS (Fig. 1.4a) peptides are ionized from the liquid phase by applying a high voltage. In addition, while peptides in MALDI-MS are mostly singly charged, in ESI-MS they are mostly doubly or multiply charged. Regardless of the ionization method, peptides fly as ions according to their m/z, and the molecular mass of a peptide is calculated with the same formula, m/z = (M + zH)/z, where z is again the charge (z is 2 for doubly charged peptides, 3 for triply charged peptides, etc.).
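
The same formula can be inverted to recover the neutral mass of a multiply charged ESI ion once its charge is known, M = z(m/z − H). The sketch below does this in both directions, using the proton mass for H; as example inputs it takes the two precursor m/z values quoted for Fig. 1.4b, and the 2,000 Da peptide is hypothetical.

```python
PROTON = 1.007276  # Da


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a neutral monoisotopic mass M and charge z."""
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass M recovered from an observed m/z and its (known) charge z."""
    return charge * (mz - PROTON)


if __name__ == "__main__":
    # Precursor m/z values quoted in Fig. 1.4b
    for observed, z in [(736.81, 3), (785.81, 2)]:
        print(f"m/z {observed} ({z}+) -> M = {mass_from_mz(observed, z):.2f} Da")
    # One hypothetical 2,000 Da peptide observed in several charge states
    for z in range(1, 5):
        print(z, round(mz_from_mass(2000.0, z), 4))
```

Higher charge states place the same peptide at lower m/z, which is why ESI can bring large molecules into the m/z range of analyzers with limited upper limits.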

Fig. 1.4

ESI-MS of peptides. (a) An ESI mass spectrometer, with an ion source, in which the analytes are ionized; a mass analyzer, through which the ions travel; and an ion detector, which records the mass spectrum. In ESI-MS, the liquid sample is sprayed at high temperature and under a high voltage; the droplets desolvate, and the analytes become protonated in positive ionization mode. (b) Example TOF MS spectra in which two different peaks, a triply charged peak at m/z 736.81 (left) and a doubly charged peak at m/z 785.81 (right), both circled and zoomed in, are selected for fragmentation. The resulting MS/MS spectra led to identification of peptides with the amino acid sequences RESQGTRVGQALSFCKGTA (left) and EGVNDNEEGFFSAR (right). Note that when the protonation site (R) is at the N-terminus of the peptide, the MS/MS spectrum is of poor quality and the b and y ions produced by MS/MS fragmentation are difficult to interpret. However, when the protonation site is at the C-terminus of the peptide, fragmentation produces a clear y ion series, from which the amino acid sequence of the peptide can easily be read. Reprinted and adapted with permission from the Oxidative Stress: Diagnostics, Prevention, and Therapy, S. Andreescu and M. Hepel, Editors. 2011, American Chemical Society: Washington, D.C [16]

When a peptide mixture is infused into the mass spectrometer, all or most peptides that ionize under the experimental conditions are detected as ions in an MS spectrum, in a process called direct infusion (ESI-MS mode). For example, if one has 10 peptides in an Eppendorf tube, all 10 peptides can be detected in one spectrum. However, the MS spectrum reveals only the masses of the peptides. To obtain sequence information about a particular peptide, one must isolate the peak that corresponds to that peptide (the precursor ion), fragment it in the collision cell using a neutral gas (for example, argon), and record a spectrum (a sum of spectra) of the product ions resulting from fragmentation of the precursor ion; this is called MS/MS (ESI-MS/MS mode). Data analysis of the MS and MS/MS spectra usually yields the mass and sequence of the peptide of interest. Examples of ESI-MS and ESI-MS/MS spectra are shown in Fig. 1.4b. As observed, the quality of the MS/MS spectra depends on the amino acid sequence and, more importantly, on the position of the proton-trapping amino acid (R, H, or K; in this case, R). For example, if the proton-trapping amino acid is at the N-terminus, low-intensity b and y ions are observed (Fig. 1.4b, left). However, when the proton-trapping amino acid is located at the C-terminus, the fragments produced are almost always high-quality y ions. This is also the main reason why most proteomics experiments use trypsin, since it cleaves at the C-termini of R and K and produces peptides with an R or a K at the C-terminus.
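
The b and y ions discussed above form two complementary ladders whose consecutive differences equal residue masses, which is what allows a sequence to be read from an MS/MS spectrum. This sketch computes singly charged b and y ions for an unmodified peptide (b_i: the first i residues plus a proton; y_i: the last i residues plus water and a proton). It ignores the a ions, neutral losses, and multiply charged fragments that also appear in real spectra; SAMPLER and the function name are hypothetical.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ladders(sequence: str) -> tuple[list[float], list[float]]:
    """Singly charged b and y ion m/z values for an unmodified peptide.

    Returns ([b_1 ... b_(n-1)], [y_1 ... y_(n-1)]).
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    b_ions, y_ions = [], []
    running = 0.0
    for i in range(n - 1):               # N-terminal fragments
        running += masses[i]
        b_ions.append(running + PROTON)
    running = 0.0
    for i in range(n - 1, 0, -1):        # C-terminal fragments, built from the C-terminus
        running += masses[i]
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ladders("SAMPLER")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} {bi:9.4f}   y{i} {yi:9.4f}")
```

A useful consistency check is that b_i + y_(n-i) always equals the [M+2H] mass of the intact peptide, so each b ion predicts its complementary y ion.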

Sometimes, when a peptide has more than one proton-accepting amino acid, such as Arg or Lys, it may carry three, four, or more protons, and the same peptide may therefore be detected in several charge states. The advantage for these peptides is that if the precursor ion in one charge state (e.g., 2+) does not fragment well in MS/MS, the peak that corresponds to the same peptide in a different charge state (e.g., 3+ or 4+) may fragment very well. One drawback of multiply charged peptides is that they are usually larger (2,500–3,000 Da) than the typical peptides analyzed by MS (800–2,500 Da), and data analysis for them may be more difficult. Overall, however, fragmenting more than one peak corresponding to the same peptide in different charge states may provide additional information about that peptide.
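
To recognize that two peaks are the same peptide in different charge states, the charge of each peak must first be determined. A common approach, assumed here rather than described in this chapter, reads it from the isotope spacing: adjacent isotope peaks differ by about 1.00336 Da (13C minus 12C), so they appear 1.00336/z apart on the m/z axis. The sketch assigns z this way, converts each cluster to a neutral mass, and compares the masses; it assumes the first peak of each cluster is the monoisotopic one, and the function names and tolerance are hypothetical.

```python
ISOTOPE_SPACING = 1.003355  # Da, mass difference between 13C and 12C
PROTON = 1.007276           # Da


def charge_from_isotopes(cluster_mz: list[float]) -> int:
    """Infer charge from the m/z spacing of adjacent isotope peaks (spacing ~ 1.003355 / z)."""
    spacings = [b - a for a, b in zip(cluster_mz, cluster_mz[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(ISOTOPE_SPACING / mean_spacing)


def neutral_mass(monoisotopic_mz: float, charge: int) -> float:
    """Neutral monoisotopic mass from the monoisotopic m/z of an [M + zH]z+ ion."""
    return charge * (monoisotopic_mz - PROTON)


def same_peptide(cluster_a: list[float], cluster_b: list[float], tol_da: float = 0.02) -> bool:
    """True if two isotope clusters correspond to the same neutral mass within tol_da."""
    mass_a = neutral_mass(cluster_a[0], charge_from_isotopes(cluster_a))
    mass_b = neutral_mass(cluster_b[0], charge_from_isotopes(cluster_b))
    return abs(mass_a - mass_b) <= tol_da
```

In practice, overlapping clusters and low-intensity isotope peaks make charge assignment less clean than this sketch suggests, especially at high charge states where the spacing becomes small.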

ESI-MS can be used not only for peptides but also for intact proteins, and the information is particularly useful for determining the molecular mass of those proteins, their potential PTMs, and their conformation. In addition, high-molecular-mass proteins can be analyzed by ESI-MS in either positive mode (protonated) or negative mode (deprotonated), providing distinct yet complementary information about the distribution of charges on the surface of the protein. Examples of MS spectra of a 16.9 kDa protein investigated by ESI-MS in both positive and negative mode are shown in Fig. 1.5.
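
For intact proteins such as the one in Fig. 1.5, the neutral mass can be recovered from any two adjacent peaks of the charge-state envelope. If the higher-m/z peak m1 carries z charges and its neighbor m2 carries z + 1, then M = z(m1 − sH) = (z + 1)(m2 − sH), with s = +1 for protonated and −1 for deprotonated ions, which gives z = (m2 − sH)/(m1 − m2). The sketch below applies this relation; the function name is hypothetical, and the 16,951.5 Da test mass is an assumed value chosen to resemble a ~17 kDa protein.

```python
PROTON = 1.007276  # Da


def deconvolve_adjacent(mz_high: float, mz_low: float, polarity: int = 1) -> tuple[int, float]:
    """Charge and neutral mass from two adjacent peaks of a charge-state envelope.

    mz_high carries z charges and mz_low (the neighboring peak at lower m/z) carries z + 1.
    polarity = +1 for [M + zH]z+ ions (positive mode), -1 for [M - zH]z- ions (negative mode).
    """
    shift = polarity * PROTON
    z = round((mz_low - shift) / (mz_high - mz_low))
    neutral_mass = z * (mz_high - shift)
    return z, neutral_mass
```

Averaging the masses obtained from several adjacent pairs across the envelope gives a more robust estimate than a single pair.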

Fig. 1.5

ESI-MS of proteins: ESI-MS spectra of the intact 17 kDa protein myoglobin analyzed under acidic conditions (pH ~ 2). (a) MS spectrum in positive ionization mode; (b) MS spectrum in negative ionization mode. The positive (+) and negative (−) charge states are indicated. The peak at m/z 616.32 (1+) corresponds to the heme group, the prosthetic group of myoglobin. Reprinted and adapted with permission from the Australian Journal of Chemistry CSIRO Publishing http://www.publish.csiro.au/?paper=CH13137 [15]

1.6 LC–MS/MS

Analysis of peptide mixtures by ESI-MS to determine the molecular masses of the peptides is usually quick. However, if one wants to obtain sequence information for more than one peptide, it is not the method of choice, since the ions corresponding to peptides are selected for fragmentation manually, one peptide at a time. For example, for a mixture of 4 peptides, the molecular masses of all peptides can be determined in minutes, but to determine their amino acid sequences, the peptides must be selected for fragmentation one at a time. An alternative approach is therefore needed to automate this process. One option is to fractionate the peptides by column chromatography coupled to an HPLC, typically reversed-phase HPLC (reversed-phase columns are particularly compatible with MS). The combination of HPLC and ESI-MS is named HPLC–ESI-MS or LC–MS. In this setting, the peptides are fractionated by HPLC prior to MS analysis and can then be selected for fragmentation by MS/MS. In a process called data-dependent analysis (DDA), usually 3–4 precursor peaks (corresponding to peptides) are selected from each MS scan and fragmented by MS/MS; the overall experiment is called LC–MS/MS. In LC–MS/MS, fewer peptides enter the mass spectrometer per unit of time than in direct-infusion ESI-MS, because the HPLC spreads the elution of the peptide mixture over a longer period (such as a 60 min gradient); this gives the mass spectrometer more time to analyze more peptides in total. A schematic of LC–MS/MS is shown in Fig. 1.6a.
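
The data-dependent selection step can be sketched as a top-N rule applied to each MS survey scan. In this Python version, the min_charge filter (skipping singly charged ions) and the exclusion of m/z values that were already fragmented are assumptions modeled on the dynamic exclusion settings found on many instruments; on real instruments, exclusion usually expires after a set time, whereas here it is permanent. Peak, select_precursors, and run_dda are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    charge: int


def select_precursors(survey: list[Peak], top_n: int, excluded: list[float],
                      exclusion_tol: float = 0.01, min_charge: int = 2) -> list[Peak]:
    """Pick the top_n most intense eligible peaks from one MS survey scan."""
    candidates = [
        p for p in survey
        if p.charge >= min_charge
        and not any(abs(p.mz - mz) <= exclusion_tol for mz in excluded)
    ]
    candidates.sort(key=lambda p: p.intensity, reverse=True)
    return candidates[:top_n]


def run_dda(survey_scans: list[list[Peak]], top_n: int = 3) -> list[list[float]]:
    """Simulate successive DDA cycles; returns the precursor m/z values chosen in each cycle."""
    excluded: list[float] = []
    schedule = []
    for survey in survey_scans:
        chosen = select_precursors(survey, top_n, excluded)
        excluded.extend(p.mz for p in chosen)  # do not fragment the same precursor again
        schedule.append([p.mz for p in chosen])
    return schedule
```

Excluding already fragmented precursors lets the instrument spend later cycles on less abundant peptides that are still eluting, rather than repeatedly fragmenting the most intense ones.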

Fig. 1.6

LC–MS/MS experiment. (a) In each LC–MS/MS experiment, as peptides gradually elute from the HPLC, the mass spectrometer analyzes the corresponding ions in an MS survey scan (recorded as an MS spectrum). The ions with the highest intensity (typically 1–8 ions; two ions in this example) are selected for MS/MS, fragmented, and recorded as MS/MS #1 and MS/MS #2. The mass spectrometer then returns to the MS function and records another MS spectrum (MS survey). Once again, the ions with the highest intensity are selected, fragmented, and recorded as MS/MS spectra. (b) An example of an LC–MS/MS experiment in which the total ion current is recorded; at a specified time, an MS survey is recorded, and one peak corresponding to a peptide (m/z 582.56, doubly charged) is selected and fragmented by MS/MS. The fragmentation pattern (primarily b and y ions) provides sequence information about the peptide, leading to its identification via database search. In this example, the identified peptide had the sequence VSFELFADK, a component of human cyclophilin A. Reprinted and adapted with permission from the Oxidative Stress: Diagnostics, Prevention, and Therapy, S. Andreescu and M. Hepel, Editors. 2011, American Chemical Society: Washington, D.C [16]

Various improvements can increase the number of high-quality MS/MS spectra and thus lead to identification of additional proteins. One is the flow rate of the HPLC: at a high flow rate, the mass spectrometer has less time to analyze the peptide mixture than at a lower flow rate. Another is the gradient length: on a longer HPLC gradient (such as 120 min), the mass spectrometer has more time to analyze more peptides than on a shorter gradient. The number of MS/MS scans per cycle may also influence the number of peptides fragmented per minute. A mass spectrometer usually performs one MS survey scan followed by several MS/MS scans, for example, between 3 and 10 (newer instruments allow up to 30). If the method is set to perform one MS survey scan followed by MS/MS of the two most intense peaks, the instrument works as follows: one MS survey scan, one MS/MS scan (Peak 1), one MS/MS scan (Peak 2), and then another MS survey scan (Fig. 1.6a).

Assume that a mass spectrometer runs a cycle of one MS survey and two MS/MS scans, with 0.1 s for the MS survey and 3 s for each MS/MS, that is, 6.1 s per cycle. In 1 min, the instrument then completes about 10 cycles and acquires ~20 MS/MS spectra, which, assuming two MS/MS spectra per protein, could lead to identification of ~10 proteins. In a 120 min gradient, the possible number of proteins identified is ~10 × 120 = 1,200; keeping in mind, however, that the useful length of a 120 min gradient is about 90 min (the remaining 30 min are spent washing with organic solvent), an MS run can identify ~10 × 90 = 900 proteins. If the length of an MS/MS decreases from 3 to 1 s and the number of precursors selected from each MS survey increases to 6, the cycle still lasts 6.1 s but now contains six MS/MS scans, so the number of proteins identified increases threefold (~900 × 3 = 2,700 proteins). Assuming that these results are obtained at a flow rate of 0.5 μL/min, and that halving the flow rate doubles the time available to analyze the peptides, reducing the flow rate by half would double the number of proteins identified (~2,700 × 2 = 5,400).
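
A back-of-the-envelope estimate of this kind can be reproduced with a small calculator that makes each assumption explicit: survey and MS/MS scan times, the number of precursors per cycle, the useful gradient length, the number of MS/MS spectra needed per protein (2 by default, the ratio used in the estimate), and a flow_factor standing in for the assumed effect of reducing the flow rate. The function names are hypothetical, and the result is an idealized upper bound rather than a prediction of real identification numbers.

```python
def msms_per_minute(survey_s: float, msms_s: float, top_n: int) -> float:
    """MS/MS spectra per minute for cycles of one survey scan plus top_n MS/MS scans."""
    cycle_s = survey_s + top_n * msms_s
    return top_n * 60.0 / cycle_s


def proteins_per_run(survey_s: float, msms_s: float, top_n: int,
                     effective_gradient_min: float, msms_per_protein: float = 2.0,
                     flow_factor: float = 1.0) -> float:
    """Idealized upper bound: every MS/MS is identified and each protein needs msms_per_protein spectra."""
    per_minute = msms_per_minute(survey_s, msms_s, top_n)
    return per_minute / msms_per_protein * effective_gradient_min * flow_factor


if __name__ == "__main__":
    scenarios = [
        ("0.1 s survey, 3 s MS/MS, top 2", proteins_per_run(0.1, 3.0, 2, 90)),
        ("0.1 s survey, 1 s MS/MS, top 6", proteins_per_run(0.1, 1.0, 6, 90)),
        ("same, flow rate halved", proteins_per_run(0.1, 1.0, 6, 90, flow_factor=2.0)),
    ]
    for label, value in scenarios:
        print(f"{label}: ~{value:.0f} proteins")
```

Because the survey scan is short compared with the MS/MS scans, throughput is governed mainly by the MS/MS duration: raising top-N from 2 to 6 at a fixed MS/MS time barely changes the number of MS/MS spectra per minute.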

However, these estimates assume that all of the steps mentioned work perfectly, which in practice is often not the case. For example, the type and length of the HPLC gradient (for example, sharp or shallow) play an important role in peptide fractionation. Whether the nanospray is optimized also affects the outcome of the proteomics experiment and the number of proteins identified: merely obtaining a nanospray is not good enough; “getting a good nanospray” is crucial to the success of a proteomics experiment. These and other known and unknown factors (not described here) decrease the number of proteins identified in a proteomics experiment, and in practice, a good LC–MS/MS run usually leads to identification of about 500–1,000 proteins. An example of a total ion current chromatogram (TIC) with MS and MS/MS spectra is shown in Fig. 1.6b.

1.7 Data Analysis

The raw data collected by a mass spectrometer are usually processed with software (for example, ProteinLynx Global Server, PLGS, from Waters Corporation), and the output (i.e., a peak list) is used for a database search. There are many database search engines, such as Sequest, X!Tandem, Mascot, and Phenyx. The results of the database search (such as from PLGS processing or a Mascot search) can also be imported into third-party software such as Scaffold (proteomesoftware.com) and further analyzed for protein modifications, quantitation, and other factors.
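
Peak lists are often exchanged as Mascot Generic Format (MGF) text files, in which each spectrum is enclosed between BEGIN IONS and END IONS lines, carries parameters such as PEPMASS and CHARGE, and lists one m/z and intensity pair per line. The minimal reader below handles only that core structure: global parameters outside spectrum blocks are ignored, only lines starting with # are treated as comments, and multi-charge fields such as "2+ and 3+" are not supported. The function names are hypothetical, and this is an illustration rather than a replacement for an established parser.

```python
def parse_mgf(text: str) -> list[dict]:
    """Return one dict per spectrum: {"params": {KEY: value}, "peaks": [(mz, intensity), ...]}."""
    spectra: list[dict] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # parameters outside a spectrum block are ignored in this sketch
        elif "=" in line:
            key, value = line.split("=", 1)  # split once so values may contain '='
            current["params"][key.strip().upper()] = value.strip()
        else:
            fields = line.split()
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            current["peaks"].append((float(fields[0]), intensity))
    return spectra


def precursor_mz(spectrum: dict) -> float:
    """First field of PEPMASS (an optional second field is the precursor intensity)."""
    return float(spectrum["params"]["PEPMASS"].split()[0])


def precursor_charge(spectrum: dict) -> int:
    """Charge magnitude from a field such as '2+'; the sign is dropped."""
    return int(spectrum["params"]["CHARGE"].rstrip("+-"))
```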

MS can be not only qualitative but also quantitative, and methods such as DIGE [60], isotope-coded affinity tags (ICAT) [5], stable isotope labeling by amino acids in cell culture (SILAC) [61], absolute quantitation (AQUA) [62], multiple reaction monitoring (MRM) [63], and spectral counting [64] have been successfully used for the detection, identification, and quantification of proteins or peptides.
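
Spectral counting [64] compares the number of MS/MS spectra assigned to each protein. Because longer proteins yield more peptides, counts are often normalized by protein length; one widely used form, assumed here, is the normalized spectral abundance factor, NSAF_i = (SpC_i/L_i) / Σ_j (SpC_j/L_j), where SpC is the spectral count and L the protein length. The sketch below computes it; the function name and the counts in the test are hypothetical.

```python
def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalized spectral abundance factor for each protein.

    spectral_counts: MS/MS spectra assigned to each protein; lengths: residues per protein.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}  # length-normalized counts
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra assigned to any protein")
    return {p: value / total for p, value in saf.items()}
```

Two proteins with the same spectral count but different lengths thus receive different abundance estimates, with the shorter protein ranked as more abundant.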

1.8 Protein Identification and Characterization

Determination of the molecular mass and amino acid sequence is the first step in protein identification. Once a protein is identified, it can be characterized. There are two approaches to protein characterization by MS: a top-down approach, in which intact proteins are investigated, and a bottom-up approach, in which proteins are digested and the resulting peptide mixture is analyzed (Fig. 1.7).

Fig. 1.7

Schematic workflow for bottom-up and top-down MS-based protein characterization and identification. Reprinted and adapted with permission from the Australian Journal of Chemistry CSIRO Publishing http://www.publish.csiro.au/?paper=CH13137 [15]

A top-down approach allows for the identification of protein isoforms and of potential PTMs within proteins [65]. In the bottom-up approach, digested proteins are analyzed by on-line tandem mass spectrometry (MS/MS); peptide mass fingerprinting for protein identification, used particularly in MALDI-MS analyses, is also a bottom-up approach.

In a variation of bottom-up proteomics known as shotgun proteomics, a complex protein mixture is digested, and the resulting peptides are fractionated by one-dimensional or multidimensional chromatography and analyzed by MS/MS [66]. For maximum protein identification and characterization, bottom-up and top-down proteomics can be combined [67, 68].

Characterization of proteins is not easy, and it becomes even more complicated because of the extensive PTMs of proteins. It is very difficult to fully identify the PTMs present at a particular time point in cells, tissues, or organisms and to derive a meaningful interpretation and biological significance from them. So far, the only method suitable for large-scale identification of PTMs is MS-based proteomics [69]. PTMs are time- and site-specific events and are important to all biological processes. However, meaningful characterization requires special enrichment strategies. These strategies can characterize most stable modifications in proteins, including glycosylation, phosphorylation, disulfide bridges, acetylation, ubiquitination, and methylation. MS approaches for the identification and characterization of proteins and PTMs are shown in Fig. 1.8.

Fig. 1.8

MS-based characterization of protein PTMs (glycosylation and phosphorylation), general strategies. Reprinted and adapted with permission from the Australian Journal of Chemistry CSIRO Publishing http://www.publish.csiro.au/?paper=CH13137 [15]

Two common PTMs in proteins are glycosylation and phosphorylation. Glycosylation is commonly found in extracellular proteins or in proteins exposed on the extracellular side of the cell, which are responsible for biological processes such as cell–cell communication or ligand–lectin interactions [70, 71]. In the pharmaceutical and biotechnology industries that focus on biotherapeutics, glycosylation is a critical modification of recombinant proteins, influencing their stability and solubility [72, 73]. Characterization of glycoproteins is difficult, however, because glycosylation is not uniform: cells usually produce several glycoforms simultaneously, so accurate MS-based identification and characterization of the glycoprotein isoforms is crucial [74].

Analysis of glycoproteins may be accomplished by LC–MS/MS analysis of tryptic digests. This method allows identification of saccharide diagnostic fragments (e.g., hexoses), but its detection efficiency for glycoproteins is rather poor [75–77]. A better strategy involves glycoprotein enrichment by lectin affinity chromatography, which facilitates their identification in subsequent LC–MS/MS analysis [78]. Another strategy involves release of the glycans from glycopeptides, followed by targeted analysis of the glycans. N-linked glycans can be released using peptide-N-glycosidase F (PNGase F), while O-linked glycans can be released by β-elimination. The change in peptide mass upon glycan removal allows identification of the type of glycosylation (N- or O-linked), as well as the glycosylation sites. For N-linked glycans, PNGase F treatment converts the glycosylated asparagine to aspartate, with a net increase of 1 mass unit [14].

For O-glycans, conversion of serine to alanine and of threonine to aminobutyric acid results in a net loss of 16 mass units [79]. While current methods allow fast and reliable identification of glycosylation sites, characterization of the attached glycans is still a great challenge, mostly because of the presence of glycan positional isomers [74].
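The nominal shifts quoted above (+1 for N-linked sites, −16 for O-linked sites) follow directly from the elemental compositions of the residues involved. The sketch below, which assumes the residue conversions exactly as stated in the text and uses standard monoisotopic element masses, computes the exact per-site mass change; the dictionary and function names are illustrative only.

```python
# Monoisotopic element masses (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Residue compositions (free amino acid minus H2O) as element counts
RESIDUES = {
    "Asn": {"C": 4, "H": 6, "N": 2, "O": 2},
    "Asp": {"C": 4, "H": 5, "N": 1, "O": 3},
    "Ser": {"C": 3, "H": 5, "N": 1, "O": 2},
    "Ala": {"C": 3, "H": 5, "N": 1, "O": 1},
    "Thr": {"C": 4, "H": 7, "N": 1, "O": 2},
    "Abu": {"C": 4, "H": 7, "N": 1, "O": 1},  # 2-aminobutyric acid
}


def residue_mass(name: str) -> float:
    """Monoisotopic residue mass computed from its elemental composition."""
    return sum(MONO[element] * count for element, count in RESIDUES[name].items())


def mass_shift(before: str, after: str) -> float:
    """Mass change of a single residue converted from `before` to `after`."""
    return residue_mass(after) - residue_mass(before)


# Per-site conversions as described in the text (assumed, one site each)
CONVERSIONS = {
    "N-glycan site, PNGase F (Asn -> Asp)": ("Asn", "Asp"),
    "O-glycan site on Ser (Ser -> Ala)": ("Ser", "Ala"),
    "O-glycan site on Thr (Thr -> Abu)": ("Thr", "Abu"),
}

if __name__ == "__main__":
    for label, (before, after) in CONVERSIONS.items():
        print(f"{label}: {mass_shift(before, after):+.4f} Da")
```

Run as a script, this prints a shift of about +0.984 Da for the asparagine-to-aspartate conversion and about −15.995 Da for both O-linked conversions (each is the loss of one oxygen atom). For a peptide with several modified sites, the per-site shift would simply be multiplied by the number of sites.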

Phosphorylation is a common and reversible PTM that modulates many cellular processes [80]. Abnormal phosphorylation of various proteins and altered phosphorylation patterns in proteomes have been connected to various diseases [81, 82]. Identification of protein phosphorylation will therefore help us understand many physiological processes, such as phosphorylation-based signal transduction pathways, and may lead to the discovery of new therapeutic targets [83–85].

Phosphopeptides are usually identified and characterized by MS: scanning for the neutral loss of HPO3 (80 mass units) from phosphotyrosine and of H3PO4 (98 mass units) from phosphoserine and phosphothreonine residues [86, 87] usually allows identification of the phosphopeptides and of the phosphorylated amino acid. Complete methylation of the peptide sample followed by MALDI-MS analysis in both positive and negative ionization modes has also been employed successfully [88]. However, because phosphorylation is a transient event and phosphorylation–dephosphorylation events may have opposite biological effects, data verification, validation, and interpretation may be difficult. Enrichment of phosphopeptides using TiO2, metal oxide affinity chromatography (MOAC) resins, a combination of TiO2 and IMAC (TiMAC), or antibody affinity purification [89, 90] is therefore advised.
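The neutral-loss criterion can be illustrated on a single MS/MS peak list. The sketch below is a simplified illustration, not an instrument scan mode: it assumes the product ion keeps the precursor's charge, so a neutral loss of mass M appears at precursor m/z minus M/z. The function name, tolerance, and example peaks are hypothetical.

```python
# Monoisotopic masses (Da)
H = 1.00782503207
O = 15.99491461956
P = 30.97376163

HPO3 = H + P + 3 * O        # ~79.966 Da, neutral loss typical of phosphotyrosine
H3PO4 = 3 * H + P + 4 * O   # ~97.977 Da, neutral loss typical of pSer / pThr

NEUTRAL_LOSSES = {"HPO3": HPO3, "H3PO4": H3PO4}


def neutral_loss_hits(precursor_mz, charge, fragment_mzs, tol=0.02):
    """Return {loss_name: expected_mz} for each phosphate-related neutral loss
    whose expected product ion (same charge as the precursor) matches a
    fragment peak within `tol` m/z units."""
    hits = {}
    for name, loss in NEUTRAL_LOSSES.items():
        # A neutral loss of mass M shifts an ion of charge z by M / z in m/z.
        expected = precursor_mz - loss / charge
        if any(abs(mz - expected) <= tol for mz in fragment_mzs):
            hits[name] = expected
    return hits


if __name__ == "__main__":
    # Hypothetical doubly charged precursor and a toy fragment peak list
    peaks = [300.15, 601.31, 720.40]
    print(neutral_loss_hits(650.30, 2, peaks))
```

For this toy spectrum only the H3PO4 loss matches (expected near m/z 601.31), which would flag a candidate phosphoserine or phosphothreonine peptide; confirming the exact site would still require the fragment-ion series.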

Disulfide bridges are another important protein PTM [11, 12, 15, 91]. They are formed through the oxidation of cysteine residues and play an important role in maintaining the three-dimensional conformation of proteins and hence their physiological function. Disulfide bridges are usually found in extracellular and membrane-bound proteins, including proteins that form homodimers and homopolymers as well as heterodimers and heteropolymers. Correct disulfide bridge formation is essential for proteins to adopt their optimal three-dimensional structure. Assigning the disulfide connectivities allows researchers to understand the structure and function of these proteins under physiological conditions and to predict problems in normal protein function when the disulfide bridges are scrambled or misconnected [92, 93].

Assignment of disulfide bridges in proteins may be accomplished by many approaches. One option is separation of disulfide-linked proteins or peptides by SDS-PAGE or tricine-PAGE under nonreducing and reducing conditions, followed by Coomassie staining and MS analysis. The MS analysis involves digestion of reduced and nonreduced aliquots of the same sample, followed by comparison, using MALDI-MS or ESI-MS, of the masses of the free cysteine-containing peptides with those of the disulfide-linked peptides in their oxidized form [94, 95]. This assignment is difficult even when only one protein is analyzed, and it becomes more difficult as the number of cysteine residues per protein increases. In addition, no approach currently allows identification of disulfide bridges on a large scale: while mass spectrometry can analyze many disulfide bridges simultaneously, bioinformatics tools to interpret these MS data are lacking. Here we discussed only three types of PTMs, but there are many additional PTMs with biological significance, each with a similar number of challenges yet to be solved, including the automation of high-throughput PTM identification and characterization. Nevertheless, it is clear to us that MS-based proteomics is perhaps the best, and possibly the only, option for both identification and characterization of PTMs.
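Two pieces of arithmetic underlie the comparison of reduced and nonreduced digests and the difficulty described above. First, each disulfide bond removes two hydrogen atoms, so a linked peptide pair is 2H lighter than the sum of its reduced components. Second, if every cysteine is paired, the number of possible connectivities for n cysteines is (n − 1)!! (1 × 3 × 5 × … × (n − 1)), which grows quickly. The sketch below assumes unalkylated cysteines and neutral monoisotopic masses; the peptide masses in the example are hypothetical.

```python
H_ATOM = 1.00782503207  # monoisotopic hydrogen mass (Da)


def disulfide_linked_mass(peptide_masses, n_bonds):
    """Neutral monoisotopic mass of reduced peptides after forming `n_bonds`
    disulfide bridges; each S-S bond removes two hydrogen atoms."""
    return sum(peptide_masses) - 2 * H_ATOM * n_bonds


def n_connectivities(n_cys):
    """Number of ways to pair n_cys cysteines when every cysteine is bonded:
    (n_cys - 1)!! = 1 * 3 * 5 * ... * (n_cys - 1)."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("expects a non-negative, even number of cysteines")
    count = 1
    for odd in range(n_cys - 1, 0, -2):
        count *= odd
    return count


if __name__ == "__main__":
    # Hypothetical reduced peptides of 1000.50 Da and 800.40 Da joined by one S-S bond
    print(f"{disulfide_linked_mass([1000.50, 800.40], 1):.4f}")
    for n in (2, 4, 6, 8):
        print(n, "cysteines ->", n_connectivities(n), "possible pairings")
```

Run as a script, the linked pair appears at about 1798.8843 Da, and the number of candidate pairings rises from 1 (two cysteines) to 3, 15, and 105 (eight cysteines), which is why unambiguous assignment becomes harder as the cysteine count increases.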

1.9 Mass Spectrometry-Based Peptide and Protein Profiling and Quantitation

In addition to qualitative proteomics, another dimension in a proteomics experiment is quantitative analysis of the proteins in the analyzed samples. Changes in protein expression between different physiological states, or between physiological and pathological states, are common, and qualitative proteomics without quantitative interrogation is, at best, partial proteomics. Adding a quantitative dimension to MS-based proteomics therefore expands its capabilities and advantages [96].

Many workflows have been developed and optimized to interrogate two or more proteomes, or particular proteins from these proteomes, by quantitative analysis; some of them are depicted in Fig. 1.9. Traditional quantitative analysis is a gel-based protein profiling technology that employs 2D-PAGE to compare two proteomes, or one proteome in two different physiological states or in physiological and pathological states [97]. Protein spots with different intensities are usually excised, digested, and analyzed by MS, typically with instruments capable of MS/MS fragmentation (e.g., triple quadrupole or ion trap mass spectrometers), because many compounds have similar masses and may be difficult to monitor in complex matrices. In addition, combining MS and MS/MS allows one to monitor both the precursor ion (MS) and its fragment ions (MS/MS), providing more selective monitoring of peptide/protein quantity [98–100].

Fig. 1.9

MS-based protein quantification workflow strategies via stable isotope labeling. Reprinted and adapted with permission from the Australian Journal of Chemistry CSIRO Publishing http://www.publish.csiro.au/?paper=CH13137 [15]

Protein quantitation can be performed using label-based or label-free techniques. Label-free methods are used in many proteomic measurements because they are simple and inexpensive [99], and the proteins do not require special handling such as tagging or isotope labeling. With current advances in software, there is, in principle, no limit to the number of samples that can be analyzed.

Label-free approaches to protein quantification include spectral counting and measurement of MS precursor ion intensity (or chromatographic peak area) [101, 102]. In spectral counting, one counts the spectra assigned to peptides belonging to a given protein [64, 103], whereas the precursor ion intensity approach interrogates the chromatographic peaks corresponding to particular peptides at a normalized elution time. The protein or peptide quantity is then calculated using a standard curve, or by comparing the area under the curve with that in another sample. Although fast and inexpensive, all these label-free methods also have disadvantages: they depend on analytical and biological reproducibility, and any variation in sample preparation or sample analysis introduces technical and instrumental errors that are directly reflected in the quantitation outcome [104, 105].
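Both label-free readouts reduce to simple arithmetic once spectra are assigned and peaks are extracted. The sketch below uses the normalized spectral abundance factor (NSAF), one common way of normalizing spectral counts by protein length and by the total of the run (swapped in here as an illustration; the cited works may use other normalizations), and trapezoidal integration of an extracted-ion chromatogram for the peak-area approach. Protein names, lengths, counts, and chromatogram points are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: SpC / L for each protein,
    divided by the sum of SpC / L over all proteins in the run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def peak_area(times, intensities):
    """Area under an extracted-ion chromatogram by the trapezoidal rule."""
    return sum(
        (t2 - t1) * (i1 + i2) / 2
        for t1, t2, i1, i2 in zip(times, times[1:], intensities, intensities[1:])
    )


if __name__ == "__main__":
    # Spectral counting: hypothetical proteins of 400 and 100 residues
    print(nsaf({"A": 20, "B": 10}, {"A": 400, "B": 100}))

    # Peak area: the same peptide in two runs (toy chromatograms, same time grid)
    t = [0.0, 1.0, 2.0]
    run1, run2 = peak_area(t, [0, 10, 0]), peak_area(t, [0, 20, 0])
    print("run2 / run1 =", run2 / run1)
```

Although protein A has twice as many spectra, its length-normalized abundance is half that of protein B (NSAF of 1/3 vs 2/3). The area ratio of 2.0 is a relative measure; it is only meaningful if the two runs are reproducible, which is exactly the limitation noted above.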

In addition to label-free quantitation, label-based quantitation strategies using stable isotopes (13C, 15N, 18O, or 2H) have emerged [106], in which native and labeled samples are combined and analyzed simultaneously. When labeled standards of known amount are added, isotope-based quantitation can provide absolute quantitation, whereas label-free methods generally provide relative quantitation.

In absolute quantification, synthetic peptides or proteins labeled with stable isotopes on one or more amino acids are used [6, 107]. These labeled peptides serve as internal standards and are added directly to the samples to be analyzed. Because the standard and the native peptide differ in isotope pattern (and mass) during MS analysis, quantitation can be performed [108–110]. One label-based quantitation method used for metabolic labeling is stable isotope labeling by amino acids in cell culture (SILAC), usually with labeled arginine or lysine [61]. This method can be used in many applications, such as investigation of signaling pathways [8, 22, 111–119], but it is mostly restricted to cell culture and cannot be used to investigate biological fluids (e.g., blood, urine, saliva) [120]. Recently, however, Matthias Mann’s group created labeled mice: animals fed a diet infused with 13C-labeled arginine and lysine can then be used to investigate biological fluids [121, 122]. Another stable isotope labeling method is the incorporation of labeled or modified tags on specific amino acids. In one such method, cysteine residues are labeled using an isotope-coded affinity tag (ICAT), which allows two conditions (i.e., two proteomes) to be compared [5]. For analysis of more than two conditions, such as time-course experiments or three or more different biological samples, labeling strategies using isobaric tags for relative and absolute quantification (iTRAQ) and tandem mass tags (TMT) have been successfully developed [123–125]. Although these methods have improved quantification capabilities, limitations remain, such as lack of reproducibility between individual analyses. Some of these issues are addressed through targeted quantification using approaches such as selected reaction monitoring (SRM) [126] and multiple reaction monitoring (MRM) [127], which have shown excellent reproducibility when used with stable isotope-labeled internal standards [128, 129].
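The internal-standard principle above can be sketched as isotope dilution: a known amount of heavy standard is spiked in, the light (native) and heavy forms are resolved by their mass difference, and the native amount follows from the light/heavy peak-area ratio. The sketch assumes that both forms ionize and respond identically and that peak areas are already integrated; the function names, the 1000.0 Da peptide, the label count, and the areas are hypothetical.

```python
PROTON = 1.007276466812   # proton mass (Da)
DELTA_13C = 1.00335483    # 13C - 12C mass difference (Da)
DELTA_15N = 0.99703489    # 15N - 14N mass difference (Da)


def heavy_mass(light_mass, n_13c=0, n_15n=0):
    """Neutral monoisotopic mass of the isotope-labeled form of a peptide."""
    return light_mass + n_13c * DELTA_13C + n_15n * DELTA_15N


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def absolute_amount(area_light, area_heavy, spiked_amount):
    """Isotope dilution: native amount = spiked standard amount multiplied by
    the light/heavy peak-area ratio (assumes identical response)."""
    return spiked_amount * area_light / area_heavy


if __name__ == "__main__":
    light = 1000.0                        # hypothetical native peptide mass
    heavy = heavy_mass(light, n_13c=6)    # e.g. one residue carrying six 13C atoms
    print(f"2+ ions: light {mz(light, 2):.4f}, heavy {mz(heavy, 2):.4f}")
    # 50 fmol heavy standard spiked in; native peak twice as large as the standard
    print(absolute_amount(3.0e6, 1.5e6, 50.0), "fmol native peptide")
```

With six 13C atoms the heavy form is about 6.02 Da heavier, so the 2+ ions are separated by about 3.01 m/z. The example returns 100.0 fmol for the native peptide. The same ratio logic applies to light/heavy SRM or MRM transitions when stable isotope-labeled internal standards are used.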

With the rapid advancement of proteomic methodologies and technologies, protein quantification and profiling will become standard tools in clinical diagnostic laboratories. However, before large scale proteomics experiments can be performed for clinical use, the reproducibility of large scale quantitation must be addressed, and new, improved quantitation platforms must be developed and tested. Nevertheless, these new technologies have the potential to be integrated into the set of tools for disease- or disorder-specific protein profiling, and that future is close to becoming a reality.

1.10 Identification of Protein–Protein Interactions (PPIs) Using MS

The molecules within a cell are not static but dynamic, and they form various types of interactions. These interactions can be stable, as in protein complexes, or transient, as in hormone–receptor or substrate–enzyme interactions. Together, all of these interactions within a cell form the interaction network, or interactome, of which proteins are a major component [130], modifying and controlling their own functions or those of other proteins [131]. Dysregulation of these interactions, particularly of PPIs, often leads to a pathological state such as a disease or disorder, and investigating these interactions is essential to current efforts to understand these diseases and disorders.

There are many methods for identifying PPIs, such as size-exclusion chromatography (SEC) [132–134], sucrose gradient ultracentrifugation [135, 136], the yeast two-hybrid system (Y2H), and affinity purification MS (AP-MS) [3, 130, 137–140]. These methods allow identification of stable PPIs (e.g., by sucrose gradient, SEC, or AP-MS), of binary interactions (Y2H), or of transient PPIs (AP-MS). However, current methods have limitations. Sucrose gradient ultracentrifugation and SEC are time-consuming and not suitable for automation, while Y2H and AP-MS can be automated but have a high rate of false-positive PPI identifications [141–143]. Efforts are therefore being made to reduce these limitations.
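One simple way to reduce AP-MS false positives is to compare each co-purified protein (prey) against a control pulldown and keep only preys clearly enriched with the bait. The sketch below is a deliberately minimal fold-change filter on spectral counts, shown as an illustration only; dedicated statistical scoring tools exist for this purpose. The thresholds, pseudocount, and protein names are hypothetical.

```python
def enriched_preys(bait_counts, control_counts, min_fold=3.0, min_count=2,
                   pseudocount=1.0):
    """Return {prey: fold_change} for preys whose spectral count in the bait
    pulldown exceeds the control pulldown by at least `min_fold`.

    A pseudocount avoids division by zero for preys absent from the control;
    preys with fewer than `min_count` spectra in the bait run are ignored."""
    hits = {}
    for prey, n_bait in bait_counts.items():
        if n_bait < min_count:
            continue
        n_ctrl = control_counts.get(prey, 0)
        fold = (n_bait + pseudocount) / (n_ctrl + pseudocount)
        if fold >= min_fold:
            hits[prey] = fold
    return hits


if __name__ == "__main__":
    bait = {"PREY_A": 15, "STICKY_B": 12, "PREY_C": 1}   # hypothetical counts
    control = {"STICKY_B": 10}                           # background binder
    print(enriched_preys(bait, control))
```

Here STICKY_B is discarded because it binds almost equally well in the control (fold change of about 1.2), PREY_C is discarded for having too few spectra, and only PREY_A is retained, with a fold change of 16.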

Native gel electrophoresis, namely clear native PAGE (CN-PAGE) and blue native PAGE (BN-PAGE), is an alternative option that separates protein complexes according to their molecular mass (BN-PAGE) or according to their intrinsic charge (CN-PAGE) [21, 144]. The advantage of these methods is that they can separate all protein complexes of the whole proteome in a single experiment and can be combined with MS to identify protein complexes [145, 146]. However, the gels are usually “home-made,” are not always reproducible, and require extensive work and bioinformatics expertise.

Alternatively, ESI-MS can be used for direct measurement of stable and transient PPIs in solution [147, 148]. However, we still lack the means to fully investigate and comprehend PPIs, in particular transient PPIs, and even more so transient PPIs that carry transient or reversible PTMs. This field might perhaps be called something like PTM-ed-PPI-omics.

1.11 Recent Advances

Currently, we have the means to create disease animal models or control animal models (such as the SILAC mouse for quantitation) [149, 150]. The capabilities of mass spectrometers have recently increased, and MS-based technology now allows identification of thousands of proteins. Current instruments incorporate additional technologies that allow one to identify not only proteins, their PTMs, and PPIs but also their shape and conformation. Such instruments are commercially available [151] and fill the need for analysis of proteins that are not easy to investigate using classical approaches such as X-ray crystallography and NMR spectroscopy [152]. These and other methods not listed here therefore not only provide a solution for analysis of challenging proteins but have also opened the door to new fields such as structural proteomics [153, 154].

1.12 Challenges and Perspectives

Many genomes have been sequenced, and many proteins from various sources have been identified. However, it would be an enormous mistake to claim that we have identified the full proteome of a whole cell; this milestone has not yet been achieved. We have identified many or most of the proteins expressed in specific cells, such as bacteria, at a particular time point and under particular growth conditions. However, while we are close to identifying most proteins in a cell, we are far from identifying the full proteome, including all proteins, isoproteins, modified proteins (PTMs), and truncated proteins. The sequencing of the human genome revealed that it contains only ~30,000 genes, which nevertheless encode about 100,000 unique protein sequences [155, 156]. Adding truncated proteins, splice isoforms, mutated proteins, and PTMs raises this number to between 1 and 2 million proteins, many of them expressed either transiently or at very low concentration, which makes the cell’s proteome even more difficult to analyze and interpret [157]. To partially overcome some of these challenges, increasingly advanced MS-based technologies can be combined with many fractionation, separation, and identification methods within one experiment, such as combining immunoaffinity with gel electrophoresis, liquid chromatography (LC), and MS, to increase sensitivity and dynamic range. Furthermore, there is plenty of room for optimizing MS-based methods for successful use in high-throughput analysis.

Therapeutic proteins are frequently membrane proteins [158–161]. However, membrane proteomics (membranomics) is among the most difficult tasks in proteomics. In addition, transmembrane proteins are among the proteins most heavily modified by PTMs and are very difficult to investigate by MS, although some progress has been made in this direction [162, 163].

1.13 Conclusions

Despite the many challenges that the MS and proteomics fields face, the impact of MS grows stronger year after year. The number of unknown proteins is decreasing, and protein databases are becoming more comprehensive over time. New MS technology, together with combinatorial approaches for the simultaneous identification of proteins, isoproteins, truncated proteins, PTMs, and PPIs, will hopefully allow us to completely characterize proteomes at both the qualitative and quantitative level.