Translational proteomics: the importance of mass spectrometry-based approaches

Interest in using mass spectrometry (MS) for clinical analyses has grown significantly in the past few years due to its success in studies of human specimens, such as its recent applications for single cell analysis of bone marrow [1], direct blood sampling in multiple disease states, including cardiac injury [2] and breast cancer [3], and the identification of Gram-negative bacilli in multiple clinical samples, including blood, tissue and urine [4, 5]. MS analyses are utilized to obtain highly accurate mass measurements of molecules in a sample, and can sensitively detect and identify molecules and subtle changes in their composition and abundance. In particular, MS-based proteomic applications have received considerable attention. Proteomics is the study of the entire complement of proteins in an organism, tissue or cell and their changes under different conditions, from disease states to environmental variations. It has been estimated that the human proteome contains more than 2 million different protein products or 'proteoforms' [6–8].

Since human proteins perform cellular functions essential to health and/or disease, obtaining knowledge of their presence and variance is of great importance in understanding disease states and for advancing translational studies, especially those related to personalized medicine [9, 10]. Human blood contains combinations of potentially detectable proteins from different parts of the body, and may be the single most informative sample for characterizing an individual's health [11]. From a clinical perspective, finding specific disease markers or biomarkers in such fluids represents an attractive alternative to tissue samples, due to the relative ease and less invasive nature of collection, and the large volumes that are normally obtainable. Proteomic studies promise to provide insights into the dynamic nature of biological systems through analysis of the proteins in biofluid and tissue samples, thereby defining the state of the organism at the molecular level. This approach not only incorporates the complexity of gene expression, but importantly also allows characterization of proteoforms generated by post-translational processes. Proteome measurements therefore have great potential for translational application, since both normal and altered cellular functions of the human body are ultimately dependent on the expression and regulation of proteins. Moreover, disruptions in protein expression are likely to serve as early indicators of disease (that is, biomarkers) or targets for drug development and therapeutic intervention. These promising clinical applications have driven the development of MS-based approaches for proteomics, as well as other -omic level analyses, for studying human biofluid and tissue samples (Figure 1).

Figure 1

Simultaneous MS analyses for understanding complex systems. Simultaneous study of the genome, transcriptome, proteome, glycome, lipidome and metabolome by MS provides a systems approach to understanding different conditions and disease states through analysis of variations in DNA, RNA, peptides/proteins, glycans, lipids and metabolites, respectively, in an organ, tissue, blood or other sample, or organism. MS is one of the few analytical tools that can perform measurements at each -omic level, and thus can provide a better understanding of molecular mechanisms and how they affect one another. PTM, post-translational modification.

Over the past decade, studies of protein biomarkers have allowed advances in early, non-invasive diagnosis of significant diseases, such as the identification of C-reactive protein and troponin I as biomarkers for myocardial infarction, and prostate specific antigen for prostate cancer [12–14]. Despite these successes, efforts to identify biomarkers have not been nearly as successful as originally anticipated. Proteomic analyses of blood and other biological fluids have proven to be immensely challenging because of the enormous complexity of the samples, the vast dynamic range of protein concentrations of potential interest (for example, greater than ten orders of magnitude in blood plasma), and the fact that analytes of clinical interest are often present at the low end of this concentration range [15–17]. To further exacerbate these challenges, verification and population-scale validation of biomarkers require the analysis of hundreds or even thousands of high-quality clinical samples. The collection and storage of these samples must be performed carefully and monitored using standardized protocols to reduce variations due to endogenous enzyme activities or sample contamination. These studies also require multiple control groups and diagnostic subcategories of patients, ideally gathered longitudinally over the course of disease progression. The analysis of many patient samples is required to characterize normal human genetic heterogeneity and disease heterogeneity [18, 19]. High throughput measurements are therefore essential to achieve biostatistical significance (Figure 2).

Figure 2

Biodiversity in population proteomic studies. Population proteomics allows the analysis of protein biodiversity within a population. Because it is known that individual variation, such as the presence of point mutations and varying protein abundances, will be present in all human studies (as depicted by the different chromatograms), it has become essential to develop high throughput, sensitive analytical applications to enable measurements necessary for personalized medicine.

While current MS-based proteomic measurements are capable of providing great depth of coverage through the use of extensive fractionation and analysis, this generally precludes the throughput required, as well as the levels of sensitivity and specificity necessary, for the rapid identification of clinically useful biomarkers. However, recent technological advances in automated parallel sample processing methods [20], multidimensional separations prior to MS [21, 22], instrumentation components and approaches [23–27], and high-performance informatics tools [28–30] have facilitated measurements with both increased sensitivity and higher throughput for translational applications. In this review, we discuss the current state of MS-based proteomics with regard to its advantages and current limitations, and we highlight translational applications that are being enabled by these recent technological advances.

Advances in MS-based translational proteomics

The primary translational application of MS-based proteomics is biomarker development. However, as already mentioned, its success has so far been quite modest and has been mainly limited to preclinical studies. Biomarker development is a multi-stage process that consists of discovery, verification, validation and commercialization [15]. For MS, the measurements fall into two categories, where the first utilizes a discovery approach to identify potential protein biomarkers and the second involves verification to further assess and initially validate these biomarkers using a larger population. Performing high-quality measurements and rigorous statistical analyses are essential in both steps as valuable patient samples are used. Currently, both MS-based proteomic discovery and verification approaches use bottom-up methods (Figure 3) in which proteins are digested into smaller peptides before analysis [31]. However, the two approaches aim to obtain different types of information.

Figure 3

Bottom-up MS approach. The most common MS-based proteomics approach is bottom-up analysis. In the bottom-up approach, proteins are first extracted from biofluids, cells or tissue. Enzymatic digestion then cleaves the proteins into their corresponding peptides, which are separated using LC and detected with MS. LC, liquid chromatography; MS, mass spectrometry; m/z, mass-to-charge ratio.

Discovery approaches

In the discovery phase, broad quantitative MS measurements often aim to identify peptides and proteins that differ significantly in abundance between patient and control groups. The main advantage of this approach is its largely unbiased ability to characterize a whole proteome or enriched sub-proteome in a single measurement, so that the protein alterations corresponding to a pathological or biochemical condition at a given time can be investigated. However, performing discovery-based proteomic analysis has proven to be quite difficult using plasma and serum samples. In plasma, proteins have concentrations ranging from approximately 3 × 10¹⁰ pg/ml for albumin to the low pg/ml range for some cytokines and proteins, such as those potentially secreted or leaking into blood, for example from tumors (Figure 4a). Because of this huge dynamic range and the fact that the proteome in human biofluid samples is mainly represented by only a few high abundance proteins - the 22 most abundant proteins represent approximately 99% of the total protein mass (Figure 4b) - analyzing all plasma proteins simultaneously is enormously challenging [11, 32], even after depletion of the most abundant proteins, as this exceeds the dynamic range of mass spectrometers that are typically used for discovery efforts (often approximately 1 × 10³ to 1 × 10⁴ for a single spectrum). To provide an extended dynamic range for increased protein coverage it is necessary to couple front-end separations such as liquid chromatography (LC), multi-stage immunoaffinity depletion [33–35], fractionation [36], or a combination of all three with MS analyses. While advanced LC separations have already improved the depth of coverage for proteins detected in MS studies [37], a major problem is their concomitant reduction in throughput, as bottom-up LC-MS analyses typically require on the order of 1 h. The detection of more proteins (for example, thousands) from plasma is possible with extensive off-line fractionation prior to on-line LC-MS analyses [38], but days or weeks of LC-MS measurements are then necessary for analysis of the multiple fractions. While this approach is highly attractive for the detection and discovery of potential biomarkers, the inherently low throughput largely precludes the population studies needed to investigate human and disease heterogeneity, and also limits the possibility of personal profiling. Thus, technological advances that greatly decrease LC separation times or eliminate them entirely while still maintaining a high depth of coverage are crucial for future clinical applications.
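To make the dynamic-range problem concrete, the following back-of-the-envelope calculation is a minimal sketch using only the approximate concentrations quoted above; the concentration chosen for the hypothetical low-abundance marker is illustrative, as is the per-spectrum dynamic range of a typical discovery instrument.

```python
import math

# Approximate plasma protein concentrations (pg/ml), taken from the text above.
albumin_pg_per_ml = 3e10          # most abundant plasma protein
low_marker_pg_per_ml = 1.0        # illustrative low-abundance cytokine/marker (low pg/ml range)

# Orders of magnitude separating the two analytes.
plasma_span = math.log10(albumin_pg_per_ml / low_marker_pg_per_ml)

# Per-spectrum dynamic range typical of discovery instruments (from the text).
instrument_span = math.log10(1e4)

print(f"Plasma concentration span: ~{plasma_span:.0f} orders of magnitude")
print(f"Single-spectrum dynamic range: ~{instrument_span:.0f} orders of magnitude")
print(f"Shortfall to be bridged by depletion/fractionation/LC: "
      f"~{plasma_span - instrument_span:.0f} orders of magnitude")
```

The roughly six orders of magnitude left over illustrate why depletion, fractionation and front-end separations are indispensable for plasma discovery studies.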

Figure 4

Protein dynamic range and percentage in blood plasma. (a) The normal range of protein abundances in plasma is illustrated for a subset of 34 proteins representing the most to least abundant. The figure was assembled using data from Anderson and Anderson [11]. Because the dynamic range of protein concentrations covers over ten orders of magnitude, with the proteins of interest present at the lower concentrations, analyzing plasma samples has proven to be very difficult. (b) The approximate percentages of each protein in plasma are further illustrated using pie charts for the most abundant 22 proteins representing approximately 99% of the plasma protein mass. The top 10 proteins that make up approximately 90% of all plasma proteins are shown on the left. The remaining 10% is further divided on the right with the least abundant remaining 1% group representing thousands of proteins, which are of most interest for biomarker studies. IgA, immunoglobulin A; IgG, immunoglobulin G; IgM, immunoglobulin M.

To obtain further information and identify unknown peptides with high accuracy in bottom-up MS studies, tandem MS (MS/MS) measurements, involving multiple steps of MS analysis and peptide fragmentation, are essential. Currently, many immunoassays used in translational studies measure analytes indirectly by detecting them through their interaction with other molecules, such as antibodies. MS provides an advantageous alternative to immunoassays as it involves direct measurements and allows the acquisition of exact peptide sequence information through high mass accuracy MS/MS measurements, thereby allowing unknown peptides to be identified with a high degree of confidence. The simultaneous collection of MS and MS/MS measurements involves the acquisition of a preliminary mass spectrum of intact peptides, followed by dissociation or fragmentation of one or more peptides of interest, and acquisition of the fragmentation mass spectrum. This process is repeated for the duration of the entire LC separation, resulting in thousands of MS and MS/MS spectra. To identify the peptides from MS/MS spectra, genomic data are frequently used to generate theoretical sequences for bioinformatics tools such as Mascot [39], Sequest [40] and X! Tandem [41]. By evaluating all of the matched MS/MS spectra, the false discovery rate of the peptide identifications can be estimated [42–44], and improved informatics tools are increasingly allowing identifications from spectra that were previously unattributed due to unexpected sequences or modification states.
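As a rough illustration of how the false discovery rate of peptide identifications is commonly estimated, the sketch below applies the target-decoy convention, in which spectra are searched against a concatenated database of real and reversed sequences. This is a simplified version; the scores and counts are invented for illustration, and the tools cited above use more elaborate statistics.

```python
def estimate_fdr(psms, score_threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) above a score threshold.

    `psms` is a list of (score, is_decoy) tuples for peptide-spectrum matches
    searched against a concatenated target + reversed-sequence decoy database.
    """
    targets = sum(1 for score, is_decoy in psms if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= score_threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical peptide-spectrum matches: (search score, matched a decoy sequence?)
psms = [(52.1, False), (48.3, False), (45.0, True), (44.2, False),
        (40.7, False), (38.9, True), (35.5, False), (30.1, True)]

for threshold in (30, 40, 50):
    print(f"score >= {threshold}: estimated FDR = {estimate_fdr(psms, threshold):.2f}")
```

Raising the score threshold trades identification depth for a lower estimated FDR, which is the basic dial used to report identifications at, for example, a 1% FDR.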

Another important step in bottom-up measurements is quantification of the observed peptides to determine whether significant changes occur between samples. Quantitative measurements of peptide abundance can be performed with or without stable isotope labeling (SIL) of peptides (or proteins), using peptide ion peak intensities or spectral counting (that is, 'label-free' quantification) [45]. Several in vitro and in vivo labeling techniques, such as stable isotope labeling by amino acids in cell culture (SILAC) [46, 47], isobaric tags for relative and absolute quantification (iTRAQ) [48, 49] and 18O-labeling [17], have been developed for MS-based quantification, and have been shown to provide lower standard deviations for peptide ion peak intensity measurements compared with label-free methods [50]. When combined with off-line fractionation, these SIL methods provide broad coverage for comprehensive proteome characterization. However, label-free measurements using normalization of LC-MS analyses can also be quite effective and avoid complications introduced by labeling approaches [51].
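The sketch below illustrates, in deliberately simplified form, how a label-free comparison might proceed: peptide ion intensities from two LC-MS runs are normalized to the total signal of each run before fold changes are computed. The peptide names and intensities are hypothetical, and real workflows add replicate statistics, retention-time alignment and missing-value handling.

```python
def normalize(run):
    """Scale each peptide intensity by the run's total signal (total-signal normalization)."""
    total = sum(run.values())
    return {peptide: intensity / total for peptide, intensity in run.items()}


# Hypothetical peptide ion intensities from a control and a disease LC-MS run.
control = {"PEPTIDEA": 2.0e6, "PEPTIDEB": 5.0e5, "PEPTIDEC": 1.2e5}
disease = {"PEPTIDEA": 2.1e6, "PEPTIDEB": 4.8e5, "PEPTIDEC": 3.6e5}

norm_control, norm_disease = normalize(control), normalize(disease)

for peptide in control:
    fold_change = norm_disease[peptide] / norm_control[peptide]
    print(f"{peptide}: fold change (disease/control) = {fold_change:.2f}")
```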

At present, data-dependent MS/MS analysis of selected peptides relies on an initial MS scan, and although it is widely used in proteomic discovery studies, it has inherent limitations that are associated with MS/MS undersampling in complex samples. To overcome these limitations and improve quantification, the accurate mass and time (AMT) tag strategy was developed for use on either labeled or label-free samples [52]. In a typical AMT tag study, a database is created and populated with peptide masses and LC elution times from many LC-MS/MS measurements using representative samples from experimental and control groups. High throughput LC-MS analyses are then performed for a large number of biological replicates and the acquired datasets are compared with the database to identify the peptides that are actually present. This approach allows the comparison of large numbers of peptide species that may not be identified in normal data-dependent MS/MS studies for reasons that include poor peptide fragmentation, co-elution of highly abundant species and/or informatics limitations - presumably similar factors that leave a significant number of detected species in LC-MS/MS analyses unidentified. Other approaches, such as data-independent MS/MS strategies [25–27] (discussed later), have also been developed recently to significantly enhance unbiased discovery studies.
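Conceptually, the AMT tag approach reduces to matching each LC-MS feature, defined by its accurate mass and normalized elution time, against a previously built database within narrow tolerances. The sketch below is a minimal illustration of that matching step only; the tolerances, peptide masses and elution times are hypothetical.

```python
# AMT tag database built from prior LC-MS/MS runs:
# peptide -> (monoisotopic mass in Da, normalized elution time on a 0-1 scale). Values are hypothetical.
amt_database = {
    "LVNEVTEFAK": (1148.61, 0.42),
    "AEFAEVSK":   (879.44, 0.31),
    "YLYEIAR":    (926.49, 0.55),
}

MASS_TOL_PPM = 5.0   # accurate-mass tolerance (ppm)
NET_TOL = 0.02       # normalized elution time tolerance


def match_feature(mass, net):
    """Return AMT database peptides whose mass and elution time match the observed feature."""
    hits = []
    for peptide, (db_mass, db_net) in amt_database.items():
        mass_error_ppm = abs(mass - db_mass) / db_mass * 1e6
        if mass_error_ppm <= MASS_TOL_PPM and abs(net - db_net) <= NET_TOL:
            hits.append(peptide)
    return hits


# An observed LC-MS feature (accurate mass, normalized elution time).
print(match_feature(1148.612, 0.425))   # -> ['LVNEVTEFAK']
```

In practice the matching is done for every detected feature in every replicate, which is what allows quantification without repeating MS/MS selection in each run.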

While these new approaches promise improved discovery of biomarkers, the analysis of plasma samples remains challenging for MS-based approaches. Discovery efforts increasingly use proximal fluids or tissues that are expected to be rich in biomarker candidates and present less of a challenge in terms of the dynamic range of proteins [15]. Various methods, such as embedding tissue in optimal cutting temperature compound, or formalin fixation and paraffin embedding (with or without laser capture microdissection), have been developed for preparing clinical tissue samples for proteomic studies [53]. The results from these advanced preparation methods have been promising [54, 55], and serve as a prelude to targeted discovery or verification efforts for measurements of candidate biomarker proteins at presumably much lower levels in blood samples.

Verification approaches

The verification phase typically uses a much larger number of samples, and focuses on a limited set of candidate peptides or proteins identified in the discovery approach. This approach can provide highly sensitive quantification of protein abundances and aims to identify a set of biomarker candidates with greater confidence. There have been significant developments in MS-based methods for the verification approach, providing much greater sensitivity, specificity and throughput, and more accurate quantification than broad discovery-based measurements.

Targeted quantitative MS-based measurements typically employ selected reaction monitoring (SRM) using triple quadrupole mass spectrometers. In SRM measurements, the triple quadrupole MS allows rapid detection of a series of targeted peptide ions and their corresponding fragments (that is, transitions), with multiplexing and 'scheduling' capabilities (to perform pre-defined analyses during specific LC elution times) and SIL internal peptide standards [56, 57] providing highly accurate quantification for up to hundreds of peptides during a single LC separation. The two-stage mass filtering in SRM (that is, for both peptide ions and their corresponding fragments) provides great sensitivity and specificity for detection of the targeted peptides. This capability often leads to observed limits of detection and limits of quantification (LOQ) of about 10 to 100 ng/ml in plasma - several orders of magnitude lower than presently feasible with discovery-based platforms. Moreover, recent advances such as the use of protein depletion, limited fractionation, and targeted peptide enrichment methodologies, such as peptide isolation with stable isotope standards and capture by anti-peptide antibodies (SISCAPA) [58], extend practical LOQ values to low ng/ml (or even low pg/ml) levels in blood samples [33, 59]. The implementation of other instrumental modifications, such as multi-inlet capillaries and dual-stage ion funnels, has led to further enhanced sensitivity [60]. While selection of the correct proteotypic or targeted peptides with good digestion and ionization efficiency requires some effort, this has increasingly been addressed using public repositories, including SRMAtlas [61, 62], PeptideAtlas [63] and the Global Proteome Machine [64]. Recent computational developments have also produced programs that effectively predict proteotypic peptides from a protein amino acid sequence [65, 56], allowing the list of targeted peptides to be derived without the need to rely on discovery-based proteomics data. Moreover, MS-based targeted measurements have proved reproducible across many different proteomics laboratories [66]. These various features have made SRM the current method of choice for ultra-sensitive MS-based biomarker verification (or pre-clinical validation).
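At its core, SRM quantification with SIL internal standards amounts to comparing the peak area of the endogenous ('light') peptide transition with that of its co-eluting heavy-labeled counterpart spiked in at a known amount, while 'scheduling' simply restricts acquisition of each transition to a retention-time window. The sketch below illustrates both ideas; the transitions, retention-time window, peak areas and spike amount are all hypothetical.

```python
# Scheduled SRM transitions: (precursor m/z, fragment m/z) monitored only inside a
# retention-time window (minutes). All values are hypothetical.
scheduled_transitions = {
    "LVNEVTEFAK_light": {"q1": 575.3, "q3": 937.5, "rt_window": (21.5, 23.5)},
    "LVNEVTEFAK_heavy": {"q1": 579.3, "q3": 945.5, "rt_window": (21.5, 23.5)},
}


def is_monitored(name, retention_time):
    """Check whether a transition is acquired at the given LC retention time."""
    start, end = scheduled_transitions[name]["rt_window"]
    return start <= retention_time <= end


def quantify(light_peak_area, heavy_peak_area, heavy_spike_fmol):
    """Estimate the endogenous peptide amount from the light/heavy peak-area ratio."""
    return light_peak_area / heavy_peak_area * heavy_spike_fmol


print(is_monitored("LVNEVTEFAK_light", 22.4))          # True: inside the scheduled window
print(quantify(light_peak_area=4.2e5,
               heavy_peak_area=1.0e6,
               heavy_spike_fmol=50.0))                  # ~21 fmol of endogenous peptide
```

Scheduling is what allows hundreds of transitions to be multiplexed in a single LC separation, since only a small subset is monitored at any given moment.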

Addressing the challenges of translational proteomics

Despite significant advances in MS-based targeted analyses, limitations in several performance metrics, including measurement throughput and detection sensitivity, still force compromises in biomarker discovery and verification approaches for the translational application of MS-based proteomic analyses. In particular, these deficiencies result in low sample numbers and in measurement quality that prevents detection of proteins present at low concentrations. To achieve further progress in translational proteomics, technological developments in MS such as faster separations, more effective ion sources, higher instrumental resolution and mass accuracy, detectors with greater dynamic range, and advanced data acquisition approaches are expected to increasingly allow broad non-targeted measurements that retain the benefits of targeted approaches.

Data-independent MS/MS acquisition has shown promise for improving the consistency of peptide identifications, as well as for increasing protein sequence coverage in complex samples and producing broad untargeted measurements that more closely resemble targeted measurements [25–27]. Data-independent acquisition is a strategy that systematically queries sample sets for the presence and quantity of essentially any protein of interest, using the information available in fragment ion spectral libraries to mine complete fragment ion maps. One way of performing data-independent acquisitions is sequential window acquisition of all theoretical fragment-ion spectra (SWATH™) MS, in which a 25 Da precursor isolation window is repeatedly cycled across a user-defined mass-to-charge ratio (m/z) precursor range in a single analysis to obtain time-resolved fragment ion spectra for all detectable analytes. Initial results have been very promising, with queried peptides quantified with a consistency and accuracy apparently approaching those of SRM [25].
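The essence of the SWATH acquisition scheme is the stepping of a fixed-width precursor isolation window across the full precursor m/z range within each cycle, repeated throughout the LC separation. The sketch below simply enumerates such a window scheme for the 25 Da windows described above; the m/z range is user-defined and the values here are illustrative, and window overlap or variable widths used in practice are omitted.

```python
def swath_windows(mz_start, mz_end, window_width):
    """Enumerate consecutive precursor isolation windows covering a user-defined m/z range."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + window_width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows


# Illustrative setup: 25 Da windows spanning 400-1200 m/z, cycled repeatedly
# throughout the LC separation so every detectable precursor is fragmented.
windows = swath_windows(400, 1200, 25)
print(f"{len(windows)} isolation windows per cycle")   # 32 windows
print(windows[:3])                                     # [(400, 425), (425, 450), (450, 475)]
```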

Another approach that exploits data-independent acquisitions uses an additional separation technique prior to fragmentation to increase measurement sensitivity and the ability to associate simultaneously fragmented precursor ions with their corresponding fragment ions. Fast gas-phase ion-mobility spectrometry (IMS), taking place on a timescale of tens of milliseconds, offers an attractive ion separation approach for data-independent acquisitions. IMS was introduced in the 1970s [67] and utilizes the fact that ions subjected to an electric field in a buffer gas quickly reach a steady velocity dependent on the ion shape: compact species drift faster than those with extended structures [68, 69]. IMS can be easily coupled to quadrupole time-of-flight MS, allowing placement of IMS between the LC and MS stages. The resulting IMS-MS instrument produces high-resolution spectra containing both the m/z and IMS drift time information concurrently. To perform data-independent acquisitions, all precursor ions are fragmented after the IMS separation but prior to MS detection, completely eliminating MS/MS undersampling. Because fragmentation occurs after the IMS separation, all fragment ions have the same drift time as their precursors [70–72], greatly simplifying spectral deconvolution - a major benefit of this technique.
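Because all-ions fragmentation occurs after the mobility separation, fragment ions inherit the drift time of their precursor, so fragments can be grouped with precursors simply by drift-time proximity. The sketch below illustrates that alignment step; the drift times, m/z values and tolerance are hypothetical.

```python
DRIFT_TOL_MS = 0.3   # drift-time tolerance (milliseconds); hypothetical value

# Precursor ions observed after the IMS separation: (m/z, drift time in ms). Hypothetical.
precursors = [(575.3, 12.4), (622.8, 15.1)]

# Fragment ions produced after IMS but before MS detection: (m/z, drift time in ms).
fragments = [(937.5, 12.5), (804.4, 12.3), (1021.6, 15.0), (690.3, 15.2)]


def assign_fragments(precursors, fragments, tol=DRIFT_TOL_MS):
    """Group each fragment with the precursor whose drift time it matches most closely."""
    assignments = {prec: [] for prec in precursors}
    for frag_mz, frag_dt in fragments:
        best = min(precursors, key=lambda p: abs(p[1] - frag_dt))
        if abs(best[1] - frag_dt) <= tol:
            assignments[best].append(frag_mz)
    return assignments


for (prec_mz, prec_dt), frag_list in assign_fragments(precursors, fragments).items():
    print(f"precursor m/z {prec_mz} (drift {prec_dt} ms) -> fragments {frag_list}")
```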

The increased sensitivity and reduced spectral congestion provided by the IMS separation offer the further advantage of reducing or completely avoiding the LC separation time for complex samples [73]. When IMS is coupled with MS, ions are separated prior to detection, reducing detector suppression while supplying an additional dimension for peptide identification. Practical use of IMS-MS was initially impeded by low sensitivity due to significant ion losses at the IMS terminus and during transfer to the MS. However, this problem was solved by re-focusing ions with an ion funnel at the IMS-MS interface [74], essentially eliminating ion losses during operation. The introduction of ion funnels in 1997 [75] provided a huge improvement in the sensitivity of MS instruments, as it allowed ions to be focused through the small interface orifices necessary for the ultralow MS vacuum pressures (1 × 10⁻⁷ to 1 × 10⁻⁸ Torr), as shown in Figure 5. The ion funnel is most often implemented in the source region of mass spectrometers to greatly increase the sensitivity of measurements, and has gained importance with its recent inclusion in commercially developed instruments. While these developments are just a first step in the convergence of discovery and verification platforms, further progress will be facilitated by emerging approaches for faster and higher resolution separations, improved MS resolution and extended detector dynamic range.

Figure 5

Increased MS sensitivity with ion funnel focusing. Technology developments such as the ion funnel greatly increase MS instrument sensitivity by re-focusing all ions through narrow interfaces necessary to maintain the high vacuum required for MS measurements. Two ion funnels are depicted here. The first funnel on the left is a conventional ion funnel that focuses the continuous beam from two capillaries through a narrow inlet. The second funnel (right) is a trapping ion funnel used to trap and pulse ions for ion mobility spectrometry (IMS) experiments. ESI, electrospray ionization.

Clinical implications

The potential to use MS-based proteomics in clinical settings is largely judged by its ability to make robust, sensitive, quantitative, specific and high-throughput measurements of highly complex biospecimens. Clinical questions and the corresponding requirements for biospecimen analysis determine the ability of MS to find and routinely measure high-quality biomarkers that have sufficient sensitivity and specificity to be clinically useful in screening large populations - for example, for diagnostic tests or early disease detection. For instance, SRM is already widely used for measuring metabolites in newborn screening in clinical laboratories [76]. However, the detection of low abundance proteins in complex samples requires measurements of high sensitivity and large dynamic range, aspects of MS performance that are achievable and are being greatly improved by developments in MS instrumentation and new separation technologies, as mentioned earlier. Narrowing large lists of differentially abundant proteins into defined patterns of biologically important variations, to reveal a much smaller set of candidate proteins that can be detected with the high sensitivity and specificity needed for clinical utility, requires verification studies typically involving many hundreds of samples at a minimum. Currently, the low throughput of conventional MS platforms severely inhibits biomarker verification. However, recent improvements in MS-based proteomic approaches, ranging from sample processing to data acquisition as outlined earlier, are resulting in rapid and highly sensitive MS analyses that now provide a viable future for MS-based measurements in clinical laboratories.

Compared with immunoassays, the current gold standard for clinical detection of protein biomarkers, MS-based proteomics offers a significantly shorter lead time and lower cost for assay development, a high capability for multiplexed analysis, and the flexibility to be readily configured for measuring different clinical analytes. With these advances and unique features, MS-based translational proteomics has the potential to become a powerful tool for decision making in the clinic, alongside other approaches such as physical examination, in vivo imaging, histology, biochemical assays and assessment of demographic risk factors. Its potential applications for discovering and measuring protein biomarkers could include routine screening, staging of disease progression, prediction of the course of disease, assessment of disease outcome, monitoring disease recurrence, and personalized assessment of drug response and toxicity, to name a few.

Outlook and perspectives

The future of MS-based translational proteomics can be considered in terms of what is currently practical and what is being enabled by recent technological developments. In the short term, proteomic measurements using targeted approaches are effective for high sensitivity and high throughput analysis of a limited set of biomarker candidates, whereas unbiased broad measurements are effective for the detection of a much larger universe of biomarker candidates, but with less sensitivity [27]. Improvements to the sensitivity of broad measurements and the scope of targeted measurements are driving a convergence of these platforms and are expected to increase the ability to gain a predictive understanding of molecular processes in complex biological systems [77]. While MS-based proteomics offers valuable information for understanding complex biological systems, systems-level quantitative analyses using a combination of broad proteomic, metabolomic, lipidomic and glycomic MS analyses (termed pan-omics) will be increasingly important (Figure 1). These approaches benefit largely from the same MS-based platform developments that are allowing advances in translational proteomics. The importance of each -omic measurement technology has already become evident - for example, through the success of targeted measurements [76, 78] - and their combination into transformative pan-omics measurement capabilities will likely be crucial for understanding the complexity of biological systems. Thus, broad pan-omic discovery methods, if sufficiently sensitive and effective, would be expected to provide a much more informative clinical toolset of biomarkers for accurate prediction of disease onset, and for disease monitoring and prognosis.