
8.1 Body Fluids

Body fluids can be classified as systemic or proximal. Systemic fluids represent the overall physiological state of an organism, since they circulate through all or most parts of the body. Proximal fluids provide a more specific picture of a tissue, because they are the products of a healthy or diseased organ (Paulo et al. 2011). Body fluids like urine encompass both categories. The direct advantage of analyzing a proximal fluid is the increased chance of detecting specific biomarkers and downstream products of a disease, while systemic fluids can provide information about the general health state of an organism. Although blood is a systemic fluid with tightly regulated homeostatic mechanisms, it can be a highly valuable sample in cases where proximal fluids cannot be obtained or early diagnosis of a pathological state is the goal (Capelo-Martínez 2019).

8.1.1 Blood

Blood is arguably the most readily accessible biofluid and contains proteins that are shed, leaked or secreted from a multitude of tissues (Ahn and Simpson 2007; Zhao et al. 2018). Plasma is the portion of the blood where the coagulation cascade and clotting mechanisms have been inhibited with anticoagulants. Serum is the blood fraction that is collected after clotting, thereby being free of clotting proteins as well as cells and platelets trapped in the fibrin network (Capelo-Martínez 2019). The proteome composition of plasma and serum has been found to be substantially different (Sapan and Lundblad 2006).

Plasma and serum represent a rich protein source, with many of their constituents routinely monitored and quantified in clinical practice. Protein concentration ranges between 60 and 80 mg/ml in specimens from healthy individuals, whereby albumins and globulins account for a large portion of the total protein content. Overall, there is a huge difference in the amounts of individual protein species, which are present in concentrations spanning a range of over ten orders of magnitude (Geyer et al. 2017). About 90% of the total protein amount is represented by the 12–14 most abundant proteins, although it is believed that serum contains up to 10,000 proteins (Kessel et al. 2018). This poses an immense challenge for the detection and quantification of low-abundance proteins in serum, which often have high potential to serve as the most indicative biomarkers. Most body fluids exhibit the same problem, and enrichment strategies that partially ameliorate this issue are addressed below. As expected, MS-based proteomics has been extensively applied to investigate plasma proteome composition. The Human Plasma Proteome Project (HPPP) illustrates an effort to document the full protein content of human plasma (Schwenk et al. 2017). Serum proteomics has also been used to examine changes in protein levels in a range of diseases, such as nonalcoholic fatty liver (Younossi et al. 2005) and liver fibrosis (Gressner et al. 2009), rheumatoid arthritis (Park et al. 2015) and cancer (Peng et al. 2018). Several studies and reviews on MS-based proteomics biomarker discovery are available in the literature (Geyer et al. 2016, 2017; Huang et al. 2017), as well as protocols for sample preparation and guidelines for specific analysis workflows (Greco et al. 2017; Lan et al. 2018; Moulder et al. 2018). Peptidomics studies have also been performed for biomarker discovery in serum under normal (Arapidi et al. 2018) and disease conditions (Fan et al. 2012; Yang et al. 2012, 2018; Widlak et al. 2016; Shraibman et al. 2019).

8.1.2 Urine

Urine is another major biofluid that has been thoroughly investigated by MS-based proteomics and can be collected in large volumes in a non-invasive manner (Csősz et al. 2017). Other advantages include minimal cost, the possibility of easy continuous sampling in time-resolved studies and a lower proteome complexity than serum. Urine is a result of blood filtration and contains proteins and peptides that have not been reabsorbed by the organism in this filtration and clearance process (Krochmal et al. 2018). Despite showing a lower protein concentration than serum, it is a body fluid of great interest, especially in the field of biomarker discovery, as it collects components from blood, kidneys and bladder and has the potential to indicate the natural or pathological conditions of multiple organs (Capelo-Martínez 2019). Urine protein concentration is in the range of 20–100 μg/ml, consisting mostly of low molecular weight proteins (Edwards 2008). As in serum/plasma, albumin is the prevalent protein present in urine. Albumin is followed in abundance by microglobulins and uromodulin, and protein concentration again spans several orders of magnitude (Capelo-Martínez 2019). Urine exhibits high thermodynamic stability, presumably due to its incubation in the bladder, which lowers the endogenous proteolytic activity, and its low molecular weight protein content (Rai et al. 2005; Tammen et al. 2005). As examples, the urinary proteome or peptidome has been investigated for biomarkers of prostate cancer and rheumatoid arthritis (Kang et al. 2014; Jedinak et al. 2018) as well as acute coronary syndromes and kidney diseases (Htun et al. 2017; Sirolli et al. 2019).

8.1.3 Cerebrospinal Fluid

Cerebrospinal fluid (CSF) is an additional body fluid that has attracted attention in proteomics studies. Although invasive methods are required to obtain CSF and the medical condition of the patient has to be taken into consideration before sampling, CSF can be used to detect changes in protein expression in neurological and other disorders and diseases. CSF is a biofluid present in the ventricular system of the central nervous system as well as the surroundings of the spinal cord and brain. CSF is secreted by cells of the choroid plexus and ependymal cells, mediating molecule exchange with blood plasma and brain tissue (Maurer 2008). CSF is fully renewed 3–4 times per day and its composition depends on the sampling location. CSF protein concentration is comparatively low at 0.3–0.7 mg/ml, albeit still suffering from the issue of a few major proteins overshadowing the rest, a characteristic of most biofluids (Davidsson et al. 2002; Hu et al. 2006). In addition to the characterization of the physiological protein composition (Zhang et al. 2005; Barkovits et al. 2018), CSF proteomics has been employed for the study of spinal cord injury (Streijger et al. 2017), depressive disorder (Al Shweiki et al. 2017), meningitis (Gómez-Baena et al. 2017) and multiple sclerosis (Kroksveen et al. 2015).

8.1.4 Synovial Fluid

Another biofluid of interest is synovial fluid (SF), whose analysis in disease conditions can illuminate pathological states of the joint structure (Driban et al. 2010; Sohn et al. 2012; Kiapour et al. 2019). SF is contained within the joint cavity of the synovial joint. The inner layer of the joint capsule, which encircles the joint, accommodates a membrane of fibroblast-like cells that produce the viscous SF. SF mediates cell interactions, facilitates molecule transport and lubricates the articular cartilage (Mahendran et al. 2017). Since the synovial membrane has semi-permeable properties, passive diffusion of plasma proteins through the membrane is a natural phenomenon. Proteins secreted from cells in the joint apparatus are also present, making SF a dynamic fluid influenced by a plethora of different factors. Normal human SF protein concentration is in the range of 19–28 mg/ml and increases in pathological states. In this body fluid as well, albumins (approximately 12 mg/ml) and globulins (at 1–3 mg/ml) account for a large portion of the protein content (Hui et al. 2012). Protein size is an important factor in relative abundances in SF, as it is directly associated with the protein’s ability to penetrate the semi-permeable membrane. Hence, large plasma proteins are found at low concentrations in normal SF. In contrast, pathological SF resembles serum with respect to protein abundance. Synovial fluid proteomic studies are mostly associated with pathologies related to arthritis (Bhattacharjee et al. 2016; Mahendran et al. 2017; Vicenti et al. 2018).

8.1.5 Salivary and Tear Fluid

Salivary fluid and its proteome content have also been investigated, since saliva likewise offers the advantages of non-invasive collection and ready availability for biomarker discovery (Aqrawi et al. 2017). Saliva is a fluid secreted from the salivary glands and the gingival crevice (Humphrey and Williamson 2001). It contains thousands of proteins (Schulz et al. 2013), of which α-amylase, mucin, cystatins, proline-rich proteins and globulins are most prominent (Hu et al. 2005; Guo et al. 2006; Denny et al. 2008). Saliva consists mostly of water (99%), with the remainder comprising electrolytes, urea and proteins. Saliva protein concentration is highly variable, but normally ranges from 0.7 to 2.4 mg/ml (Lin and Chang 1989; Shaila et al. 2013). Proteomics studies have identified more than a thousand proteins in salivary fluids (https://salivaryproteome.nidcr.nih.gov/). Salivary proteomics or peptidomics have been employed to study differential protein abundance and potential biomarkers in jaw osteonecrosis (Thumbigere-Math et al. 2015), oral lesions (de Jong et al. 2010), periodontitis (Trindade et al. 2015), oral cancer (Gallo et al. 2016; Stuani et al. 2017), as well as systemic conditions such as autoimmune diseases (Ohyama et al. 2015) and diabetes mellitus (Caseiro et al. 2013). The protein content of extracellular vesicles in human saliva in relation to lung cancer has also been investigated (Sun et al. 2018).

Tear fluid is another intensively studied body fluid, which can be collected non-invasively. It consists of proteins, lipids and other molecules produced in the lacrimal glands, whereby its normal protein concentration is 5–7 mg/ml (Fullard and Snyder 1990). Tear fluid contains more than 1500 proteins, of which the most prominent ones have been associated with pathogen defense (Zhou et al. 2012). Changes in protein expression levels can reflect inflammatory states in eye-associated or systemic diseases (Li et al. 2008; Le Guezennec et al. 2015). Tear proteomics has also been utilized for the assessment of several ocular-related conditions (Tomosugi et al. 2005; Zhou et al. 2009a,b; Aluru et al. 2012; Leonardi et al. 2014).

8.1.6 Seminal Fluid and Sweat

Semen is a biofluid that can be exploited not only in basic research, but also in forensic studies (Merkley et al. 2019). The acellular fraction of semen, or the seminal fluid, accounts for 95% of semen volume (Jodar et al. 2017). More than 6000 proteins have been identified with high confidence in semen proteomic studies investigating the spermatozoal proteome (Amaral et al. 2013). Among these, proteins involved in DNA packaging, RNA metabolism and transport as well as other metabolic processes for energy generation were overrepresented (Amaral et al. 2013; Oliva et al. 2015). In seminal fluid, slightly more than 2000 non-redundant proteins have been identified (Gilany et al. 2015), with semenogelins accounting for 80% of the total protein content (Drabovich et al. 2014). About 10% of seminal fluid proteins are encapsulated in extracellular vesicles, namely the epididymosomes and the prostasomes, whose prominent components are enzymes with GTPase activity and proteins related to phospholipid binding (Jodar et al. 2017). A recent study analyzed the exosomal proteome content in seminal fluid (Yang et al. 2017), complementing results that have been summarized in several review articles on semen proteomics (Gilany et al. 2015; Jodar et al. 2017; Druart and de Graaf 2018).

Sweat, as an excreted fluid, is also important in indicating the physiological state of an organism, and it has been studied with proteomic methods. Similar to salivary fluid, sweat is highly diluted, and the proteins in sweat contribute to defensive mechanisms against pathogens and tissue regeneration following injury (Schittek et al. 2001). The number of identified proteins is in the high hundreds, and the most abundant proteins include dermcidin, clusterin and albumin (Yu et al. 2017). Sweat proteomics studies have explored its role in defense and skin immunity (Csősz et al. 2015; Wu and Liu 2018) as well as sweat protein composition in general (Yu et al. 2017; Na et al. 2019).

8.1.7 Wound Exudate

An additional body fluid that has gained considerable momentum for biomarker discovery and proteomic analysis is wound exudate (Mannello et al. 2014). Wound fluid, which can be classified as proximal fluid, enables investigating the state of the local wound tissue and can serve as an indicator of the wound healing trajectory (Kalkhof et al. 2014; Lindley et al. 2016). Proteins present in wound fluid play an important role in modulating responses to injury and regulating the wound microenvironment (Cavassan et al. 2019). The underlying biological mechanisms of chronic inflammation and non-progressive wounds are still poorly understood. Therefore, wound exudates present an excellent biological matrix for biomarker discovery in chronic, non-healing wounds. Protein concentration depends on sampling method and wound type. Abundant proteins in wound fluids largely overlap with highly abundant serum proteins, of which albumin is the main component. As a result of matrix remodeling and tissue repair and regeneration, extracellular matrix proteins are also overrepresented in these specimens. Fluid from wounds contains inflammatory mediators such as chemokines and growth factors, which are required to orchestrate the distinct phases of the wound healing process. Wound exudate proteins have been found to span at least six orders of magnitude in concentration (Sabino et al. 2015).

8.1.8 Other Body Fluids and Exosomes

Several other types of body fluids such as amniotic fluid and gingival crevicular fluid are available and have been investigated in proteomic studies (Cho et al. 2007; Khurshid et al. 2017; Zhao et al. 2018). Special attention should also be given to extracellular vesicles (EVs), which are present in virtually all body fluids. EVs are a largely unexplored pool of protein transport media that could possess significant relevance for biological questions. Most of the discussed body fluids contain EVs secreted from adjacent or distant cells, and their proteomic content and its changes over time or in different conditions may be of use in biomarker discovery (De Toro et al. 2015; Sódar et al. 2017). Membrane proteins in EVs are involved in cell interactions, adhesion, signaling and ion transport as well as in immune responses (Gutiérrez-Vázquez et al. 2013; Mulcahy et al. 2014; Turturici et al. 2014). Exosomes have been studied in several biofluids, including plasma (Caby et al. 2005), urine (Nilsson et al. 2009), saliva (Ogawa et al. 2011; Sun et al. 2018) and semen (Utleg et al. 2003; Thimon et al. 2008). The diversity of the proteomes discovered in EVs is partly associated with differences in isolation methods (Yáñez-Mó et al. 2015). Almost 35,000 proteoforms have been annotated in EVs and numerous potential markers for different conditions have been identified and validated (Csősz et al. 2017).

8.2 Mass Spectrometry-Based Proteomics

Mass spectrometry (MS)-based technologies have rapidly evolved over the past two decades. For instance, the invention and commercialization of the Orbitrap mass analyzer in MS instruments in 2005 have significantly advanced the field (Eliuk and Makarov 2015). Together with steadily improving time-of-flight (ToF) instrumentation, Orbitrap technology has become the gold standard in MS-based experimentation. Further developments in software and data analysis tools as well as enhancements in instrument speed and sensitivity have made MS-based proteomics the preferred method for interrogation of complex biological samples including body fluids.

Numerous MS-based proteomics workflows have been established, but the method of choice mostly depends on the specific biological question or diagnostic need. In most proteomic analyses of body fluids, a bottom-up methodology is applied, consisting of the following steps: sample preparation, digestion with an endoproteinase for generation of peptides, inline liquid chromatography (LC) separation and tandem MS analysis. Additional steps are frequently added to this generic workflow, such as a second offline LC step, desalting, ultrafiltration and protein precipitation. Major differences in workflows depend on whether a discovery approach is taken or specific analytes are to be monitored and quantified (Schubert et al. 2017).

8.2.1 Discovery Proteomics

Discovery experiments aim at fully deconvoluting the proteomic content of a sample by identifying as many proteins as possible in a wide range of concentrations. These approaches do not require any prior knowledge of the sample composition and can be employed for protein identification and analysis of post-translational modifications. Discovery proteomics workflows can be further subdivided into data-dependent acquisition (DDA) and data-independent acquisition (DIA) methods. In DDA mode, peptides are analyzed at the peptide precursor level (MS1) in predefined packets depending on the instrument cycle time. The most abundant precursor peptides of each packet are selected and fragmented into fragment ions. At the second level of tandem MS, the fragment ions are measured and matched to their precursors. The measurements are then compared to theoretical fragmentation patterns and mass to charge ratios from sequence databases and mapped to the proteins of origin with help of software tools (Meyer 2019). This methodology enables accurate and sensitive identification of thousands of proteins from a single sample in a single experiment. DIA methods differ from DDA by fragmentation of all precursors assigned to distinct windows of mass to charge ratios, which results in a much more complicated data scheme and makes the identification of peptides from mass spectra computationally more challenging. However, a clear advantage of DIA methods is the unbiased, non-stochastic and universal data acquisition, providing a close to complete overview of the protein constituents in any biological sample (Barkovits et al. 2018). The most popular DIA method, sequential windowed acquisition of all theoretical fragment ion mass spectra (SWATH), has been extensively used in the field of body fluid proteomics and biomarker discovery (Vaswani et al. 2015; Krisp and Molloy 2017; Anjo et al. 2017; Lewandowska et al. 2017; Liao et al. 2017; Miyauchi et al. 2018; Ludwig et al. 2018).
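The difference between the two acquisition modes can be illustrated with a minimal sketch. It is a simplified toy model rather than instrument control logic: the function names, the example scan and the window width of 100 m/z units are assumptions chosen for illustration only. DDA is modeled as selecting the N most intense precursors of one MS1 scan, whereas DIA is modeled as tiling the precursor range with fixed isolation windows so that every precursor is fragmented.

```python
from typing import List, Tuple

def select_top_n(ms1_peaks: List[Tuple[float, float]], n: int) -> List[float]:
    """DDA-style selection: m/z values of the n most intense precursors in one MS1 scan."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:n]]

def dia_windows(mz_start: float, mz_end: float, width: float) -> List[Tuple[float, float]]:
    """DIA-style acquisition: consecutive fixed-width isolation windows over the m/z range."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows

# Hypothetical MS1 scan as (m/z, intensity) pairs.
scan = [(400.2, 1e6), (455.7, 5e3), (512.3, 8e5), (630.9, 2e4), (702.4, 3e5)]

# DDA: only the three most intense precursors are fragmented in this cycle.
top3 = select_top_n(scan, 3)

# DIA: every precursor falls into one of the windows and is fragmented.
windows = dia_windows(400.0, 800.0, 100.0)
covered = [mz for mz, _ in scan if any(lo <= mz < hi for lo, hi in windows)]

print("DDA selects:", top3)
print("DIA covers:", covered)
```

In this toy scan the two low-intensity precursors are skipped by the top-N selection but are covered by the DIA windows, which is the source of both the completeness of DIA data and its more complex, multiplexed fragment spectra.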

Both DDA and DIA can be applied to gain qualitative or quantitative information (Schubert et al. 2017). Qualitative analyses aim at identifying as many proteins as possible and thereby obtaining the most comprehensive overview of the sample proteome. Typically, these discovery proteomic techniques include supplementary enrichment and/or fractionation steps in an attempt to increase the proteome coverage of the identification procedure. Quantitative proteomics adds an additional dimension by attempting to determine abundances of as many proteins as possible with maximum accuracy. Quantification can be relative by comparing different conditions and/or to control samples or absolute in terms of protein concentration in a sample. Multiple workflows for relative quantification of proteins have been integrated into the general bottom-up proteomics approach and are regularly applied in the analysis of body fluids.

Label-free quantification does not require any additional sample modification and can be performed at the data analysis step. The simplest approach is based on the assumption that the abundance of a protein is related to the number of spectra generated from its peptides and thus determines relative protein quantities by spectral counting. However, this method suffers from the stochastic aspect of mass spectrometric detection in DDA experiments, assay variation and difficulty of quantification at the protein level (Arike and Peil 2014). As a consequence, spectral counting has been shown to be accurate in detecting large changes in protein levels but far less precise for detecting subtle and less significant differences (Liu et al. 2004). This can be overcome by comparing LC-MS peak areas of peptide precursors in MS1 ion current-based quantitative proteomics, a powerful approach for analyzing samples from large cohorts in flexible experimental designs (Wang et al. 2019). A widely applied implementation of this approach is the MaxLFQ algorithm (Cox et al. 2014), which has also been extensively used e.g. in quantitative plasma proteome profiling (Geyer et al. 2016). DIA workflows inherently use label-free quantification but mostly at the MS2 rather than the MS1 level (Pappireddi et al. 2019) with advantages in minimizing the number of missing values and a high quantitation accuracy (Muntel et al. 2019). This has also been extensively exploited in assessing relative protein abundances in body fluid proteomes (Liu et al. 2013, 2015).
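The limitation of spectral counting for small changes can be made concrete with a short sketch. The protein labels, the counts and the pseudocount of 1 are hypothetical values chosen for illustration and are not taken from the cited studies.

```python
import math
from collections import Counter

def spectral_counts(psm_proteins):
    """Count peptide-spectrum matches (PSMs) per protein."""
    return Counter(psm_proteins)

def log2_ratio(count_a, count_b, pseudocount=1.0):
    """log2 fold change between two conditions; the pseudocount avoids division by zero."""
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))

# Hypothetical PSM lists: one abundant and one low-abundance protein.
control = spectral_counts(["abundant"] * 200 + ["rare"] * 2)
disease = spectral_counts(["abundant"] * 210 + ["rare"] * 4)

for protein in ("abundant", "rare"):
    print(protein, round(log2_ratio(disease[protein], control[protein]), 2))

# A single additional, stochastically sampled spectrum shifts the estimate
# for the low-abundance protein considerably.
print("rare, one extra PSM:", log2_ratio(disease["rare"] + 1, control["rare"]))
```

With only a handful of spectra per protein, one additional identification changes the estimated ratio substantially, which is consistent with spectral counting being reliable mainly for large differences.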

In contrast to label-free approaches, label-based relative quantitative proteomics workflows require modifications to the proteome by incorporation of stable isotopes. This can be achieved by metabolic incorporation or chemical labeling. Metabolic incorporation has been implemented with stable isotope labeling by amino acids in cell culture (SILAC), where natural amino acids in the growth medium are replaced by amino acids with stable isotopes. Cells grown in this culture medium incorporate the labeled amino acids, allowing their proteins to be distinguished from those of non-labeled control cultures in downstream MS analyses (Kani 2017; Hoedt et al. 2019). SILAC is a consistent, cost-effective and undemanding metabolic labeling method that has been used in a multitude of MS studies (Chen et al. 2015; Wang et al. 2018; Deng et al. 2019). Since SILAC would require incorporation of isotopically labeled amino acids at the organism level, its application is limited in body fluid proteomics. However, SILAC-encoded proteomes can be used as internal standards and have been applied for quantitative comparison of proteins in tissues and blood samples (Zhao et al. 2013; Dittmar and Selbach 2015).
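How heavy and light peptide forms are distinguished and compared can be sketched as follows. The sketch assumes the commonly used labels 13C6 15N2-lysine and 13C6 15N4-arginine, which the text above does not specify; the peptide sequence and intensities are hypothetical, and summarizing peptide ratios by their median is one common choice rather than a prescribed method.

```python
from statistics import median

# Monoisotopic mass shifts (Da) of two commonly used heavy amino acids.
# Assumption: 13C6 15N2-lysine and 13C6 15N4-arginine.
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}

def heavy_mz_shift(peptide: str, charge: int) -> float:
    """m/z offset between the heavy and the light form of a peptide at a given charge."""
    delta_mass = sum(HEAVY_SHIFT.get(residue, 0.0) for residue in peptide)
    return delta_mass / charge

def protein_ratio(peptide_pairs):
    """Median heavy/light intensity ratio over all (light, heavy) peptide pairs of a protein."""
    return median(heavy / light for light, heavy in peptide_pairs)

# Hypothetical tryptic peptide with one C-terminal lysine, observed as a 2+ ion.
print(round(heavy_mz_shift("LVNEVTEFAK", 2), 4))

# Hypothetical (light, heavy) MS1 intensities of three peptides from one protein.
pairs = [(100.0, 200.0), (50.0, 110.0), (80.0, 150.0)]
print(protein_ratio(pairs))
```

When a SILAC-encoded proteome is spiked into body fluid samples as an internal standard, the same ratio calculation applies, with the heavy channel serving as the common reference across samples.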

As an alternative to amino acid labeling, chemical labeling utilizes isotopically-encoded chemical tags that are mostly attached to the highly reactive primary amines at lysine side chains and peptide N-termini (Chahrour et al. 2015). Prominent examples are the isotope coded affinity tag (ICAT), isobaric tag for relative and absolute quantitation (iTRAQ) and tandem mass tag (TMT) labels, which are frequently employed to quantitatively compare multiple samples in a single run (Liang et al. 2012; Ren et al. 2017; Moulder et al. 2018; Rao et al. 2019). Recent advancements in the TMT technology allow multiplexed analyses of up to 16 samples in the same MS experiment, significantly reducing measurement time and increasing sample throughput (Thompson et al. 2019). As a consequence, TMT-based quantitative proteomics has advantages in the analysis of samples from larger patient cohorts (Zecha et al. 2019).
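For isobaric tags, relative quantification is read out from reporter ion intensities, one per multiplexed sample. The following sketch shows one simple way to process such a table, under the assumption that equal protein amounts were intended for all channels so that differences in summed intensity reflect loading rather than biology. The function names and intensities are hypothetical, and established analysis software applies more elaborate corrections.

```python
def normalize_channels(rows):
    """Scale every channel to the same summed reporter intensity (assumes equal loading)."""
    n_channels = len(rows[0])
    totals = [sum(row[c] for row in rows) for c in range(n_channels)]
    target = sum(totals) / n_channels
    return [[row[c] * target / totals[c] for c in range(n_channels)] for row in rows]

def ratios_to_reference(row, reference=0):
    """Express each channel relative to the chosen reference channel."""
    return [value / row[reference] for value in row]

# Hypothetical reporter intensities: three peptides (rows) x three samples (columns).
# The third sample was loaded at twice the amount, without any real abundance change.
raw = [
    [100.0, 100.0, 200.0],
    [400.0, 400.0, 800.0],
    [50.0, 50.0, 100.0],
]

normalized = normalize_channels(raw)
for row in normalized:
    print([round(r, 2) for r in ratios_to_reference(row)])
```

After normalization all ratios are close to one, showing that the apparent two-fold difference in the raw data was a loading artifact.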

8.2.2 Targeted Proteomics

With the rapid progress in identifying comprehensive proteomes by discovery proteomics, targeted approaches that specifically monitor protein targets such as biomarker candidates have received increasing attention. Targeted methods are by nature quantitative and are used when a set of proteins of interest has been defined and needs to be accurately and reproducibly quantified with high sensitivity in many samples. A typical scenario is the identification of biomarker proteins by discovery proteomic analysis of body fluids from a defined group of patients, which are then validated by targeted monitoring in a separate and larger cohort (Altelaar et al. 2013). In this type of MS-based proteomic analysis, target proteins or peptides are the focus of the experiment (Sethi et al. 2015). Precursor ions are selectively monitored by the instrument, after specific properties of the precursors such as mass to charge ratios and LC retention times have been predetermined.

Multiple targeted methods exist, but they share the principle of monitoring a specific ion and its subsequent fragmentation products over time. Selected reaction monitoring (SRM) requires triple quadrupole mass spectrometers and monitors the precursor ion of interest in the first stage of tandem MS and a single specific product ion of its fragmentation in the second MS stage; such a pair of precursor and product ion is termed a transition. The chromatographic peak of the transition can be used for quantification through numerical integration. Relative quantification of peptides is performed by comparing chromatographic elution areas recorded for the same transition in samples obtained under different conditions. Multiple reaction monitoring (MRM) is a closely related method, with the difference of monitoring multiple ion transitions, rendering the analysis and quantification more robust and reliable. Lastly, parallel reaction monitoring (PRM) follows all transitions of a precursor without selection and thus can be performed on the same instruments used for discovery proteomics, such as Orbitrap and ToF analyzers. The speed, resolution and sensitivity of current mass spectrometers allow monitoring hundreds of transitions with high precision in a single run. This number can be further increased by spiking in isotopically-labeled internal standard peptides (IS-PRM) to significantly reduce the required measurement time per transition (Gallien et al. 2015). Internal standards also aid in relative quantification and, if their concentration is exactly known, as is the case for absolute quantitation (AQUA) peptides, they can be used for absolute quantification of peptides and proteins of interest. Concomitantly, reference peptides significantly increase specificity and sensitivity of detection, and therefore panels have been developed in particular for the analysis of body fluids (Rice et al. 2019).

Targeted methods have been improved over the years, evolving into the method of choice for sensitive, high-precision and high-throughput detection and quantification of target peptides, rendering them especially suitable for biomarker studies (Castro-Gamero et al. 2014). Moreover, DIA approaches follow the same principle as PRM but without precursor selection and have the potential to soon enable targeted analysis of complete proteomes.
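Quantification from a monitored transition can be sketched in a few lines. The example integrates the chromatographic peak with the trapezoidal rule and converts the area ratio between the endogenous peptide and a spiked-in isotopically labeled standard into an absolute amount. The traces, the spike-in amount of 50 fmol and the function names are hypothetical, and the calculation assumes that both peptide forms respond identically in the instrument.

```python
def peak_area(times, intensities):
    """Trapezoidal integration of a transition's chromatographic peak."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area

def absolute_amount(area_endogenous, area_standard, standard_amount):
    """Endogenous amount from the endogenous/standard area ratio and the known spike-in."""
    return area_endogenous / area_standard * standard_amount

# Hypothetical elution profiles of the same transition (arbitrary intensity units).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
endogenous = [0.0, 10.0, 20.0, 10.0, 0.0]
standard = [0.0, 20.0, 40.0, 20.0, 0.0]   # heavy-labeled peptide, 50 fmol spiked in

area_endo = peak_area(times, endogenous)
area_std = peak_area(times, standard)
print(absolute_amount(area_endo, area_std, 50.0), "fmol")
```

For relative quantification without a standard, the same peak areas of one transition are compared directly between samples.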

8.3 Sample Preparation

8.3.1 Sample Handling

Many sample treatments have been developed for MS workflows to improve sample purity, number of protein identifications and quantification accuracy. The following sections give an overview of sample preparation protocols and discuss general considerations related to sample handling. Several articles are readily available and should be consulted for a more comprehensive insight into state-of-the-art sample treatments in body fluid proteomics (Paulo et al. 2011; Hulmes et al. 2004; Kuljanin et al. 2017; Geyer et al. 2019).

As a first measure, it is highly advised to inhibit innate enzymatic activity after sample collection and preprocessing (Hulmes et al. 2004; Rai et al. 2005). Particularly in body fluid samples, endogenous enzymes could still be active, and any activity would alter the proteome, losing its ability to reflect the state of a tissue or organism at time of sampling. Frequently used are protease inhibitors to impair non-physiological protease activities after sample extraction and anticoagulants to inhibit the coagulation cascade in plasma samples. Depending on the specific sample and the biological process in focus, this is extended to phosphatase inhibitors and additional substances interfering with protein-modifying enzymes. With respect to sample handling and preparation for MS analysis, technical procedures are mostly focused on sample purification and clean-up. For accurate and sensitive analysis of proteome content using MS, samples have to be free of contaminants and biological molecules other than proteins and peptides.

8.3.2 Dynamic Range Reduction

A particular challenge in body fluid proteomics is the enormous dynamic range in protein concentration, spanning up to 12 orders of magnitude, i.e. a ratio of about one copy of a low-abundance protein to one trillion copies of a highly abundant protein (Corthals et al. 2000). In serum, the top 22 most abundant proteins represent 99% of the total protein content, while the remaining 1% comprises thousands of different, lowly abundant proteins. As a result, peptides from the highly abundant proteins tend to overshadow and suppress detection of low-abundance peptides, substantially decreasing protein coverage in MS experiments. Even with the rapid advancements in the field, contemporary mass spectrometers do not match the concentration range of body fluids such as plasma or urine. Moreover, complexity at the sample level increases further if lipids, salts and other metabolites are taken into account, resulting in a loss of information for lowly abundant proteins, which are frequently the subject of investigation. This dynamic range issue poses a major challenge when sensitivity and reproducibility are required and is the major limiting factor in MS-based proteomics of body fluids. Biomarkers for diseases and pathological conditions are frequently in the lower protein concentration range, making accurate and consistent identification and quantification particularly difficult. To address this problem, MS workflows, especially those optimized for body fluid analysis, include various sample fractionation or enrichment techniques to reduce the dynamic range. This can be achieved either by depletion of high-abundance proteins or by dynamic range compression.
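The figures quoted above can be turned into a short worked calculation. The total protein concentration of 70 mg/ml is an assumed midpoint of the 60–80 mg/ml range given for plasma and serum earlier in the chapter, and the function names are illustrative.

```python
import math

def orders_of_magnitude(highest, lowest):
    """Dynamic range expressed as the decadic logarithm of the abundance ratio."""
    return math.log10(highest / lowest)

def low_abundance_pool(total_mg_per_ml, top_fraction):
    """Protein mass left for all proteins outside the most abundant group."""
    return total_mg_per_ml * (1.0 - top_fraction)

# One copy of a rare protein per one trillion copies of an abundant one.
print(orders_of_magnitude(1e12, 1.0))

# If the top 22 proteins make up 99% of an assumed 70 mg/ml total,
# the remaining thousands of proteins share about 0.7 mg/ml.
print(round(low_abundance_pool(70.0, 0.99), 3), "mg/ml")
```

The second number illustrates why the remaining proteins, each present at a small share of this pool, are easily masked by peptides from the dominant species.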

8.3.2.1 Depletion of High-Abundance Proteins

Depletion of high-abundance proteins and the resulting relative increase in low-abundance biomarker candidates is a primary focus in MS-based analysis of body fluids, and multiple techniques have been developed (Pisanu et al. 2018). One of the most prevalent depletion methods is column-based immunodepletion, where highly abundant proteins such as albumin and immunoglobulins are captured, thereby lowering their concentration in the sample of interest. General depletion of albumin and IgG proteins can be achieved by dye methods (e.g. Cibacron) and protein A/G based columns, respectively. Monoclonal and polyclonal antibodies or peptide ligand-based columns can be used to remove specific proteins. For instance, numerous commercial immunoaffinity kits are available for depletion of the most prominent proteins in plasma. Examples include ProteoPrep® (Sigma-Aldrich) for depletion of albumin and IgG proteins, Seppro® IgY14 (Sigma-Aldrich) for depletion of the 14 most abundant plasma proteins, ProteoPrep 20 (Sigma-Aldrich) for the depletion of the 20 most abundant plasma proteins, Aurum Affi-Gel Blue Mini Columns and Kit (Bio-Rad) for albumin depletion, Pierce™ Top 12 (ThermoFisher Scientific) for depletion of the 12 most abundant plasma proteins, and MARS (Agilent), which is available in multiple configurations. The effectiveness of immunodepletion methods is indisputable. However, apparent disadvantages are the potential loss of low molecular weight proteins that are bound to highly abundant carrier proteins like albumin and the unspecific binding of lowly abundant proteins to the column’s affinity ligand.

8.3.2.2 Dynamic Range Compression

An alternative approach to reduce the dynamic range that has attracted attention for its sensitivity and effectiveness is dynamic range compression, commercialized as ProteoMiner™ (Bio-Rad). ProteoMiner™ can be more accurately defined as an equalization rather than a depletion method. This technique uses a combinatorial hexapeptide-bead library bound to a chromatographic column that compresses the dynamic range of proteins in plasma samples. The library has a limited but equal binding capacity for each protein, which implies that highly abundant proteins will quickly reach saturation levels, while low-abundance proteins will fully bind to the beads. As the unbound fraction of the sample is washed away, this technique enables concurrent enrichment of the underrepresented, low-concentration proteins and depletion of the highly abundant protein components (Fonslow et al. 2011; Li et al. 2017; Moggridge et al. 2019). Since no proteins are completely depleted from the sample, ligands and other proteins bound to highly abundant carriers will still be represented in the eluted sample. Importantly, relative quantitation of low-abundance biomarker candidates is not impaired, because concentrations of these proteins are generally far below saturation level.
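The equalization principle can be captured in an idealized model in which every protein binds up to the same capacity and any excess is washed away. This is a deliberately simplified sketch: real bead libraries do not have perfectly equal capacities, and the protein labels, amounts and capacity value are hypothetical.

```python
import math

def equalize(amounts, capacity):
    """Each protein binds up to the same capacity; the excess stays unbound and is washed away."""
    return {name: min(amount, capacity) for name, amount in amounts.items()}

def dynamic_range(amounts):
    """Orders of magnitude between the most and the least abundant protein."""
    values = list(amounts.values())
    return math.log10(max(values) / min(values))

# Hypothetical protein amounts in arbitrary units.
sample = {"abundant": 1e9, "medium": 1e5, "rare": 1e2}
bound = equalize(sample, capacity=1e4)

print("before:", dynamic_range(sample), "orders of magnitude")
print("after: ", dynamic_range(bound), "orders of magnitude")
print("rare protein retained unchanged:", bound["rare"] == sample["rare"])
```

In this model the protein below the saturation level is recovered in full, which reflects why relative quantitation of low-abundance candidates is preserved, whereas quantitative information is lost for proteins that saturate their binding sites.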

8.4 N-Terminal Enrichment and Degradomics

A growing body of research is devoted to proteolytic events that result from the enzymatic activity of proteases in a biological system. Proteolysis is a very common post-translational modification, which plays a role in a myriad of biological pathways and processes, such as apoptosis and differentiation (Verhamme et al. 2019; Bond 2019). Body fluids are strongly affected by this activity, since around half of all proteases are secreted and exert their activity in the extracellular space. Typical examples of protease activities in body fluids are the coagulation cascade and the complement activation system. Proteases are also involved in disease, and altered levels of proteolytic activity can indicate or cause a pathological state. Degradomics is the field of research that investigates cleavage events in complex samples or systems (Savickas and auf dem Keller 2017; Grozdanić et al. 2019).

Proteolytic cleavage can result in complete degradation or limited processing of a protein substrate. Degraded, unstable proteins are removed from the system, and changes in their abundances are indicative of a degradative proteolytic process. In contrast, discovery and analysis of limited proteolytic events by MS-based proteomics requires identification of the newly generated protein products. Newly formed protein N-termini and C-termini are therefore ideal indicators of limited proteolysis (Eckhard et al. 2016). Still, despite high proteolytic activities in body fluids, terminal peptides represent only a small portion of the overall peptide content. Since bottom-up proteomics workflows include proteome digestion with trypsin or another suitable endoproteinase, internal peptides heavily outnumber terminal peptides in MS-based proteomics experiments, which makes the terminal peptides very difficult to detect. Several enrichment strategies have been developed to overcome this issue. Because primary amines are more reactive than carboxyl groups, N-terminal enrichment strategies are more prevalent in degradomic studies. Positive enrichment methods selectively enrich for protein termini, while negative enrichment methods aim at depleting internal peptides.
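The imbalance between internal and terminal peptides can be made concrete with a small in silico digest. The sketch below assumes a simplified trypsin rule (cleavage after K or R, but not before P, with no missed cleavages) and uses a short, hypothetical sequence. Real proteins are much longer, so the share of terminal peptides is correspondingly smaller.

```python
import re


def tryptic_digest(sequence):
    """Simplified trypsin rule: cleave after K or R unless followed by P.

    Missed cleavages are ignored. Splitting on zero-width matches with
    re.split requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def count_peptide_classes(sequence):
    """Count terminal (N- or C-terminal) and internal peptides of one protein."""
    peptides = tryptic_digest(sequence)
    terminal = min(len(peptides), 2)
    internal = max(len(peptides) - 2, 0)
    return {"total": len(peptides), "terminal": terminal, "internal": internal}


# Hypothetical sequence, chosen only for illustration.
protein = "MASKPLLRGTEKAAWRDDEFK"
peptides = tryptic_digest(protein)
counts = count_peptide_classes(protein)

print(peptides)
print(counts)
```

Each protein contributes at most one N-terminal and one C-terminal peptide regardless of its length, while the number of internal peptides grows with the number of cleavage sites.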

8.4.1 Positive Enrichment

Positive enrichment strategies selectively label N-terminal α-amines with chemical affinity tags at the protein level. After digestion, the tagged N-terminal peptides can be enriched by affinity purification. Timmer et al. implemented a positive enrichment strategy based on selective guanidination of lysine side chain ε-amines, subsequent labeling of protein N-termini with an amine-reactive biotin tag, and streptavidin affinity enrichment of N-terminal peptides after tryptic digestion (Timmer et al. 2007). Another positive selection method is based on the enzyme subtiligase, an engineered variant of the protease subtilisin that catalyzes the ligation of proteins or peptides. In this method, a biotin-conjugated peptide is ligated selectively to free N-terminal protein α-amines. After digestion, the biotinylated N-termini are affinity purified using streptavidin columns. Finally, the terminal peptides are released by cleavage with tobacco etch virus protease, which has a very distinct specificity for a sequence present in the biotin-labeled peptide tag (Yoshihara et al. 2008). Using this strategy, Wildes and Wells recorded the first N-terminome of human blood (Wildes and Wells 2010), and Wiita et al. monitored circulating peptides released from tumors (Wiita et al. 2014). Positive selection methods have been shown to produce simplified proteomes for degradomic studies. However, accurate discrimination between ε- and α-amines is difficult, and positively enriched samples do not include N-termini that carry natural post-translational modifications such as acetylation and cyclization.

8.4.2 Negative Enrichment

Several negative enrichment workflows for protein termini from complex biological samples have been developed and extensively described in the recent literature (Luo et al. 2019; Savickas et al. 2020). Here, we will summarize the principles of the two methods that have been most widely applied in the field of degradomics and body fluid analysis.

8.4.2.1 Combined Fractional Diagonal Chromatography

Combined fractional diagonal chromatography (COFRADIC) is a negative enrichment method that employs two chromatographic techniques for purification of terminal peptides (Staes et al. 2011, 2017). First, the primary amines are acetylated, and the sample is digested and treated to remove N-terminal pyroglutamates. Second, strong cation exchange chromatography is used to enrich N- and C-terminal peptides. The more positively charged tryptic peptides bind to the resin at low pH, leaving the terminal peptides available for collection in the flow-through. The purified peptides are then treated with 2,4,6-trinitrobenzenesulfonic acid (TNBS), which increases the hydrophobicity of C-terminal and internal peptides. Finally, N-termini are recovered by a series of reversed-phase liquid chromatography steps. Among many other applications, COFRADIC has been used to discover novel plasma biomarkers for heart failure (Mebazaa et al. 2012).

8.4.2.2 Terminal Amine Isotopic Labeling of Substrates

Terminal amine isotopic labeling of substrates (TAILS) is another method for negative selection of N-termini, which can be multiplexed with the use of TMT/iTRAQ labels (Kleifeld and Doucet 2010; Kleifeld et al. 2011). In this technique, all primary amines are blocked at the protein level by amine-reactive chemical tags. Following digestion, the peptides are incubated with a high-molecular weight (>100 kDa) aldehyde-derivatized polymer (HPG-ALD), which specifically binds the free primary amines of the internal peptides. The polymer-bound peptides and the N-terminal peptides are separated by ultrafiltration, which keeps the polymer in the retentate and leaves the filtrate enriched in protein N-termini. Labeling of N-termini with TMT/iTRAQ enables highly multiplexed relative quantification of N-termini and identification of protease cleavage events, which makes TAILS a powerful technique for studying proteolytic landscapes in complex biological matrices. In body fluid degradomics, TAILS has been used, for example, to characterize proteolysis in human platelets (Prudova et al. 2014) and in wound exudates from pigs and patients (Sabino et al. 2015, 2018; Sabino and Hermes 2017).
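The selection logic of this workflow can be sketched as a toy simulation. Several simplifying assumptions are made here and are not taken from the text above: blocked lysine side chains are assumed to be no longer cleaved by trypsin, so that digestion occurs only after arginine (and not before proline); labeling and polymer binding are treated as complete; and the sequence and cleavage site are hypothetical.

```python
import re


def digest_after_blocking(fragment):
    """Cleave after R unless followed by P.

    Assumption: lysines were blocked at the protein level and are therefore
    not cleaved by trypsin.
    """
    return [p for p in re.split(r"(?<=R)(?!P)", fragment) if p]


def tails_negative_selection(protein_fragments):
    """Separate blocked N-terminal peptides from polymer-bound internal peptides.

    Each fragment stands for a protein species present before labeling, so its
    N-terminus (natural or protease-generated) is blocked. Peptides formed
    later by digestion have free alpha-amines and are captured by the polymer.
    """
    filtrate, retentate = [], []
    for fragment in protein_fragments:
        peptides = digest_after_blocking(fragment)
        if not peptides:
            continue
        filtrate.append(peptides[0])    # blocked N-terminus, passes the filter
        retentate.extend(peptides[1:])  # free alpha-amine, bound to the polymer
    return filtrate, retentate


# Hypothetical protein cut once by a protease between E and K.
fragments = ["MASKPLLRGTE", "KAAWRDDEFK"]
filtrate, retentate = tails_negative_selection(fragments)

print("filtrate (N-termini):", filtrate)
print("retentate (internal and C-terminal):", retentate)
```

In this model the filtrate contains both the natural N-terminus and the neo-N-terminus created by the protease. Quantitative comparison of such neo-N-termini between conditions is what allows cleavage events to be assigned.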

8.5 Conclusions

Body fluids are important specimens for biomarker discovery, and we have only begun to exploit their potential in the diagnosis and treatment of disease. Although they are readily accessible for sampling, their proteomes are highly complex, which poses many challenges to their comprehensive analysis. Rapid advancements in MS-based proteomics have helped to overcome many of these issues through the development of powerful workflows that reliably assess even low-abundance biomarker candidates with high quantitative accuracy (Table 8.1). This is enabled by combining state-of-the-art instrumentation with advanced sample preparation and with approaches that specifically enrich for peptides resulting from post-translational modifications. In particular, proteolysis generates terminal peptides in body fluids. Customized degradomics workflows make these peptides accessible and thereby open up an even richer sample space to be explored for devising novel strategies for diagnostics and therapeutic intervention in personalized medicine.

Table 8.1 Overview of original research studies using MS-based proteomics and advanced sample preparation for proteome analysis of body fluids