
Introduction

The importance of having a reliable biomarker, or a set of reliable biomarkers, for the diagnosis of specific diseases should be evident, especially when such markers can detect early stages of the disease. Biomarkers can also be used to predict the outcome of a disease or to monitor and guide therapy. In terms of chemical composition, all kinds of biomolecules could serve as valuable markers, including proteins, peptides, lipids, metabolites, and nucleic acids. Because the various ‘-omics’ strategies, including proteomics, peptidomics, lipidomics, and metabolomics, rely largely on mass spectrometry (MS), MS has become the key analytical tool for biomarker discovery.

An important issue in biomarker discovery research is that the most relevant modifications are found at the site of the affected tissue itself, where the molecules of interest are present at their highest concentration. As a result, the analysis of biopsies remains the best starting point to find relevant molecular modifications. However, the ultimate goal of biomarker discovery research is to identify relevant biomolecules in easily obtained patient samples (particularly blood, urine, and feces). This implies that these molecules must be released from the affected site and must be transportable (water-soluble or bound to a transport protein). The amount released must also be large enough to remain detectable after the substantial dilution in the final sample (e.g., dilution in about 5 L of blood). There is increasing evidence that a panel of biomarkers, rather than a single biomarker, substantially improves diagnostic performance in terms of sensitivity and specificity. This requires advanced data processing to weight each potential biomarker accurately (see further). Obviously, these considerations are not limited to endometriosis but apply to the whole field of biomarker discovery. In the following sections, we focus on common proteomics strategies and on studies described in endometriosis.
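To illustrate what "weighting" each biomarker in a panel can mean in practice, the following minimal sketch fits a logistic regression by plain batch gradient descent, so that every marker receives its own weight in a combined score. Logistic regression is only one possible weighting scheme, chosen here for transparency; it is not claimed to be the method used in any of the studies discussed. The marker values are invented toy numbers, and a real panel would additionally need cross-validation and an independent validation cohort.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_panel(samples, labels, lr=0.1, epochs=3000):
    """Fit logistic-regression weights for a biomarker panel by batch gradient descent.

    Returns [intercept, w_marker1, w_marker2, ...].
    """
    n_markers = len(samples[0])
    weights = [0.0] * (n_markers + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_markers + 1)
        for x, y in zip(samples, labels):
            err = panel_score(weights, x) - y  # predicted probability minus true label
            grad[0] += err
            for j, value in enumerate(x):
                grad[j + 1] += err * value
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights

def panel_score(weights, x):
    """Combined panel score in (0, 1): weighted sum of marker levels passed through a sigmoid."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

# Hypothetical toy data: two marker levels per sample (arbitrary units)
X = [[2.1, 0.9], [1.8, 0.2], [2.5, 0.5], [0.4, 0.3], [0.6, 0.8], [0.2, 0.6]]
y = [1, 1, 1, 0, 0, 0]  # 1 = disease, 0 = control
weights = fit_panel(X, y)
print([round(w, 2) for w in weights])
print(round(panel_score(weights, [2.0, 0.5]), 3), round(panel_score(weights, [0.3, 0.5]), 3))
```

In this toy set only the first marker separates the groups, so it should receive a clearly positive weight while the second, uninformative marker contributes little.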

Proteomics Strategies for Biomarker Discovery

Various strategies have been, and still are, applied to screen for differences in protein content related to a specific disease or disease state. Gel electrophoresis is a common technique to analyze the protein content of biological samples, but its resolving power is very limited, making it insufficient for complex protein mixtures. This limitation does not apply to 2D gel electrophoresis, which uses two dimensions (separation based on the isoelectric point combined with separation based on molecular weight). 2D gel electrophoresis can separate up to several thousand protein spots and is probably one of the oldest methods applied in high-throughput proteomics. Its multiplexed variant (2D DIGE), in which up to three samples are combined, allows parallel analysis of different samples and has proven very effective and popular. Although this approach has several drawbacks, it also has strong merits, including simplified data analysis, and it is still used (see further). The idea of separating proteins in two dimensions has also been implemented in a chromatographic format, for example, the Beckman PF 2D system, which separates proteins by isoelectric point (chromatofocusing) in the first dimension and by hydrophobicity (reversed-phase chromatography) in the second, thereby mimicking the strategy used in 2D gel electrophoresis.
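The first dimension of 2D gel electrophoresis separates proteins by isoelectric point (pI), the pH at which their net charge is zero. The sketch below estimates a pI from sequence with the Henderson-Hasselbalch equation and a bisection search. The pKa values are approximate and are an assumption: published pKa scales differ, so different tools predict pI values that disagree by a few tenths of a pH unit, and modifications or protein folding are ignored.

```python
# Approximate pKa values (an assumption; published scales differ slightly)
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6

def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide/protein at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # deprotonated C-terminus
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge

def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH: the net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2

print(round(isoelectric_point("KKKKK"), 2), round(isoelectric_point("DDDDD"), 2))
```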

Mass Spectrometry Is the Key Analytical Tool in Biomarker Discovery

In the strategies mentioned above, mass spectrometry serves only to identify the proteins responsible for the detected differences. In most currently applied strategies, however, mass spectrometry itself is used to detect these differences. This differential analysis compares the signal intensities of the measured ions, which serve as a measure of abundance. These signals can come from intact proteins (as in SELDI-TOF MS; see further) or, more commonly, from labeled or unlabeled peptides generated from these proteins.

As sample complexity severely affects the outcome of the MS analysis, an impressive collection of methods (e.g., multiple liquid chromatography (LC) steps) has been developed to reduce it. Separation strategies using multiple dimensions improve the resolution, sensitivity, and overall quality of the data, but make data processing less straightforward. The first dimension can be anything from an affinity step (e.g., immunoprecipitation, affinity beads) to solid-phase extraction, physical separation (centrifugation, ultrafiltration, etc.), ion-exchange chromatography, reversed-phase chromatography at high pH, or even gel slices from 1D gel electrophoresis. The final dimension typically separates the peptide mixture by reversed-phase chromatography at low pH. An early example of a 2D LC-MS-based proteomics strategy is the original multidimensional protein identification technology (MudPIT) method (see further), which combines an initial digestion of the sample with separation by ion-exchange and reversed-phase chromatography.
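A rough way to see why extra dimensions help: if two separation dimensions were fully orthogonal, the number of peaks that could in principle be resolved is approximately the product of the peak capacities of the individual dimensions. Real systems fall short of this bound because the dimensions are partly correlated and transferring fractions costs some resolution; the numbers below are illustrative assumptions, not properties of a specific setup.

```python
from math import prod

def ideal_peak_capacity(*dimension_capacities: int) -> int:
    """Upper bound for fully orthogonal dimensions: product of per-dimension capacities."""
    return prod(dimension_capacities)

# Hypothetical numbers: 10 first-dimension fractions, ~200 peaks per low-pH RP gradient
print(ideal_peak_capacity(200))      # single dimension: 200
print(ideal_peak_capacity(10, 200))  # two dimensions: at most 2000
```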

Labeled and Unlabeled Methods

When the focus of a proteomics study is biomarker discovery, proteins must be not only identified but also quantified in different samples to obtain a detailed picture of the differences between conditions, e.g., healthy versus diseased or mutant versus wild type. Such quantitative proteomics can be achieved by labeling samples with dedicated tags (labeled methods), which allow both relative and absolute quantification. Labeling can be performed at the protein level or at the peptide level (after digestion of the protein), and each option has its own benefits and constraints. Labeling at the protein level offers easy interpretation and can dramatically reduce sample complexity (in the best case, one peptide per protein). Labeling at the peptide level, however, allows several peptides from the same protein to be used for quantification, which also reduces the risk of inconsistencies caused by a single missed labeling site. Quantitative information can also be obtained from unlabeled samples by comparing the relative intensities of the mass spectral signals (label-free methods). The most commonly used methods are described below.

Labeled Methods

2D DIGE

Two-dimensional difference gel electrophoresis (2D DIGE) allows the parallel separation of proteins from up to three different samples. Proteins are separated by their isoelectric point (first dimension) and their molecular weight (second dimension). The technique starts with labeling each protein mixture with one of three available fluorescent CyDyes (Cy2, Cy3, or Cy5). These dyes bind to lysine side chains (the “minimal labeling” method) or, alternatively, to cysteines (the “saturation labeling” method). Up to three samples, each labeled with a different fluorescent dye, can be mixed and loaded onto a single 2D gel. This simultaneous analysis of multiple samples is known as multiplexing and is a general advantage of labeled methods. Scanning the gel yields an image of “gel spots” at different locations. An internal reference consisting of an equal mix of all processed samples is included; it therefore contains every spot present in the individual samples. This facilitates the interpretation of closely migrating spots. Moreover, running the same reference sample on every gel creates an intrinsic link between the different gel runs. Matching and quantitative analysis of the spots in the scanned gel images are performed with specialized software, and an impressive collection is available, including Melanie (GeneBio, Geneva, Switzerland), DeCyder 2D or ImageMaster 2D (GE Healthcare, Chalfont St. Giles, UK), PharosFX System, PDQuest 2D (Bio-Rad, Hercules, CA), Dymension (Syngene, Cambridge, UK), Progenesis SameSpots, and Delta2D. Following the differential analysis, the protein content of each spot of interest is identified by in-gel digestion and subsequent MS analysis.
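Because every gel carries the same pooled internal standard (commonly the Cy2-labeled channel), spot volumes are usually expressed relative to that standard before samples on different gels are compared. The sketch below shows the idea, assuming spot volumes have already been matched across channels by image-analysis software; spot IDs and volumes are hypothetical, and packages such as DeCyder apply further normalization steps not shown here.

```python
import math

def standardized_log_abundance(sample_volumes, standard_volumes):
    """Per-spot log2 ratio of a sample channel to the pooled internal standard on the same gel."""
    return {
        spot: math.log2(sample_volumes[spot] / standard_volumes[spot])
        for spot in sample_volumes
        if spot in standard_volumes and standard_volumes[spot] > 0
    }

def fold_change_between_gels(gel_a, gel_b):
    """Compare two samples run on different gels through their standardized values."""
    return {spot: 2 ** (gel_a[spot] - gel_b[spot]) for spot in gel_a if spot in gel_b}

# Hypothetical matched spot volumes (arbitrary units)
gel1 = standardized_log_abundance({"spot_17": 4000.0, "spot_42": 900.0},    # Cy3 sample
                                  {"spot_17": 2000.0, "spot_42": 1000.0})   # Cy2 standard
gel2 = standardized_log_abundance({"spot_17": 1500.0, "spot_42": 1800.0},   # Cy5 sample
                                  {"spot_17": 1500.0, "spot_42": 2000.0})   # Cy2 standard
print(fold_change_between_gels(gel1, gel2))
```

Here spot_17 comes out twofold higher in the first sample, although the raw volumes alone (4000 versus 1500) would suggest a larger difference; the shared standard absorbs gel-to-gel variation.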

Isotope-Coded Affinity Tags (ICAT)

ICAT was one of the first tag-based methods developed for quantitative mass spectrometry [1, 2]. The original tags exist in two forms, heavy and light, and react specifically with free cysteine residues. The two tags have the same chemical structure but differ in mass because the linker contains either eight deuterium atoms (heavy tag) or eight hydrogen atoms (light tag). Labeling can be performed at the protein level or at the peptide level. Two samples, each labeled with a different ICAT tag, are mixed for multiplexed analysis. The tags also contain biotin, which allows easy isolation of the tagged cysteine-containing peptides with (strept)avidin beads. An advantage of ICAT is that, after digestion, only the labeled (cysteine-containing) peptides are needed for quantification, which can greatly simplify the MS analysis and subsequent data processing.

Because stable isotope labels should in principle change only the mass, the biophysical and chemical properties of the labeled peptides and proteins should be unaffected. Heavy and light peptides should therefore co-elute from the LC column at the same retention time, while the heavy isotopes produce a mass shift in the mass spectrum. The presence of both heavy and light tags results in peak pairs, whose intensities can be compared to calculate the difference in abundance between the two samples.
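The peak-pair logic can be sketched as follows. The heavy/light mass difference is computed from the monoisotopic masses of deuterium and hydrogen (8 × 1.006277 Da, about 8.05 Da per d8 label), and the observed m/z spacing is that value divided by the charge state. The function name, peak list, and tolerance are hypothetical; real software integrates chromatographic peaks over retention time rather than comparing single centroids.

```python
H_MASS = 1.00782503   # monoisotopic mass of 1H (Da)
D_MASS = 2.01410178   # monoisotopic mass of 2H (Da)
D8_SHIFT = 8 * (D_MASS - H_MASS)  # heavy minus light ICAT tag, about 8.050 Da

def find_icat_pairs(peaks, charge, n_labels=1, tol=0.01):
    """Return (light_mz, heavy_mz, heavy/light ratio) for peaks spaced by the ICAT shift.

    peaks: list of (mz, intensity) centroids from one spectrum.
    n_labels: number of labeled cysteines in the peptide.
    """
    spacing = n_labels * D8_SHIFT / charge
    pairs = []
    for light_mz, light_int in peaks:
        for heavy_mz, heavy_int in peaks:
            if abs(heavy_mz - light_mz - spacing) <= tol and light_int > 0:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs

# Hypothetical doubly charged peptide carrying one labeled cysteine
spectrum = [(650.300, 1.0e5), (654.325, 2.0e5), (700.000, 5.0e4)]
print(find_icat_pairs(spectrum, charge=2))
```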

Because cysteine residues are relatively rare in proteins, selecting cysteine-containing peptides greatly reduces sample complexity. Obvious disadvantages are that labeling efficiency is not always optimal and that some proteins (about 10%) contain no cysteine residues at all. In addition, the biotin tag is bulky and complicates the fragmentation spectra, which makes peptide identification more difficult. Moreover, the deuterium atoms in the tag may cause the light and heavy peptides to separate partially in reversed-phase chromatography [3]. The method has been improved by introducing a cleavable tag whose heavy and light forms co-elute [4, 5].
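The complexity reduction gained by keeping only cysteine-containing peptides can be illustrated with an in-silico tryptic digest. The sketch uses the simplified cleavage rule "after K or R, but not before P", ignores missed cleavages, and runs on a made-up sequence; it only demonstrates the principle.

```python
import re

def tryptic_peptides(sequence: str):
    """Cleave after K/R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def cysteine_peptides(sequence: str):
    """Peptides that a cysteine-reactive (ICAT-type) reagent would tag."""
    return [p for p in tryptic_peptides(sequence) if "C" in p]

protein = "AGCKLLPRSTKPEFGRDCAWKYYTR"  # hypothetical sequence
print(tryptic_peptides(protein))   # all five tryptic peptides
print(cysteine_peptides(protein))  # only the two cysteine-containing ones
```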

Isotope-Coded Protein Labeling (ICPL)

ICPL uses a similar principle to ICAT (isotope-coded tags with the same chemical structure), but labels free lysine side chains and free N-termini instead. Because free amino groups are far more abundant than free cysteine residues, many more peptides are labeled. ICPL allows the simultaneous comparison of up to four experimental conditions in a single experiment [6]. Labeling can be performed at both the protein level (before digestion) and the peptide level (after digestion).

Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)

Another protein labeling method manipulates the culture medium so that newly synthesized proteins carry an isotopic label. Because the stable isotopes are incorporated into metabolic products (proteins), this approach is known as in vivo or metabolic labeling. Its use for quantitative proteomics was originally reported by Oda et al. [7] in growing yeast cultures, in which ¹⁵N was incorporated into all amino acids by supplying ¹⁵N-labeled ammonium sulfate as the only nitrogen source in the culture medium. In 2002, the lab of Matthias Mann developed the method further into stable isotope labeling by amino acids in cell culture (SILAC) [8]. In SILAC, cells are grown in medium containing amino acids, typically lysine and arginine, labeled with heavy stable isotopes. During cell growth, these amino acids are incorporated into newly synthesized proteins, so that the label is eventually present throughout the whole cellular proteome.

Labeling lysine and arginine is particularly useful because trypsin, the predominant enzyme used for protein digestion in MS analysis, cleaves at the C-terminal side of lysine and arginine. Therefore, in a SILAC experiment, all tryptic peptides except the protein's C-terminal peptide contain at least one labeled amino acid, which shifts their masses. When labeled and non-labeled samples are mixed, each peptide is represented by a peak pair, and the mass difference between the two peaks depends on the number and type of labeled amino acids. More recently, SILAC has been applied in global proteome studies [9], in functional proteomics assays, and in the study of post-translational modifications [10, 11]. SILAC is currently the most common approach for in vivo isotopic labeling, but it is considered expensive and time-consuming, and its labeling efficiency in plants reached only about 70% [12], which is insufficient for many proteomic studies. Moreover, labeling tissues in a living organism is not always practical or ethically acceptable, so alternative chemical and enzymatic methods remain useful.
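Because nearly every tryptic peptide ends in K or R, the expected heavy/light spacing follows directly from its residue composition. The sketch assumes the widely used Lys8 (13C6 15N2) and Arg10 (13C6 15N4) labels, with mass increments of about 8.0142 and 10.0083 Da computed from isotope mass differences; other SILAC label combinations would need different values.

```python
C13_SHIFT = 1.0033548   # 13C minus 12C (Da)
N15_SHIFT = 0.9970349   # 15N minus 14N (Da)
LABEL_SHIFT = {
    "K": 6 * C13_SHIFT + 2 * N15_SHIFT,  # Lys8, about 8.0142 Da
    "R": 6 * C13_SHIFT + 4 * N15_SHIFT,  # Arg10, about 10.0083 Da
}

def silac_shift(peptide: str, charge: int = 1) -> float:
    """Expected heavy-minus-light m/z spacing for a fully labeled peptide.

    Counts every K and R, so peptides with missed cleavages get two or more labels.
    """
    mass_shift = sum(LABEL_SHIFT.get(aa, 0.0) for aa in peptide.upper())
    return mass_shift / charge

print(round(silac_shift("LVNELTEFAK", charge=2), 4))  # about 4.0071
```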

TMT and iTRAQ Isobaric Labeling

The isobaric tags for relative and absolute quantitation (iTRAQ) [13] and tandem mass tag (TMT) [14] technologies were developed as an alternative to standard isotope-coded labeling, especially to increase the degree of multiplexing. Unlike isotopic tags, isobaric tags have not only identical chemical properties but also identical masses, so peptides carrying different tags co-elute perfectly [15, 16]. TMT and iTRAQ labeling are commonly performed at the peptide level and covalently label the N-terminus and side-chain amines of peptides. Labeled peptides therefore produce only a single peak during liquid chromatography, even when two or more samples are mixed. After fragmentation of the labeled peptide by collision-induced dissociation (CID), each tag releases a sample-specific reporter ion. This type of quantitative proteomic analysis therefore requires MS/MS. Isobaric labeling allows superior multiplexing (four, six, or even eight labels). Isobaric mass tagging has also been adapted for protein-level labeling (similar to ICPL).
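Quantification then reduces to reading the reporter-ion intensities out of each MS/MS spectrum. The sketch below assumes an iTRAQ 4-plex-like reporter set at approximately m/z 114.11, 115.11, 116.11, and 117.12 (treat these as approximate and take exact values from the reagent documentation) and a hypothetical centroid list. Corrections for isotopic impurities of the reagents and for ratio compression, which real workflows apply, are omitted.

```python
# Approximate iTRAQ 4-plex reporter m/z values (verify against reagent documentation)
REPORTERS = {"114": 114.1112, "115": 115.1083, "116": 116.1116, "117": 117.1150}

def reporter_intensities(peaks, tol=0.005):
    """Most intense centroid within tol of each reporter m/z (0.0 if absent)."""
    result = {}
    for channel, target in REPORTERS.items():
        matches = [intensity for mz, intensity in peaks if abs(mz - target) <= tol]
        result[channel] = max(matches, default=0.0)
    return result

def channel_fractions(intensities):
    """Share of the total reporter signal per channel (relative abundance across samples)."""
    total = sum(intensities.values())
    return {ch: (v / total if total else 0.0) for ch, v in intensities.items()}

# Hypothetical MS/MS centroids; the 300.2 peak is a peptide fragment, not a reporter
msms = [(114.1110, 1000.0), (115.1085, 1000.0), (116.1118, 2000.0),
        (117.1149, 4000.0), (300.2, 9000.0)]
print(channel_fractions(reporter_intensities(msms)))
```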

Chemical Labeling

Various custom chemical labeling strategies have been described. The label is introduced into proteins or peptides by a chemical reaction, for instance, with amine or sulfhydryl groups. Esterification or acetylation of amino acid residues has also been applied, as has dimethylation of the primary amines of digested peptides with isotopomeric dimethyl labels.

Enzymatic Labeling

Enzymatic labeling exploits the new C-termini created during trypsin digestion [17]. When digestion is performed in heavy water (H₂¹⁸O), the new C-termini incorporate up to two heavy ¹⁸O atoms. This inexpensive method allows two conditions to be compared in parallel (normal versus heavy C-termini). Unfortunately, the label is not fully stable and can be lost by back-exchange during incubation in normal water.
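A simplified read-out of an ¹⁸O experiment can be sketched as follows. Each incorporated ¹⁸O adds about 2.0042 Da, so a fully labeled peptide appears about 4 Da above its unlabeled form, with a +2 Da species when only one oxygen has exchanged. The ratio below simply sums both labeled species; it deliberately ignores the overlap with the natural isotope envelope of the light peptide, which real analyses must correct for, and all intensities are hypothetical.

```python
O18_SHIFT = 17.9991610 - 15.9949146  # 18O minus 16O, about 2.0042 Da

def expected_mz(light_mz: float, n_o18: int, charge: int) -> float:
    """m/z of the species carrying n_o18 heavy oxygens at the given charge."""
    return light_mz + n_o18 * O18_SHIFT / charge

def o18_ratio(i_light: float, i_plus2: float, i_plus4: float) -> float:
    """Labeled/unlabeled ratio, naively summing singly and doubly labeled forms.

    Simplification: no correction for the light peptide's own M+2/M+4 isotope peaks.
    """
    return (i_plus2 + i_plus4) / i_light

print(round(expected_mz(500.25, n_o18=2, charge=2), 4))  # 502.2542
print(o18_ratio(1.0e5, 1.0e4, 9.0e4))                    # 1.0
```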

Label-Free Methods

Label-free methods do not use any labeling step and are therefore very attractive because of their simplicity; problems related to incomplete labeling are also avoided. However, processing whole proteomic datasets is much less straightforward, and the threshold for detecting differences is usually higher than with labeling-based methods. Therefore, when the focus is on small differences in protein concentration (e.g., less than 35%), labeling-based methods are preferred. Because of their ease of use and low cost compared with other quantitative proteomic approaches, label-free strategies have become the most popular methods in large-scale sample experiments such as clinical screenings and biomarker discovery studies.

Unlike in labeled methods, label-free samples are not multiplexed: each sample is analyzed separately. Label-free experiments therefore need to be controlled more carefully than stable isotope methods to account for experimental variation. Protein quantitation is based on either ion peak intensities or spectral counting.
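For spectral counting, one widely used normalization is the normalized spectral abundance factor (NSAF): the spectral count of each protein is divided by its length (longer proteins yield more peptides and hence more spectra), and the results are scaled to sum to one within a run. A minimal sketch with hypothetical counts and lengths:

```python
def nsaf(spectral_counts, lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j) within one LC-MS run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}

counts = {"protein_A": 40, "protein_B": 10, "protein_C": 20}   # hypothetical spectral counts
lengths = {"protein_A": 400, "protein_B": 100, "protein_C": 100}  # residues
print(nsaf(counts, lengths))  # A: 0.25, B: 0.25, C: 0.5
```

Note that protein_A has the most spectra but, once its length is taken into account, is no more abundant than protein_B.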

SELDI-TOF MS

Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been applied in various label-free proteomics studies in the past. The method is rather unique in that the differential analysis compares signal intensities from intact proteins rather than from peptides. It uses special matrix-assisted laser desorption/ionization (MALDI) target plates, so-called ProteinChip Arrays, whose spots carry particular chromatographic surfaces (hydrophobic, cationic, anionic, metal ion presenting, or hydrophilic) that allow on-chip purification of the sample. Pre-activated ProteinChip Arrays are also available for coupling diverse capture molecules (proteins such as antibodies or receptors, DNA, or RNA) prior to sample loading. The technology was originally produced by Ciphergen Biosystems Inc. (Fremont, CA, USA) and later acquired by Bio-Rad, but it is no longer available because of various limitations.

Multidimensional Protein Identification Technology (MudPIT)

This method involves digestion of the sample and subsequent analysis in a multidimensional LC-MS setup (more than one LC dimension). Multidimensional protein identification technology (MudPIT) was originally described in 2001 by the group of Yates [18], with a strong cation exchanger (SCX) as the first chromatographic dimension and reversed-phase chromatography as the second. This online two-dimensional high-performance liquid chromatography (HPLC) separates complex peptide mixtures well, and the outlet of the second LC dimension is connected directly to the mass spectrometer. Recent developments in peptide separation use alternatives to SCX to improve peak separation and hence increase the number of peptide identifications. A promising approach is “high pH reversed-phase” separation as the first dimension, which increases peptide identifications by a factor of two compared with similar MudPIT runs.

The use of “virtual 2D mapping”, with elution time from the column on one axis and the measured MS ions on the other, has proven very powerful for differential analysis and quantification. The software tools developed for this purpose (e.g., DeCyder MS, Progenesis) could build on the extensive expertise gained from processing 2D gel data.
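In such a virtual 2D map, each LC-MS feature is a point (retention time, m/z), and differential analysis starts by matching the same feature across runs. The sketch below uses simple tolerance windows (an assumption: 0.5 min and 10 ppm) and hypothetical feature lists; dedicated tools also correct retention-time drift between runs before matching, which is not shown.

```python
def ppm_diff(mz_a: float, mz_b: float) -> float:
    return abs(mz_a - mz_b) / mz_b * 1e6

def match_features(run_a, run_b, rt_tol=0.5, ppm_tol=10.0):
    """Pair features (rt_min, mz, intensity) from two runs; return (mz, intensity ratio a/b)."""
    matches = []
    for rt_a, mz_a, int_a in run_a:
        candidates = [
            (abs(rt_a - rt_b), mz_b, int_b)
            for rt_b, mz_b, int_b in run_b
            if abs(rt_a - rt_b) <= rt_tol and ppm_diff(mz_a, mz_b) <= ppm_tol
        ]
        if candidates:
            _, mz_b, int_b = min(candidates)  # closest match in retention time
            matches.append((mz_a, int_a / int_b))
    return matches

control = [(21.3, 654.3251, 2.0e6), (35.0, 812.4102, 5.0e5)]  # hypothetical features
patient = [(21.5, 654.3262, 1.0e6), (48.2, 812.4100, 5.0e5)]
print(match_features(patient, control))  # only the first feature matches
```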

Current mass spectrometers offer greatly improved resolution, accuracy, and speed, and some provide an additional separation dimension such as ion mobility. Together with recent developments at the LC level (nanoLC using ultra-performance liquid chromatography (UPLC) or ultrahigh-performance liquid chromatography (UHPLC), “chip-based” microfluidic systems, etc.), these advances mean that various new LC-MS and LC-LC-MS workflows are under investigation.

Liquid Chromatography Coupled to Fourier Transform Mass Spectrometry (FTMS)

Fourier transform mass spectrometry (FTMS), using either Fourier transform ion cyclotron resonance (FTICR) or an Orbitrap analyzer, outperforms other commonly used mass spectrometry setups in terms of resolution (separation power) and mass accuracy. Orbitrap-based instruments are currently regarded as the standard for accurate-mass, high-resolution measurements. The Orbitrap Q Exactive combines high-performance quadrupole precursor selection with high-resolution, accurate-mass Orbitrap detection, offering high sensitivity, a wide dynamic range, and considerable versatility. An Orbitrap Q Exactive mass spectrometer coupled to nanoflow liquid chromatography (nanoLC) is a platform that not only offers broad screening capabilities but also excels at targeted quantitation of molecules of interest (candidate biomarkers).
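The two figures of merit mentioned here are commonly expressed as mass error in parts per million (ppm) and as resolving power m/Δm, with Δm usually taken as the peak width at half maximum (FWHM). A short sketch with hypothetical numbers:

```python
def mass_error_ppm(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def resolving_power(mz: float, fwhm: float) -> float:
    """Resolving power m/delta-m, using the FWHM of the peak as delta-m."""
    return mz / fwhm

print(round(mass_error_ppm(500.2510, 500.2500), 2))  # about 2 ppm
print(resolving_power(400.0, 0.004))                 # about 100,000
```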

Proteomics in Endometriosis

Proteomics is the large-scale study of proteins, including their expression, localization, functions, post-translational modifications, and interactions [19]. It allows simultaneous observation of alterations in protein expression that may precede, cause, or result from disease development [20]. Endometriosis researchers have found differentially expressed proteins/peptides between women with and without endometriosis in blood and urine, as well as in eutopic and ectopic endometrium [21, 22]. However, studies focusing on biomarker validation are scarce, and to date no biomarker or panel of biomarkers has been sufficiently validated for clinical use [22].

The SELDI-TOF MS platform has been used in endometriosis research on both eutopic endometrial specimens from women with and without endometriosis [23, 24] and blood samples [22]. Briefly, SELDI-TOF MS provides differential proteomic profiles in the form of mass-to-charge (m/z) peaks, i.e., a fingerprint, without identifying the underlying peptides or proteins. Kyama and coworkers were the first to use SELDI-TOF MS in endometriosis research and found reduced expression of a protein peak in secretory-phase endometrium from women with mild endometriosis relative to controls [25]. The same group found 32 peptide peaks differentially expressed in secretory-phase endometrium from women with endometriosis (n = 10) compared with controls (n = 6) [26]. Other researchers found five differentially expressed peptide peaks (5,385, 5,425, 5,891, 6,448, and 6,898 m/z) that collectively showed 91.7% sensitivity and 90% specificity for the diagnosis of endometriosis [24]. A panel of three differentially expressed peptide peaks (16,069, 15,334, and 15,128 m/z) diagnosed endometriosis with 87.5% sensitivity and 86.2% specificity [27].
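The sensitivity and specificity values quoted in these studies are derived from a simple two-by-two classification table. The sketch below shows the arithmetic with hypothetical counts that are not taken from any of the cited studies:

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    sensitivity = tp / (tp + fn)  # fraction of patients with endometriosis that test positive
    specificity = tn / (tn + fp)  # fraction of controls that test negative
    return sensitivity, specificity

# Hypothetical: 20 patients (18 test positive), 20 controls (17 test negative)
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=17, fp=3)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")  # 90.0%, 85.0%
```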

In an exploratory study, a panel of four mass peaks (two upregulated, 90.675 kDa and 35.956 kDa, and two downregulated, 1.9 kDa and 2.5 kDa) identified endometriosis with maximal sensitivity (100%) and specificity (100%) [28, 29]. The 90.675 kDa and 35.956 kDa peaks were identified as T-plastin and annexin V, respectively [28, 29]. Annexin V has a role in proliferation and/or cell mobility, has metastatic potential, and may promote the pathogenesis of endometriosis by stimulating early invasion of endometrial cells into the mesothelium after initial attachment to the peritoneum. T-plastin plays a role in cellular motility, formation of the actin bundles required for cell locomotion, and maintenance of cellular architecture [29]. The same group described a panel of differentially expressed peptide peaks (2,072, 2,973, 3,623, 3,680, and 21,133 m/z) in the early secretory endometrial proteome of women with versus without endometriosis that diagnosed endometriosis with 91% sensitivity and 80% specificity [23].

In peripheral blood, SELDI-TOF MS and MALDI-TOF MS investigations have also revealed differentially expressed proteins and peptides in women with and without endometriosis [30,31,32,33,34,35,36,37,38,39,40]. The largest study attempted to identify the protein/peptide peaks with altered levels after analyzing 254 plasma samples from women with (n = 165) and without (n = 89) endometriosis [30]. Ultrasonography-negative endometriosis was best predicted (sensitivity 88%, specificity 84%) by a model based on five protein/peptide peaks (2,058, 2,456, 3,883, 14,694, and 42,065 m/z) in plasma samples obtained during the menstrual phase [29, 30]. A peak at 2,189 m/z was identified as fibrinogen beta chain and was decreased in women with moderate-to-severe endometriosis. Fibrinogen beta chain has been patented for endometriosis; the group of Fazleabas found decreased levels in uterine flushings of baboons with induced endometriosis. A proteomic fingerprint model (126 endometriosis patients and 120 healthy controls) based on three peptide peaks had 91.4% sensitivity and 95% specificity for detecting endometriosis [38]. These results were validated in an independent cohort, with a sensitivity of 89.3% and a specificity of 90% [38]. In a study by Dutta et al. using 2DE and 2D DIGE followed by MALDI analysis, 25 serum proteins were found to be differentially expressed between women with endometriosis and healthy subjects [40].

Hwang et al. used 2DE followed by MS and found six differentially expressed proteins between plasma pools of women with (n = 15) and without (n = 15) endometriosis [41]. Only haptoglobin was confirmed as a potential biomarker by Western blotting on a subset of the individual samples [41].

Recently, one research group reported 36 differentially expressed peptides, detected by MALDI-TOF MS, in urine samples of women with endometriosis (n = 60) compared with women without endometriosis (n = 62). Using ClinProTools software, they generated an algorithm based on a combination of five peptide peaks (m/z = 1433.9, 1599.4, 2085.6, 6798.0, and 3217.2) [42]. Only one other group has identified differentially expressed proteins/peptides (six in total) in the urine of women with and without endometriosis [43]. The results of the El Kasti group [43] and the Wang et al. group [42] were comparable; however, neither group was able to identify the proteins/peptides.

Proteomics covers not only differential protein/peptide expression but also post-translational modifications (PTMs). PTMs occurring within cells are largely responsible for the discrepancies observed between the genome and the expressed proteome. Currently, about 300 different types of PTM are known to contribute to the huge repertoire of proteins originating from a comparatively small number of genes [44]. A study of the endometrial phosphoproteome of women with (n = 4) and without (n = 4) endometriosis showed altered phosphorylation of 516 proteins in endometriosis [45]. Recent evidence suggests that endometriosis may be an epigenetic disease [46]. Epigenetics refers to functionally relevant modifications to the genome that do not involve a change in the nucleotide sequence; these processes are involved in development, homeostasis, disease, and aging and are responsible for X chromosome inactivation and genomic imprinting [46]. Histone proteins form the core of the nucleosome, and their N-terminal tails provide sites for covalent modification [46]. Histone-modifying enzymes, such as histone acetyltransferases (HATs), histone deacetylases (HDACs), and histone methyltransferases (HMTs), can alter nucleosome and chromatin structure by modifying histone proteins post-translationally, which in turn regulates gene expression patterns [46].

Future Trends

Endometriosis represents a significant global health burden, and proteomic approaches offer one avenue to discover new molecules that allow more sensitive and specific detection or diagnostic strategies [47]. To date, none of the differentially expressed protein/peptide peaks has been validated in an independent study cohort with blinding to the patients' disease status. Standardization is essential to avoid pitfalls in study design and methodology, such as small sample size, lack of relevant clinical information, inconsistent sample handling and storage, and insufficient technical control of pre-analytical sample variability [29]. Proper documentation of sample types and highly standardized techniques for collection, processing, and storage are very important [47,48,49]. The method used to deplete highly abundant proteins, and thereby reduce sample complexity, is a crucial element in the design of future studies [30].
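Why small sample sizes are a pitfall can be made concrete with a confidence interval around an observed sensitivity. The sketch uses the Wilson score interval (one of several standard choices) at approximately 95% confidence; the counts are hypothetical and chosen only to contrast a small and a larger cohort with the same point estimate of 90%.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion such as sensitivity."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

for detected, patients in [(9, 10), (90, 100)]:
    low, high = wilson_interval(detected, patients)
    print(f"{detected}/{patients}: 90% observed, 95% CI {low:.0%} to {high:.0%}")
```

With 10 patients the interval spans roughly 60% to 98%, so a seemingly excellent marker may in fact perform only moderately; with 100 patients it narrows to roughly 83% to 94%.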

Many problems remain to be resolved; while some are technical in nature, the most intractable ones stem from the complex and multifactorial character of the disease itself [20]. Analyzing differential protein expression in such complex biological samples requires strategies for rapid, highly reproducible, accurate, and robust protein quantitation [47], preferably using Fourier transform mass spectrometry (FTMS).