1 Introduction

Proteomics is a molecular biology discipline that studies living organisms from the perspective of their proteins, the biomolecules responsible for executing the genetic information encoded in the genes. It aims at deciphering and interpreting their life cycle, dynamics, and interactions and, more recently, the translation of genotype into phenotype.

From a methodological point of view, it comprises in vitro techniques and, to a much lesser extent, in vivo or in situ approaches. As an in vitro technique, it works with molecules, in this case proteins, that are obtained directly (by extraction) or indirectly (e.g. by in vitro translation) from biological sources for subsequent characterization at the physico-chemical and biological level, and that are also employed for translational purposes (e.g. for food traceability analysis). As an –omics approach, in the holistic (from the Greek holos, meaning entire or all) sense of the term, and unlike classical biochemistry, it investigates proteins as a whole rather than as individual entities, without excluding its use as a targeted, hypothesis-driven, “proteinomics” strategy (Picotti et al. 2013).

The word “proteome” was introduced by M. Wilkins as an adaptation of the term “genome”. That was in 1994, at the first “Genome to Proteome” Siena meeting (2D Electrophoresis–From Protein Maps to Genomes, Siena, Italy, September 5–7, 1994). It was later formally defined as “the PROTEin complement of a genOME” in a paper published in Electrophoresis in 1995 (Humphery-Smith 2015). Since then, a new vision and era have arrived on the biochemistry and molecular biology scene, transforming classical protein chemistry or biochemistry into a holistic approach that opens up new possibilities for understanding the function of genes and the genotype to phenotype translation by tracking the total protein content of the cell.

As scientific disciplines grow hand in hand with developments in technology, it is worth mentioning the three main advances that have contributed most to the birth of proteomics. First was the introduction, during the late 1980s, of soft ionization methods that allowed the analysis of peptides and proteins by mass spectrometry: MALDI (matrix-assisted laser desorption/ionization) and ESI (electrospray ionization) (Aebersold 2003; Aebersold and Mann 2003). Second, an increasing number of genomes were sequenced and DNA or EST sequences were made available thanks to progress in NGS (next generation sequencing) technologies (Buermans and den Dunnen 2014). Third, bioinformatics tools and algorithms were developed to identify and quantify proteins from MS spectra and to manage the statistical analysis of the huge amount of data generated (Baldwin 2004; Schubert et al. 2017). In addition, proteomics is based on classical protein biochemistry and cell biology methods, including protocols for protein extraction, fractionation, purification, depletion, and labeling, in which electrophoresis has played a pivotal role, giving rise to one of the platforms most employed in plant research, i.e. two-dimensional gel electrophoresis (2-DE), the focus of this chapter (Gorg et al. 2004).

Proteomics can be defined as a scientific discipline or methodological approach whose objective is the study of the proteome of a living organism, understood as the total set of protein species present in a biological entity (subcellular fraction, cell, tissue, organ, organism, population, ecosystem) at a certain time (specific growth and developmental stage) and under specific environmental conditions. The term can also refer to a structural or functional group of proteins (proteases, phosphoproteome, membrane proteins, etc.). This definition emphasizes the dynamic character of the proteome which, together with the chemical complexity of proteins, the number of protein species encoded by individual genes, and the wide range of concentrations within the cell, makes the approach quite challenging. By using proteomics we aim to find out “how”, “where”, “when”, and “what for” the several hundred thousand individual protein species of a living organism are produced. We wish to know how they interact with one another and with other molecules to construct the cellular building, and how they work in order to fit in with programmed growth and development, and to interact with their biotic and abiotic environment (Jorrín Novo 2015).

The objectives of proteomics research define different areas within the field: the identification and cataloguing of the protein species at the whole cell, tissue, organ or sub-cellular levels (descriptive and sub-cellular proteomics); the qualitative and quantitative comparison of two or more biological samples in order to infer differences and draw biological interpretations of the variations among genotypes, organs, tissues, developmental stages, and environmental conditions (comparative proteomics); the identification and characterization of post-translational variants of a protein (post-translational proteomics); and the study of molecular interactions with other proteins or biomolecules (interactomics). Proteomics can be used for basic (gaining of biological knowledge) or translational purposes (Cox et al. 2011). This chapter is mostly focused on the first two areas, descriptive and comparative proteomics.

2 Proteomics in Plant Biology Research

Proteomics has become a priority in biological investigation, and plants are not an exception to this rule; together with other –omic approaches it is at the heart of Systems Biology. The relevance of the discipline can be deduced by considering the number of papers published since 1994 (Fig. 19.1), when the term proteome was coined and when the first two papers on plant proteomics appeared (Egorov et al. 1994; Klabunde et al. 1994). The first works reporting a global plant proteome analysis date back to 1999 (Kehr et al. 1999; Peltier et al. 2000) and the first comparative proteomics studies to 2000 (Chang et al. 2000; Natera et al. 2000). Since then, and up to 2017, the total number of citations listed in PubMed under the words “plant proteomics” was 8011, representing 10% of the total items retrieved when just “proteomics” was searched. For comparative purposes, the total number of plant transcriptomics items was 11,776, and that of plant metabolomics 4113. The topic of plant proteomics has been extensively reviewed since 1999 (Thiellement et al. 1999), with some of the reviews authored or co-authored by Prof. Jorrin-Novo (Jorrin-Novo et al. 2006, 2007, 2009; Jorrin-Novo and Valledor 2013; Jorrin-Novo 2014, 2015; Komatsu and Jorrin-Novo 2016; Sanchez Lucas et al. 2016).

Fig. 19.1 Number of references reported in the PubMed database during the 1994–2017 period when a search was performed with the words (all fields): proteomics, plant + proteomics, plant + transcriptomics, and plant + metabolomics

Most of the original publications belonged to the descriptive (including sub-cellular) and comparative proteomics categories, with two-dimensional gel electrophoresis-based proteomics being the prevailing approach, although, over the last 5 years, label-based and gel-free, label-free (shotgun) approaches have become dominant.

Proteomics papers have been published on close to one hundred plant species, including model systems (Arabidopsis thaliana, Medicago truncatula, Lotus japonicus), crops (cereals, legumes, oil-bearing, vegetable, fruit and berry, sugar, permanent, and aromatic crops), weeds, and forest trees (reviewed in Jorrin Novo et al. 2015). As confident protein identification from mass spectra is only possible if the genome is sequenced, or if enough well-annotated sequences are available, proteomics with orphan species is highly challenging: putatively identified proteins correspond, at best, to orthologs rather than to gene products (Abril et al. 2011).

Proteomics experiments have been carried out with seeds, seedlings, and adult plants at the vegetative and flowering stages, either at the whole individual or at the organ, tissue, or cell level, including roots, hypocotyls, cotyledons, shoots, stems, leaves, buds, meristems, flowers, spikes, fruits, callus, cell suspensions, protoplasts, hairy roots, or somatic embryos (Jorrin Novo et al. 2015). The choice of plant material is conditioned by the objectives of the research, but from a proteomics point of view its complexity and chemical composition determine, to a great extent, the final results in terms of the number of proteins that can be confidently identified and quantified. Plant organs comprise different types of tissues and cells, each one with its own protein signature, thus causing high biological variability. The presence of non-proteinaceous compounds in the tissue affects the amount and number of proteins extracted and solubilized prior to resolution and mass spectrometry analysis. This is the case of salts, for example in root tissues; polysaccharides, as in cereal seeds and fruits; lipids, in seeds of oily plants; and phenolics, in fruit and flowers. Proteases, present in some fruit such as pineapple, have a similar effect.

The protocol to be used for protein extraction, solubilization and resolution will depend on the chemical composition of the plant material, and the best one will capture the largest number of protein species without modifying them while, at the same time, eliminating non-protein compounds. Another important issue is the presence of major proteins that, like RubisCO in leaves and reserve proteins in seeds, mask the visualization of minor proteins.

Proteomics has been used in plant studies for both basic research and translational purposes. Table 19.1 summarizes a list of research objectives based on a PubMed search performed on February 2nd, 2018. Plant development and responses to stresses are by far the topics most represented in the current literature.

Table 19.1 Basic and translational plant research objectives approached by using proteomics, as number of items referenced in PubMed. The list does not purport to be exhaustive

3 Plant Proteomics Methods, Techniques and Protocols

In this section, the platforms employed in proteomics research will be mentioned and briefly discussed, with emphasis on 2-DE-MS, the one most used with plant species. The aim is not to give detailed protocols, but just a few guidelines that will help to approach a plant project using proteomics, to decide which protocol to use, and to evaluate the results. A more detailed discussion will be found in the original publications, reviews or monographs by the author’s group. Among them, Plant Proteomics Methods and Protocols, edited by Jorrin-Novo et al. (2014), is an excellent reference.

The workflow of a standard MS-based proteomics experiment includes the following steps, as illustrated in Fig. 19.2 for a 2-DE-based approach: experimental design, sampling material and storage, protein extraction, fractionation, purification, and/or depletion, protein electrophoresis (one- and two-dimensional), MS analysis, protein identification and quantification, and statistical analysis of the data. For each stage, protocols have to be adapted to the experimental system and the objectives of the research (Jorrin-Novo et al. 2014).

Fig. 19.2 Steps in a standard 2-DE based proteomics experiment workflow

3.1 Experimental Design

Although not always appreciated, this preliminary step is key, not only for proteomics but for any approach used in an investigation. The experimental unit must be clearly defined, as well as the tissue to be sampled and the sampling time. Another important decision is the number of analytical and biological replicates to be performed, which depends on the technique itself and on the analytical and biological variability found. All these issues are discussed in depth in Jorrin-Novo et al. (2009), Valledor and Jorrin (2011), and Valledor et al. (2014), with clear examples from our work with Holm oak (Quercus ilex) (Jorge et al. 2005, 2006). Special attention should be paid to the statistical analysis of the data if we wish to draw confident conclusions from a biological point of view. The proteome should be analyzed as a whole, so a multivariate analysis of the data has to be performed. Such an analysis shows how homogeneous the replicates are, how different the samples are, and which spots contribute most to the biological variability.
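The comparison of analytical and biological variability mentioned above can be illustrated with a minimal sketch. It assumes that variability is summarized as the coefficient of variation (CV) of the volume of a single spot; the replicate values and the function name are hypothetical and are used only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) of a list of spot volumes."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical normalized volumes of one spot (arbitrary units).
analytical_replicates = [10.0, 11.0, 9.0]   # same extract run on three gels
biological_replicates = [8.0, 12.0, 16.0]   # extracts from three independent plants

cv_analytical = coefficient_of_variation(analytical_replicates)
cv_biological = coefficient_of_variation(biological_replicates)

print(f"Analytical CV: {cv_analytical:.1f}%")
print(f"Biological CV: {cv_biological:.1f}%")
```

When the biological CV is much larger than the analytical one, as in this invented example, adding biological replicates is likely to be more informative than repeating gels of the same extract.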

3.2 Protein Extraction

Once plant tissue is sampled, it should be cleaned and sterilized in order to avoid contaminant proteins in the sample. If the proteins are not extracted immediately after sampling, which is quite common, the tissue must be stored in a way that ensures it is not modified and that avoids possible artifacts. In our hands, freezing in liquid nitrogen and storing at −80 °C or, even better, lyophilizing before storing, has provided good results.

It is a maxim that, to detect and identify a protein, it has to be extracted and solubilized, so the proteomics experiment depends to a great extent on the extraction protocol. Two general methods can be used for protein extraction from plant tissue, based either on solubilization in a buffer medium or on precipitation with organic solvents and acids; both protocols can also be combined. In our hands, the precipitation protocols have given the best results in terms of protein yield and number of bands or spots resolved by 1- or 2-D electrophoresis. The choice of a precipitation procedure is justified by the low protein content of plant cells and the chemical composition of the plant tissue, as most problems related to protein solubilization and resolution are associated with the co-extraction of non-protein compounds, such as salts, polysaccharides, polyphenols, and lipids, and with the presence of proteases (Jorrin Novo et al. 2009). The artifacts generated by all these compounds are minimized in precipitation protocols. The protocol must be optimized for each experimental system, as has been reported, for example, for Arabidopsis leaf tissue (Maldonado et al. 2008) and carnation stem (Ardila et al. 2014). Hydrophobic proteins, as well as those with extreme pIs, usually elude most of the standard protocols and their study requires specific methods, whose discussion is outside the objectives of the present chapter. Once the proteins are extracted and solubilized in 1- or 2-D electrophoresis medium (Gorg et al. 2000), the protein content must be quantified by using colorimetric assays such as Bradford, Lowry or bicinchoninic acid (BCA). From these data, the extraction protocol has to be validated by comparing the experimental yield with the total protein content of the plant system under analysis, as determined by Kjeldahl or NIRS technology (Romero Rodriguez et al. 2014).

The protein yield is sometimes low, which is disappointing but quite common. For example, by using a sequential protein extraction of Holm oak seeds, it was not possible to solubilize more than 15% of the total protein content. Even so, the number of spots resolved in a 2-DE gel was high enough to provide relevant information about the system, with more than 400 spots visualized (Sghaier-Hammami et al. 2016).
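The validation of an extraction protocol against the total protein content can be reduced to a simple calculation, sketched below. The sketch assumes that total protein is estimated from Kjeldahl nitrogen with the conventional general conversion factor of 6.25, which is an assumption here and may not suit every plant tissue; all numerical values and function names are hypothetical.

```python
# Conventional general nitrogen-to-protein conversion factor (assumption;
# tissue-specific factors may be more appropriate).
KJELDAHL_FACTOR = 6.25


def total_protein_from_nitrogen(nitrogen_mg_per_g, factor=KJELDAHL_FACTOR):
    """Estimate total protein (mg per g dry weight) from Kjeldahl nitrogen."""
    return nitrogen_mg_per_g * factor


def extraction_yield_percent(extracted_mg_per_g, total_mg_per_g):
    """Percentage of the total protein recovered by the extraction protocol."""
    return 100.0 * extracted_mg_per_g / total_mg_per_g


# Hypothetical measurements for one tissue sample.
nitrogen = 8.0    # mg N per g dry weight (Kjeldahl)
extracted = 7.5   # mg protein per g dry weight (e.g. from a Bradford assay)

total = total_protein_from_nitrogen(nitrogen)
yield_pct = extraction_yield_percent(extracted, total)

print(f"Total protein: {total:.1f} mg/g; extraction yield: {yield_pct:.1f}%")
```

With these invented figures the yield is 15%, the same order of magnitude as the Holm oak seed case described above.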

The proteome is, by definition, of great complexity, with the number of protein species being the result of the number of genes and of the post-transcriptional and post-translational events that make the total number much higher than that of the genes or transcripts. In order to obtain deep proteome coverage, subcellular fractionation and sample pre-fractionation by chromatographic or preparative electrophoresis techniques are two valid strategies, whose discussion is outside the scope of this chapter (Martínez-Maqueda et al. 2013).

Low-abundance proteins are another important issue in proteomics, as they are usually masked by major ones. To overcome this limitation, depletion techniques have been utilized, the most common ones involving the use of antibodies against abundant proteins such as RubisCO (Cellar et al. 2008) or the equalizer (combinatorial peptide ligand library) technology (Boschetti et al. 2009).

3.3 Two-Dimensional Gel Electrophoresis

Electrophoresis (the separation of ions under the influence of an electric field) is undoubtedly one of the most powerful and widely employed preparative and analytical techniques in protein research. Its origin dates back to the late 1920s, to Arne Tiselius, considered to be the father of the technique, who pioneered the moving-boundary method. Since then, the technique has been continuously improved and different variants and applications have been developed, among the most relevant being zone electrophoresis, polyacrylamide gel electrophoresis (PAGE), disc electrophoresis, denaturing gel electrophoresis (sodium dodecyl sulphate, SDS-PAGE), isoelectrofocusing (IEF), and two-dimensional gel electrophoresis (2-DE). 2-DE, with isoelectrofocusing as the first and SDS-PAGE as the second dimension, was first reported by O’Farrell, Scheele and Klose in 1975 (Vesterberg 1989). Up to 2012, 2-DE, including Difference Gel Electrophoresis (DIGE; Unlu et al. 1997) and the bidimensional variant Blue Native (BN)-SDS PAGE (Eubel et al. 2005), was the dominant, almost unique, platform in plant proteomics research. In the last 5 years plant proteomics has been moving towards second-generation (labeling) and third-generation (shotgun) approaches (Jorrin Novo et al. 2009, 2015).

2-DE is a consolidated technique with not much room for improvement (Gorg et al. 2000, 2004), so we do not intend to discuss its technical details here, but only to insist on the need to optimize it for each experimental system. Detailed protocols can be found in our original publications, in which we have employed 2-DE/MS in the proteomics analysis of different organs from Holm oak seedlings and plants, including fruit, seed embryo, leaves, root, and pollen (Jorrin Novo and Navarro Cerrillo 2014). The aim of that work was to characterize and catalogue provenances and to study development, growth and responses to biotic and abiotic stresses.

As shown in Ardila et al. (2014), some parameters of a general 2-DE protocol have to be fixed and optimized for each experimental system in order to obtain the maximum protein visualization (sensitivity), and resolution. The crucial ones are the amount of protein loaded, the IEF-strip pH gradient and length and the staining protocols (the classical visible Coomassie or silver and the fluorescent dyes), each one having different sensitivities and dynamic ranges.

Like any other technique, 2-DE has its own particular characteristics. It is a powerful technique that, depending on the experimental conditions and the biological system, allows the detection of a few hundred to a couple of thousand individual spots, each one corresponding to one or more protein species, since comigration is quite common. Comigration can be avoided or minimized by using narrow pH gradients and long IEF strips, which increase the resolution of similar or closely related proteins, including different translation products of the same gene, allelic variants or isoforms. 2-DE gives precise information on the Mr and pI of a protein, which helps in its identification. One great advantage is its multiplexing ability, allowing the combination of general or specific staining protocols, and its use in western analysis, activity-based profiling and labeling techniques. On the other hand, it has some limitations, such as low reproducibility, which is addressed by the DIGE protocol, and the difficulty in analyzing recalcitrant proteins, i.e. hydrophobic ones and those with extreme pI. Finally, and unlike liquid chromatography, the competing technique, it cannot be automated. All these technical and analytical issues are discussed in some of the excellent monographs edited by the companies selling equipment and reagents and in some of the reviews published by Prof. Rabilloud (e.g. Rabilloud 2014; Rabilloud and Lelong 2011).

2-DE is a quantitative technique, at least in relative, comparative terms. Quantification is based on spot intensity, which depends on the protein abundance and on the staining or labeling protocol. The difference between two samples may be qualitative (spot presence or absence) or quantitative (a more or less abundant or intense spot). The abundance of a protein species is not necessarily related to the expression level of the corresponding gene. Thus, the absence of a spot does not necessarily mean that the coding gene is not being expressed. The protein may be below the detection limit of the staining procedure, or it may have undergone a post-translational modification that changed its Mr and/or pI, thus moving it to a different coordinate within the gel.
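The distinction between qualitative and quantitative differences can be made concrete with a short sketch. It assumes that raw spot volumes are normalized to the total spot volume of each gel, one common but not the only normalization, and that a spot counts as detected when its volume exceeds a detection limit. The spot names, volumes and function names are hypothetical.

```python
def normalize_volumes(raw_volumes):
    """Express each spot volume as a percentage of the total volume in the gel."""
    total = sum(raw_volumes.values())
    return {spot: 100.0 * volume / total for spot, volume in raw_volumes.items()}


def compare_spot(volume_a, volume_b, detection_limit=0.0):
    """Classify the difference of one spot between samples A and B.

    Returns ("not detected", None) when the spot is absent from both samples,
    ("qualitative", None) when it is detected in only one of them, and
    ("quantitative", fold_change) otherwise, with fold_change = B / A.
    """
    detected_a = volume_a > detection_limit
    detected_b = volume_b > detection_limit
    if not detected_a and not detected_b:
        return ("not detected", None)
    if detected_a != detected_b:
        return ("qualitative", None)
    return ("quantitative", volume_b / volume_a)


# Hypothetical raw spot volumes from two gels (arbitrary units).
control = normalize_volumes({"spot_1": 200.0, "spot_2": 300.0, "spot_3": 500.0})
treated = normalize_volumes({"spot_1": 400.0, "spot_2": 0.0, "spot_3": 600.0})

for spot in control:
    print(spot, compare_spot(control[spot], treated[spot]))
```

As stressed in the text, a "qualitative" result here only means that the spot was not detected at that gel coordinate; it says nothing by itself about expression of the coding gene.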

As the number of spots in a gel is very high, the analysis is performed after gel image capture by using software, some of it commercial and some free. It is a laborious and time-consuming step that is not exempt from artifacts, as discussed in Berth et al. (2007). As indicated above, when two or more samples are compared, the protein abundance data and the significance of the differences must be subjected to uni- and multivariate statistical tests. It is recommended to be restrictive and conservative when deciding whether or not a spot is variable among samples. The spot should be consistent (always present or always absent in all the biological and analytical replicates), its variability should be lower than the average biological variability of the whole sample, and the differences should be statistically significant (e.g. by a univariate ANOVA test). Multivariate analyses, such as principal component analysis (PCA), will show how homogeneous the replicates are, how different the samples are, and which spots contribute most to the variability (Righetti et al. 2004; Valledor and Jorrin Novo 2011). Once the 2-DE gel has been analyzed and the quantitative data subjected to statistics, the next step is the identification of the spots, either the variable ones or the whole set, by using mass spectrometry and, in some cases, Edman N-terminal sequencing.
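Two of the criteria above, consistency across replicates and a significant univariate test, can be sketched for a single spot as follows. This is an illustrative sketch only: the F statistic of a one-way ANOVA is computed by hand and compared with a critical value that the user must take from an F table for the relevant degrees of freedom (7.71 for 1 and 4 degrees of freedom at the 0.05 level). The replicate values and function names are hypothetical, and a statistics package would normally be used to obtain the p-value.

```python
from statistics import mean


def is_consistent(groups, detection_limit=0.0):
    """True if, within every sample, the spot is present in all replicates or in none."""
    return all(
        all(v > detection_limit for v in group) or all(v <= detection_limit for v in group)
        for group in groups
    )


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of replicate groups."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    ss_within = sum((x - mean(group)) ** 2 for group in groups for x in group)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical normalized volumes of one spot: two samples, three replicates each.
control = [10.0, 12.0, 11.0]
treated = [20.0, 22.0, 21.0]

F_CRITICAL = 7.71  # F table value for (1, 4) degrees of freedom, alpha = 0.05

f_value = one_way_anova_f([control, treated])
variable_spot = is_consistent([control, treated]) and f_value > F_CRITICAL

print(f"F = {f_value:.1f}; spot considered variable: {variable_spot}")
```

The sketch does not correct for multiple testing, which matters when hundreds of spots are tested on the same gels.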

3.4 Protein Identification Through Mass Spectrometry

Mass spectrometry is an analytical technique that measures the mass-to-charge ratio of electrically charged gas-phase particles (Calvete 2014). Mass spectrometry, as an alternative or complementary approach to Edman sequencing, appeared on the protein research scene in the late 1980s, once the soft ionization procedures, MALDI and ESI, had been developed. In a very simple scheme, a mass spectrometer contains three basic elements: the ionizer, the mass analyzer, and the ion detector. Different machines result from the combination of ionizers (MALDI or ESI) and analyzers (quadrupole, Q; ion trap, T; time of flight, TOF; Orbitrap), each one having its own characteristics and particularities (mass accuracy, resolution, sensitivity, dynamic range, speed, sequencing capabilities) that determine the number of peptides/proteins identified and quantified (Calvete 2014). The spectrometer may operate in single MS mode (m/z values for the parental ions, proteins or peptides) or in tandem MS or MS/MS mode (the parental ion is fragmented in the collision cell and the m/z values of the fragments are determined) (Nesvizhskii et al. 2007). In most of the cases reported, plant proteomics work is based on a bottom-up 2-DE MS strategy, in which the proteins (e.g. spots from a 2-DE gel) are digested with trypsin and the tryptic fragments are subjected to MS analysis, most commonly by using the MALDI-TOF/TOF strategy. The protein data are therefore inferred from the peptides that reach the mass spectrometer.
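The tryptic digestion step of the bottom-up strategy can be simulated in silico, as in the sketch below. It assumes the commonly used simplified rule that trypsin cleaves on the C-terminal side of lysine (K) and arginine (R) unless the next residue is proline (P); the protein sequence and the function name are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides of a protein sequence (one-letter code).

    Cleavage is assumed after K or R, except when followed by P.
    Peptides containing up to `missed_cleavages` uncleaved sites are included.
    """
    # Positions (as slice boundaries) where the chain is cut.
    sites = [
        i + 1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


# Hypothetical short sequence; the R at position 6 is followed by P and is not cleaved.
protein = "MAKTLRPEGKSW"

print(tryptic_digest(protein))
print(tryptic_digest(protein, missed_cleavages=1))
```

Allowing one missed cleavage is a frequent setting in database searches because digestion is rarely complete.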

The rationale of protein identification from mass (MS) or tandem mass (MS/MS or MSn) data is the comparison between the experimental data (m/z ratios) and the theoretical ones deduced from the protein/peptide sequences found in protein databases. The correct assignment of a spectrum to a peptide sequence is a first and central step in proteomic data processing (Nesvizhskii et al. 2007). That is why protein identification requires the availability of sequenced genomes (at the organism level) or of DNA and EST sequences for individual genes. A confident identification depends on both the experimental MS and MS/MS data and the quality of the protein database derived from in silico translation of DNA/RNA sequences. For orphan, unsequenced organisms, or those poorly represented in the databases, it is necessary to construct a specific protein database from species-specific DNA or EST sequences deposited and dispersed in different databases (Romero Rodriguez et al. 2014). This custom-built protein database improves the rate and quality of identifications. Alternatively, the employment of a single Viridiplantae database (NCBI, UniProt and TAIR) will provide matches to orthologs, this being a confident identification for conserved proteins. The dilemma lies in whether orthologs or actual gene products are identified. From a practical point of view, for example in plant breeding, the former are useless.
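The theoretical values used in this comparison can be derived from a peptide sequence, as in the following minimal sketch. It assumes unmodified peptides, standard monoisotopic residue masses, and protonated ions of charge z, so that m/z = (M + z × 1.007276) / z; the function names and the example peptide are hypothetical.

```python
# Monoisotopic masses (Da) of the 20 standard amino acid residues.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056    # added once per peptide (free N- and C-termini)
PROTON_MASS = 1.007276   # mass of the charge carrier


def peptide_mass(sequence):
    """Monoisotopic mass (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def mass_to_charge(mass, charge):
    """Theoretical m/z of a peptide ion carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


peptide = "MAK"  # hypothetical tryptic peptide
mass = peptide_mass(peptide)

print(f"M = {mass:.4f} Da")
print(f"[M+H]+ m/z = {mass_to_charge(mass, 1):.4f}")
print(f"[M+2H]2+ m/z = {mass_to_charge(mass, 2):.4f}")
```

Singly charged ions are typical of MALDI and multiply charged ions of ESI, which is why both charge states are shown.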

Some algorithms and bioinformatics packages are available for the analysis, identification and quantification of proteins. Some of the most frequent algorithms employed are discussed in Nesvizhskii et al. (2007). They use three main strategies (Baldwin 2004):

  1. Peptide mass fingerprinting, PMF. This results from the direct comparison of the masses of the parental peptide peaks with the predicted, theoretical ones deduced in silico. This is only valid when matching against species-specific protein databases.

  2. De novo sequencing, where peptide sequences are explicitly read out directly from fragment ion spectra.

  3. Hybrid approaches, such as MS-Tag. These are based on comparisons between the experimental masses of the parental ion fragments produced in the collision cell and all the predicted fragments for all the hypothetical peptides of the appropriate molecular mass, based on known fragmentation rules.
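The first of these strategies, peptide mass fingerprinting, can be illustrated with a deliberately naive sketch. It assumes that each database entry has already been digested in silico into a list of theoretical peptide masses and simply counts the observed masses that fall within a tolerance expressed in parts per million. Real search engines use probabilistic scores rather than a raw count; the protein names, masses, tolerance and function names here are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return 1e6 * (observed - theoretical) / theoretical


def count_matches(observed_masses, theoretical_masses, tolerance_ppm=50.0):
    """Number of observed peptide masses matching any theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tolerance_ppm for theo in theoretical_masses)
        for obs in observed_masses
    )


def rank_candidates(observed_masses, database, tolerance_ppm=50.0):
    """Rank database proteins by number of matched peptide masses (best first)."""
    scores = {
        name: count_matches(observed_masses, masses, tolerance_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical peptide masses (Da) for two database entries.
database = {
    "protein_A": [1000.50, 1500.75, 2000.90],
    "protein_B": [1000.50, 1234.56, 1800.00],
}
# Hypothetical peptide masses measured from one 2-DE spot.
observed = [1000.52, 1500.70, 2000.95]

ranking = rank_candidates(observed, database)
print(ranking)
```

The example also shows why a species-specific database matters: a single shared peptide mass, as for protein_B, gives a match that carries little weight on its own.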

In current publications, the results of the database search are presented in a table in which the identified proteins are grouped according to their function, with columns corresponding to the name (function), species and acronyms in the database, cellular location, theoretical and experimental Mr and pI, together with the parameters of confidence, including score, number of peptides and percentage of sequence covered. How confident an identification is should be understood in probabilistic terms, and is a frequent subject of discussion. A ranking of high and low probabilities should at least be established for all the matches or hits found, and the data should be interpreted very conservatively from a biological point of view.

4 Conclusions

2-DE-based proteomics is a powerful technique that has generated a huge amount of data and information on different aspects of plant biology, from growth and development to responses to biotic and abiotic environmental cues. All the information is disseminated throughout the current literature, databases and repositories. However, the potential of the technique is far from being fully exploited and future research should move in this direction, especially in the areas of PTMs and interactomics. As things stand at this moment, plant proteomics remains mostly descriptive and speculative. In this regard, it is important to validate proteomics data from a functional point of view. This also means integration with the classic approaches of plant physiology and biochemistry, and with the modern –omics, including transcriptomics and metabolomics, in the direction of systems biology. The interpretation of proteomics data from a biological point of view is not always possible, and we risk turning our publications into simple speculations. The proteome covered is, in most cases, just one frame of a very complex film, which is the life cycle of any organism. It is a frame in which only a minimal fraction of the total proteome appears or is visualized, but one that is big enough to make its analysis in a classic format impossible. These, and other issues related to standards in plant proteomics publications, are discussed in Jorrín Novo (2015).