
Introduction

The elucidation of the double-helical structure of DNA by Watson and Crick in 1953, followed by the development of DNA sequencing techniques by Sanger et al. in 1977, were landmark discoveries that revolutionised life science research. The genome sequences of many species have since been unravelled, from the simplest mycoplasma (Wasinger et al. 1995) to more complex eukaryotes, including various plant species. Technological advances, from dideoxy sequencing to automated next-generation sequencing platforms, have led to the exploration of ever more genomes. However, the exact function of most genes remains obscure, and characterising the proteins encoded by the genome is the present-day challenge. In the post-genomic era, proteomics is one of the fastest-growing areas of biological research, and its objective has moved beyond the simple identification and cataloguing of proteins to a more comprehensive study of their functional and regulatory aspects. The overarching goal of proteomics remains to obtain a more comprehensive and integrated view of biology by studying all the proteins of a cell, rather than individual proteins, in the broader context of the dynamic protein machinery; many different areas of study therefore fall under the umbrella of proteomics. Proteomics aims not only to catalogue all the proteins in a cell but also to create a complete three-dimensional (3-D) map of the cell indicating where each protein is located, a task that will require the involvement of many disciplines such as molecular biology, biochemistry and bioinformatics (Chen and Harmon 2006; Graves and Haystead 2002).

Origins of Proteomics

Proteomics is the study of the proteome, i.e. the whole protein complement of the genome. As a technology, proteomics can be defined as the identification and characterisation of all proteins involved in a particular pathway, organelle, cell, tissue, organ or organism, studied in concert to provide accurate and comprehensive qualitative and quantitative data about that system (Kav et al. 2007; Wright et al. 2012). The terms “proteomics” and “proteome” were coined by Wilkins and colleagues in 1995 to describe the large-scale characterisation of the entire protein complement of a cell line, tissue or organism, by analogy with “genomics” and “genome”, which describe the entire collection of genes in an organism. The era of proteomics began with the introduction of two-dimensional gel electrophoresis (2-DE) by O’Farrell (1975), Klose (1975) and Scheele (1975), who mapped proteins from Escherichia coli, mouse and guinea pig, respectively. Although 2-DE was a major step towards proteomics, its progress was hampered by the lack of sufficiently sensitive protein sequencing and identification technologies. Improved sensitivity was critical because protein-loading capacity is one of the major limiting factors for both one-dimensional (1-D) and two-dimensional (2-D) gels, and biological samples are often limited as well. The first major breakthroughs in protein identification were protein sequencing by Edman degradation and, subsequently, mass spectrometry (MS). Improvements in microsequencing technology increased the sensitivity of Edman sequencing, while the sensitivity and accuracy of MS-based protein identification have increased by several orders of magnitude, so that even femtomole amounts of protein can be identified from gels.
Because MS is more sensitive, can tolerate protein mixtures and is amenable to high-throughput operation, it has essentially replaced Edman sequencing as the method of choice for protein identification (Aebersold et al. 1987; Andersen and Mann 2000; Pandey and Mann 2000). Improved 2-DE techniques allowed comprehensive visualisation of proteins on 2-D gels, and the further development of biological MS, together with the growth of searchable sequence databases, drove the advancement of proteomics. Key MS developments for ionising proteins and peptides include matrix-assisted laser desorption/ionisation (MALDI) and electrospray ionisation (ESI), combined with time-of-flight (TOF), ion trap and triple-quadrupole tandem MS (MS/MS) analysers, which offer high sensitivity and mass accuracy (Karas and Hillenkamp 1988; Fenn et al. 1989; Aebersold and Mann 2003).

The separation and visualisation of proteins from crude extracts of tissues or cell cultures by two-dimensional gel electrophoresis (2-DGE), followed by identification and characterisation by mass spectrometry (MS), became a common method of choice in proteomic analysis. A well-separated protein mixture within a particular pH range is important not only for obtaining a characteristic MS spectrum for protein identification but also for the quantitative analysis of differentially expressed proteins. However, extracting proteins and obtaining reproducible, well-resolved 2-D gels from plant tissue can be particularly challenging because of the high phenolic content and high carbohydrate-to-protein ratio of most plant tissues. Moreover, because the resolution of protein spots on a 2-D gel is limited by factors such as abundance, size and other electrophoretic properties, the complete proteome is often fractionated into sub-proteomes, such as subcellular compartments, organelles and multiprotein complexes, to improve sensitivity and resolution and to reduce overall complexity (Sarma et al. 2008; Jung et al. 2000; Park 2004).
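The first dimension of a 2-D gel separates proteins by isoelectric point (pI), so the pH range chosen determines which proteins are resolved. As a minimal sketch, a protein's theoretical pI can be estimated from its sequence by searching for the pH at which its net charge is zero. The pKa values below are an assumption (one commonly used set; published sets differ), and `net_charge` and `isoelectric_point` are hypothetical helper names. PTMs and sample-preparation artefacts can shift real spot positions away from such predictions.

```python
# Assumed pKa set (roughly EMBOSS-like); other published sets give slightly different pI values.
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a protein's net charge at a given pH."""
    seq = seq.upper()
    # Basic groups contribute +1 when protonated (pH below their pKa).
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["N_term"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    # Acidic groups contribute -1 when deprotonated (pH above their pKa).
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["C_term"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection over pH 0-14; the net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for name, seq in {"acidic": "DDDDEEEEG", "basic": "KKKKRRRRG"}.items():
        print(name, round(isoelectric_point(seq), 2))
```

Bisection is used because the net charge decreases monotonically with pH, so a single sign change brackets the pI.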

Why Proteomics?

Genomic and transcriptomic information is still fragmentary and can be insufficient for a complete understanding of a complex organism, because transcript levels do not necessarily correlate with the genome, and neither correlates fully with the proteins actually present. Proteins are the functional molecules and are therefore the most promising candidates to reflect differences in gene expression. Genes may be present, and may be mutated, but are not necessarily transcribed. Some mRNAs are transcribed but not translated, so the number of mRNA copies does not necessarily reflect the number of functional protein molecules. The ability to scrutinise mRNA and protein populations qualitatively and quantitatively therefore raises the tantalising prospect of deciphering the functional and regulatory networks that bridge genotype and phenotype. Unlike the genome, the proteome of a given cell or organism is dynamic. The proteome of a cell reflects the immediate environment in which it is studied. In response to internal or external cues, proteins can be modified by posttranslational modifications (PTMs), translocated within the cell, or synthesised or degraded. Examining the proteome of a cell is thus like taking a “snapshot” of its protein environment at a given time. Considering all these possibilities, any given genome can potentially give rise to a virtually unlimited number of proteomes. Unravelling proteomes is significantly more challenging than unravelling genomes for three main reasons. First, in higher eukaryotes a single gene often produces many different forms of a protein, primarily through alternative splicing and various PTMs, resulting in functional diversity. Second, genomes are largely static throughout the lifetime of a cell or organism, whereas proteomes are highly variable.
Third, the technologies required for proteomics are currently more complex than those required for genomics (Hegde et al. 2003; Wright et al. 2012; Graves and Haystead 2002; Rose et al. 2004; Celis et al. 2000). The need for proteomics can therefore be explained broadly under the following subheadings:

Genome Annotation

The primary goal of all genome sequencing efforts is to ascertain the molecular and cellular functions of all gene products. Although genome sequencing reveals the basic building blocks of life, a genome sequence alone is insufficient to elucidate biological function. Genome annotation is the process of identifying the genes in a genome sequence and assigning their biological functions. Current high-throughput genome annotation combines comparative (sequence-based homology) and noncomparative (ab initio gene prediction) methods to identify protein-coding genes in genome sequences. Annotation relies largely on sequence homology to already characterised proteins from other genomes, yet 30–50 % of predicted gene products either have no known homologs or show too little homology to known proteins, which makes annotation difficult. Because the approaches used to corroborate predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unambiguously determine whether a predicted gene is translated into a protein. Moreover, dependence on a sequenced genome or cDNA library can restrict the scope of studies, particularly for non-model organisms. One of the first applications of proteomics is therefore to establish the complement of protein-coding genes in a genome; this “functional annotation” is necessary because genes are still difficult to predict accurately from genomic data alone. Exon–intron boundaries, alternative splicing, pseudogenes, promoter regulatory regions, untranslated regions and repeats remain difficult to predict precisely by bioinformatics.
Tools for annotating genomic and proteomic sequences and their structures have been developed over the last two decades and are now widely accessible, with the added advantage of a large body of characterised data. The databases holding these data often focus on a particular area of annotation and are most powerful when organised so that the data can be queried computationally. For instance, CATH is a database of protein structural domains in which one can obtain either a broad view of a chosen protein family or a narrower view of a particular protein structure. Although proteomics would not be possible without genomics, it can in turn assist genomics: it provides a fast, relatively inexpensive and reliable way of assembling a large amount of experimental evidence for genome annotation. It also has the additional advantage of confirming that transcripts are translated into proteins, and it can help identify functional details of the mature protein. Over the last few years, there has been a move towards integrating the wide range of genome and proteome annotation methods and databases in order to provide an overall view of gene function.
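The noncomparative side of annotation starts from the sequence alone. Real ab initio gene predictors use statistical models of exons, introns and splice sites; as a much simpler, assumed stand-in, the sketch below only scans the forward strand for open reading frames (an ATG followed by an in-frame stop codon). Because it ignores introns entirely, it also illustrates why exon–intron structure is hard to get right from sequence and why protein-level evidence is valuable. `find_orfs` and its default length cut-off are hypothetical.

```python
# Minimal forward-strand ORF scan: a crude stand-in for ab initio gene prediction.
# It ignores introns, the reverse strand and any statistical model of coding sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100):
    """Return (start, end, frame) for ATG...stop ORFs with at least min_codons codons.

    Coordinates are 0-based and half-open; `end` includes the stop codon. Only the
    first ATG after the previous stop opens an ORF, so nested start codons are skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"  # invented sequence
    print(find_orfs(toy, min_codons=2))  # [(2, 17, 2)]
```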

Proteogenomics, i.e. the integration of genomic information with data obtained from protein studies, is one solution to this problem because it can confirm the existence of a particular gene at the protein level. Proteogenomics allows predicted genes to be validated and, more importantly, genome annotation errors to be corrected, for example through the detection of unannotated genes, reversed reading frames, translational start sites, stop codon read-throughs or programmed frameshifts, and of signal peptide processing and other maturation events at the protein level. Several studies dedicated to genome reannotation based on experimental proteomics have paved the way for the proteogenomic approach. High-throughput tandem mass spectrometry-based proteomics can verify the coding regions of a genomic sequence because it directly measures peptides arising from the expressed proteins. Proteogenomic approaches therefore have the ability to improve the quality of genome annotations (Eisenberg et al. 2000; Yakunin et al. 2004; Liska and Shevchenko 2003; Ansong et al. 2008; Orengo et al. 1997; Wright et al. 2009; Reeves et al. 2009; Baudet et al. 2009/2010).
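In proteogenomics, peptides observed by tandem MS are mapped back onto the genome to show that a region is actually translated. One common approach is to search spectra against a database built by translating the genome in all six reading frames. The sketch below shows only the translation and mapping step on a toy sequence, assuming the peptide sequences have already been identified; it uses the standard genetic code, and `six_frame_translation` and `locate_peptides` are hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; codons with ambiguous bases (e.g. N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Frames +1..+3 of the given strand and -1..-3 of its reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def locate_peptides(dna: str, peptides) -> dict:
    """Map each peptide to (frame, first residue index) for every frame containing it."""
    frames = six_frame_translation(dna)
    return {pep: [(name, protein.find(pep))
                  for name, protein in frames.items() if pep in protein]
            for pep in peptides}


if __name__ == "__main__":
    # Invented locus whose reverse strand encodes M-W-H followed by a stop codon.
    print(locate_peptides("TTAATGCCACAT", ["MWH"]))  # {'MWH': [('-1', 0)]}
```

A hit in a frame with no annotated gene would be a candidate for an unannotated gene or a reading-frame correction, which is the kind of evidence described above.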

Protein Expression Studies

Genomic information provides an exceptional platform for cross-correlating the transcriptomic and proteomic characteristics of a particular gene with its expression and biological functions. However, a simple unidirectional or linear relationship between the transcriptome and the proteome is implausible, as these two data sets are distinctly different and each is subject to its own control and regulation. The transcriptome, the subset of genes expressed in a specific cell or tissue type, is a dynamic link between the genome, the proteome and the cellular phenotype. A number of techniques have recently emerged that provide a robust and powerful set of tools for comprehensive, quantitative studies of genome expression, including differential display PCR, cDNA microarrays and serial analysis of gene expression (SAGE). However, mRNA levels do not directly reflect the protein content of the cell, and the correlation between mRNA and protein expression levels is often poor. Transcription is merely the first step in a long sequence of events leading to protein synthesis. Posttranscriptional control through alternative splicing, polyadenylation and mRNA editing is a significant further step at which many different protein isoforms can be generated from a single gene, and translational and posttranslational regulation add further layers. Once formed by translation, proteins are subject to PTMs; it is estimated that up to 300 different types of PTMs exist. Proteins can also be regulated by proteolysis and compartmentalisation. Because the proteome is dynamic and expressed in a spatially and temporally regulated manner, protein expression profiles provide information that complements genomic and mRNA analyses.
In addition, proteins often function through interactions with other molecules, for example in signal transduction, and carry dynamic (e.g. phosphorylation) and/or static (e.g. disulphide linkage) modifications that may not be apparent from genomic information or from mRNA abundance (Corthals et al. 2000; Revel and Groner 1978; Kwon et al. 2006).
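The agreement between mRNA and protein levels is usually quantified with a correlation coefficient computed across genes. A minimal sketch using Spearman's rank correlation, which is insensitive to the very different scales of transcript counts and protein intensities, follows. The helper names are hypothetical and the abundance values in the usage lines are invented for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented abundances for five genes: transcript counts vs. protein intensities.
    mrna = [120, 35, 800, 15, 260]
    protein = [3.1, 2.9, 4.0, 3.5, 1.2]
    print(round(spearman(mrna, protein), 2))  # 0.1
```

The near-zero value here comes from made-up numbers and says nothing about any real organism; with real data the same calculation would be run over thousands of genes.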

Protein Function

Over the course of evolution, a large number of protein families have arisen whose members share the same three-dimensional architecture and often have detectable sequence and functional similarity. This conservation allows the structure of all proteins in a family to be inferred even when the structure of only a single member is known, which in turn helps to predict their biological functions. Despite advances in techniques for determining protein structure, the structures of many proteins are still unknown. Computational analysis of genome sequences with protein prediction programs is producing numerous new proteins of unknown structure and function. These are called “hypothetical proteins” because they are predicted from the gene sequence but there is as yet no experimental evidence for their existence or function. Several studies have shown that no function can be assigned to about one-third of the sequences in organisms whose genomes have been sequenced. Complete identification of all the proteins in a genome will help the field of structural genomics, whose ultimate goal is to obtain 3-D structures for all proteins in a proteome. This is indispensable because the functions of many proteins can only be inferred by examining their 3-D structures. Structural genomics, or structural proteomics, can thus be defined as the quest to obtain the three-dimensional structures of all proteins. As a relatively recent discipline, proteomics uses a variety of old and new techniques to reveal protein structure and conformation and to measure protein concentrations under varying conditions. Structural data can be used to determine the functions of proteins by comparison with similar proteins of known function.
The major challenges ahead in structural proteomics include identifying all proteins on a genome-wide scale, determining their structure–function relationships and defining their precise 3-D structures. To date, X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy have been the techniques typically used to determine protein structures. Nonetheless, detailed 3-D structural knowledge obtained by these techniques is still fragmentary. Computational methods, such as comparative and de novo modelling and molecular dynamics simulations, are therefore used intensively as alternative tools to predict the 3-D structures and dynamic behaviour of proteins. Such programs can help to predict the structures of proteins of unknown function by comparing the sequence of the unknown protein with those of proteins of known 3-D structure and function, which serve as templates for building a predictive model. This structure-based prediction of function is preferred over sequence-based extrapolation because structural similarity is generally easier to detect than sequence similarity, and because structure often allows a more sensible and informative transfer of functional annotation than sequence alone. By using methods that rely only on the structure of the protein to be characterised, such as the matching of 3-D patterns and the precise docking of ligands, structural genomics will contribute to the functional annotation of proteins in addition to improving homology-based arguments. However, the success of this approach depends on the quality of the match between the known template proteins and the unknown target protein.
These prediction programs do not produce structures with the detail or reliability of experimental techniques such as X-ray crystallography or NMR, but they do allow the large number of new proteins identified from whole-genome analyses to be analysed critically within a reasonably short time. The ultimate aim of structural proteomics is therefore not to obtain structures or models for all unknown proteins but to elucidate their functions (Sánchez et al. 2000; Edwards et al. 2000; Liu and Hsu 2005; Christendat et al. 2000; Eisenstein et al. 2000).
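Because template-based prediction depends on how well the unknown protein matches its templates, a first step is to rank candidate templates by sequence similarity. The sketch below is an assumed simplification: a Needleman-Wunsch global alignment with a flat match/mismatch/gap score (real modelling tools use substitution matrices such as BLOSUM62 and profile methods), followed by percent identity. The function names and template sequences are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch alignment with a simple linear score (no substitution matrix)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of alignment length (gaps included)."""
    x, y = global_align(a, b)
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


def best_template(query: str, templates: dict) -> str:
    """Name of the template sequence with the highest identity to the query."""
    return max(templates, key=lambda name: percent_identity(query, templates[name]))


if __name__ == "__main__":
    templates = {"template_A": "MKTAYLAKQR", "template_B": "GGGGGWWWWW"}  # invented
    print(best_template("MKTAYIAKQR", templates))  # template_A
```

Higher identity to a template generally gives a more trustworthy model, which is the dependence on match quality noted in the text.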

Protein Modifications

Cellular integrity and morphology, and the numerous biological functions of a cell, rely on an intricate interplay between thousands of different biomolecules. Whereas the basic biological functions of proteins are encoded by their genes, the real-time dynamics and regulation of protein structure and function are largely controlled by specific posttranslational modifications (PTMs) such as phosphorylation, glycosylation and acylation. In recent years, protein PTMs have attracted considerable attention in biological and biomedical research, and especially in plant proteomics, where they help to unravel the mechanisms underlying various stress adaptations. PTMs regulate protein structure, activity and stability and are therefore involved in a wide range of biological processes. Several hundred PTMs are known so far, but relatively few have been studied using mass spectrometry and proteomics. Methods for PTM characterisation were initially developed to study yeast and mammalian biology and were later adopted for plants. As part of a quantitative proteomics strategy, it is helpful to enrich modified peptides, not only to identify the PTM but also to establish its functional relevance in the context of regulation and of responses to different biotic and abiotic stresses. So far, phosphorylation is the only PTM that has been studied extensively at the proteome-wide level in plants using mass spectrometry-based methods. PTMs have been shown to influence protein–protein interactions, subcellular localisation and the transduction of both internally and externally generated signals into cellular outcomes, often leading to phenotypic variation. A detailed analysis of these modifications presents a formidable challenge; however, determining them provides indispensable insight into biological function.
Methods and techniques developed to characterise individual proteins are now being applied systematically to protein populations. Combining function- or structure-based purification of modified “sub-proteomes”, such as phosphorylated proteins or modified membrane proteins, with mass spectrometry has been particularly successful. Mass spectrometry has become the method of choice for elucidating several types of PTMs, both qualitatively and quantitatively. With large proteome-wide data sets now available, identifying combinatorial PTM patterns has become feasible. Various reports in this area reveal that many proteins undergo multiple modifications and that sequential or hierarchical modification patterns exist on many proteins; the biology of these patterns is only starting to be unravelled (Jensen 2004, 2006; Ytterberg and Jensen 2010; Zhao and Jensen 2009; Mann and Jensen 2003; Young et al. 2010).
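Mass spectrometry detects a PTM as a characteristic mass shift between the observed peptide and its unmodified form. The sketch below, under stated assumptions, annotates an observed mass difference with single or combined modifications from a small table of monoisotopic shifts (values as listed in Unimod). The table is deliberately incomplete, and `explain_mass_shift` and its default tolerance are hypothetical.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass shifts (Da) for a few common PTMs; a deliberately small table.
PTM_SHIFTS = {
    "Phospho": 79.966331,    # +HPO3
    "Acetyl": 42.010565,     # +C2H2O
    "Methyl": 14.015650,     # +CH2
    "Oxidation": 15.994915,  # +O
    "HexNAc": 203.079373,    # e.g. O-GlcNAc
    "GlyGly": 114.042927,    # ubiquitin remnant left after trypsin digestion
}


def explain_mass_shift(delta, tol=0.01, max_mods=2):
    """List PTM combinations (up to max_mods, repeats allowed) within tol Da of delta."""
    matches = []
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(PTM_SHIFTS), n):
            if abs(sum(PTM_SHIFTS[m] for m in combo) - delta) <= tol:
                matches.append(combo)
    return matches


if __name__ == "__main__":
    # Observed minus expected unmodified peptide mass (invented values).
    for delta in (79.9663, 159.9327, 42.0106):
        print(delta, explain_mass_shift(delta, max_mods=3))
```

Allowing combinations mirrors the combinatorial PTM patterns mentioned above. The tolerance matters: acetylation and trimethylation (three methyl groups) differ by only about 0.036 Da, so only a high-accuracy measurement with a tight tolerance separates them.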

Protein Localisation

The signal hypothesis revealed the existence of “zip codes” that direct proteins or protein complexes to subcellular compartments such as the nucleus, cytoplasm, mitochondria, endoplasmic reticulum, lysosomes and endosomes, peroxisomes, Golgi and nucleolus, so that they interact at defined sites at the correct time. Proteins need to be localised to their proper cellular compartments in order to perform their biological functions. For instance, most transcription factors localise to the nucleus to promote gene expression, whereas some proteins, such as the glucocorticoid receptor, reside temporarily in one compartment (the cytoplasm) and relocate to another (the nucleus) in response to a stimulus. Assigning a subcellular location to a protein is highly desirable for biologists: not only can it help reveal the protein’s role in cellular processes, but it can also refine our knowledge of those processes by pinpointing certain activities to specific organelles, since proteins are spatially organised according to their function. Protein localisation is also an important regulatory mechanism, and protein mislocalisation is well known to have profound effects on cellular function (e.g. in cystic fibrosis). Membrane-bound organelles and discrete cytoskeletal components are key features of eukaryotic cells that sequester cellular components into well-defined spaces. Identifying organelle proteins and their macromolecular structures is therefore a key step towards a comprehensive understanding of organelle biology.
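Some of these “zip codes” are short sequence motifs that can be searched for directly. The sketch below scans a protein for three well-known, simplified consensus signals: the classical monopartite nuclear localisation signal K-[KR]-x-[KR], the C-terminal ER retrieval signal (KDEL in mammals, HDEL in yeast) and the C-terminal peroxisomal PTS1 signal [SAC]-[KRH]-[LM]. Pattern matching of this kind produces many false positives and misses non-canonical signals; the motif names and `scan_targeting_motifs` are hypothetical.

```python
import re

# Simplified consensus targeting motifs (illustrative; real signals are more varied).
TARGETING_MOTIFS = {
    # Classical monopartite nuclear localisation signal K-[KR]-x-[KR].
    # The zero-width lookahead lets overlapping occurrences be reported.
    "NLS": r"(?=K[KR].[KR])",
    # C-terminal ER retrieval signal.
    "ER_retention": r"[KH]DEL$",
    # C-terminal peroxisomal targeting signal type 1.
    "PTS1": r"[SAC][KRH][LM]$",
}


def scan_targeting_motifs(protein: str) -> dict:
    """Return {motif name: [0-based start positions]} for every motif found."""
    hits = {}
    for name, pattern in TARGETING_MOTIFS.items():
        positions = [m.start() for m in re.finditer(pattern, protein.upper())]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    print(scan_targeting_motifs("MPKKKRKVE"))  # {'NLS': [2, 3]}
    print(scan_targeting_motifs("MAAAKDEL"))   # {'ER_retention': [4]}
```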

Although recent genomic approaches promise a wealth of information, several fundamental proteomic data sets remain uncatalogued. Protein localisation is assumed to be a strong indicator of gene function and is also useful for evaluating protein information inferred from genetic data, for instance by supporting or refuting putative protein interactions suggested by two-hybrid assays. Furthermore, the subcellular localisation of a protein can often reveal its mechanism of action. Proteomics aims to identify the subcellular localisation of each protein; this information can be used to generate a 3-D protein map of the cell, providing novel information about protein regulation. Enriching subcellular compartments and then identifying their protein contents by proteomics is a powerful method for rapid protein localisation. To date, very few studies have characterised protein localisation on a large scale, primarily because traditional methods for assigning proteins to subcellular locations mostly target a single protein of interest, and very few high-throughput methods exist by which reporter fusions or epitope-tagged proteins can be generated and subsequently localised. To address this problem, a large data set can be created by integrating the localisation data available so far (Davis et al. 2007; Lilley and Dupree 2007; Drumm and Collins 1993; Simpson and Pepperkok 2003; Dunkley et al. 2004; Kumar et al. 2002).
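Gradient-based methods such as that of Dunkley et al. (2004) assign proteins to organelles by comparing how each protein is distributed across subcellular fractions with the distribution of known organelle marker proteins. The sketch below is a deliberately simplified, assumed version of that idea (nearest marker centroid by Pearson correlation, with a minimum correlation for assignment), not the published statistical method. The fraction profiles, compartment markers and threshold are invented, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def centroid(profiles):
    """Mean profile across a list of marker-protein profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def assign_compartments(profiles, markers, min_r=0.8):
    """Assign each protein to the compartment whose marker centroid it tracks best."""
    centroids = {comp: centroid(ps) for comp, ps in markers.items()}
    assignments = {}
    for protein, profile in profiles.items():
        comp, r = max(((c, pearson(profile, v)) for c, v in centroids.items()),
                      key=lambda item: item[1])
        assignments[protein] = comp if r >= min_r else "unassigned"
    return assignments


if __name__ == "__main__":
    # Invented distributions of each protein across four density-gradient fractions.
    markers = {
        "ER": [[0.1, 0.2, 0.5, 0.2], [0.1, 0.3, 0.4, 0.2]],
        "plasma membrane": [[0.5, 0.3, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]],
    }
    unknown = {"P1": [0.55, 0.25, 0.1, 0.1], "P2": [0.1, 0.25, 0.45, 0.2],
               "P3": [0.25, 0.25, 0.25, 0.26]}
    print(assign_compartments(unknown, markers))
```

Proteins whose profiles match no marker set well, such as P3 here, are left unassigned rather than forced into a compartment.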

Protein–Protein Interactions

In general, functional proteins interact with one another and very rarely act as single isolated entities. One possible approach to elucidating the function of an unknown protein is therefore to investigate the functions of the proteins that interact with it. The systematic study of protein–protein interactions for the purpose of elucidating protein functions is termed “interaction proteomics”. Understanding protein–protein interactions is one of the fundamental goals of proteomics, since cell growth, the cell cycle, programmed cell death and other processes are all regulated by signal transduction through protein complexes; revealing the mechanisms underlying these processes is therefore important. Approaches to analysing protein–protein interactions include (1) biochemical analysis of multiprotein complexes, for example pull-down and affinity capture methods; (2) molecular biology approaches, chiefly the yeast two-hybrid assay, fluorescence resonance energy transfer (FRET) and bimolecular fluorescence complementation (BiFC); and (3) in silico prediction methods. Proteomics aims to develop a complete 3-D map of all protein interactions in the cell in order to identify the members of functional protein complexes and pathways and to characterise protein–ligand binding. Proteome-scale maps of physical protein interactions have recently been determined for several organisms. These physical interactions are complemented by a wealth of information on other types of functional relationships between proteins, including genetic interactions, co-expression patterns and shared evolutionary history. Whole-proteome interaction maps can be constructed by combining these pairwise linkages. As protein–protein interactions are fundamental to most biological processes, the systematic identification of all protein–protein interactions is considered a key strategy for revealing cellular processes.
Consequently, several experimental and computational techniques have been developed to determine systematically both the potential and the actual protein interactions in selected model organisms. Because these interactions are likely to correlate with the functional properties of proteins, protein interaction maps are frequently used to reveal, in a systematic fashion, the potential biological roles of proteins of unknown function. Strategies developed to explore protein–protein interactions include affinity purification coupled with mass spectrometry, yeast two-hybrid assays, imaging approaches and various databases. As the number of proteins identified by MS-based large-scale proteome analyses has increased, the number of false-positive identifications has also increased. The general consensus is therefore to confirm protein–protein interaction data using one or more independent approaches. Furthermore, identifying and characterising minor protein–protein interactions is fundamental for understanding the functions of transient interactions and low-abundance proteins. New methods, and improvements to existing protein–protein interaction methodologies, are therefore highly desirable. These include the detection of minor proteins by MS, multidimensional protein identification technology or OFFGEL electrophoresis analyses, one-shot analysis with a long column, and filter-aided sample preparation methods. These sophisticated techniques should permit thousands of proteins to be identified, and in-depth proteomic methods should permit the identification and characterisation of transient or weaker-affinity protein–protein interactions (Nabieva et al. 2005; Pawson and Nash 2000; Yook et al. 2004; Yanagida 2002; Fukao 2012).
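The two ideas above, building a map from pairwise linkages and inferring the role of an unknown protein from its partners, can be sketched together. The code first applies the consensus rule by keeping only interactions reported by at least two independent methods, then predicts a function by majority vote among annotated neighbours (“guilt by association”). This majority-vote rule is an assumed simplification, not the method of the studies cited above; method labels, protein names and annotations are invented, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def confirmed_interactions(evidence, min_methods=2):
    """Keep undirected pairs reported by at least min_methods independent methods."""
    support = defaultdict(set)
    for method, pairs in evidence.items():
        for a, b in pairs:
            if a != b:  # ignore self-interactions for simplicity
                support[frozenset((a, b))].add(method)
    return {pair for pair, methods in support.items() if len(methods) >= min_methods}


def build_network(pairs):
    """Adjacency sets for an undirected interaction network."""
    graph = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        graph[a].add(b)
        graph[b].add(a)
    return graph


def predict_function(protein, graph, annotations):
    """Majority vote over annotated neighbours; None if no neighbour is annotated."""
    votes = Counter(annotations[n] for n in graph.get(protein, ()) if n in annotations)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    evidence = {  # invented interaction reports per method
        "AP-MS": [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")],
        "Y2H": [("A", "X"), ("B", "X"), ("C", "D")],
        "BiFC": [("X", "C")],
    }
    graph = build_network(confirmed_interactions(evidence))
    annotations = {"A": "DNA repair", "B": "DNA repair", "C": "transport"}
    print(predict_function("X", graph, annotations))  # DNA repair
```

Interactions seen by only one method (A–B and C–D here) are dropped before the vote, so a single false-positive report cannot add a neighbour. Ties between equally common annotations are broken arbitrarily.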

Structural Proteomics

The recent large-scale determination of protein structures initiated the era of “structural genomics” or “structural proteomics”. As protein 3-D structures are more conserved than sequences, these initiatives also pave the way for biochemical or biophysical functional characterisation through structure. Proteomic studies that aim to give a detailed account of the structure of protein complexes, or of the proteins present in a specific cellular compartment, are known as structural proteomics. Structural proteomics emerged from the synchronised development of high-throughput methodologies and technologies that enabled novel data to be generated more efficiently. It attempts to identify all the proteins within a protein complex or organelle, establish their locations and characterise all their protein–protein interactions. Analysis of experimental or modelled 3-D structures is one of the key components of the functional understanding of unknown proteins. Although structural proteomics technologies are generating protein structures at an exceptional rate, current knowledge of 3-D structural detail is still limited. It is generally accepted that a considerable part of the structural proteome has a template structure from which reliable conclusions can be drawn, although 3-D structural coverage varies between proteins. This information will help piece together the overall architecture of cells and explain how the expression of certain proteins gives a cell its unique characteristics and, ultimately, a unique phenotype. A major aim of structural proteomics is thus to assemble a map of protein structures representing all the proteins in the “global proteome” (Wild and Saqi 2004; Norin and Sundström 2002; Blackstock and Weir 1999).

Proteome Analysis: Current Technology and Challenges

The era of genomics is well established with an ever-growing number of genomes being sequenced and added to the database everyday. The first plant genome to be sequenced was that of Arabidopsis thaliana in 2000, followed by the first crop plant Oryza sativa. Thereafter, multiple plant genomes were added to the database, tomato being the most recent (2011). However, the greater job of genome annotation is still in its infancy, and proteomics remains to be the most powerful tool. Unlike animal or microbial genomes, plant genomes have additional complexities in terms of ploidy levels and genome duplication. Besides, the extraction of protein and purification presents a greater challenge. The era of proteomics started with gel-based approaches for the resolution of protein mixtures, which are still considered as the touchstone of proteomics though they suffer from various limitations (O’Farrell et al. 1977). The initial proteome analysis pipelines were based on a 2-DE PAGE separation followed by Edman end sequencing methods. However, the advances in MS/MS technology have led to more powerful tools for the identification of mixture of proteins using gel-free methods. Most gel-free approaches use a bottom-up strategy where proteins are first digested with a proteolytic enzyme, and the obtained complex peptide mixture is then separated via reversed-phase (RP) chromatography coupled to a tandem mass spectrometer. This strategy is currently only successful with partial or simple mixtures. Another major advent of this decade has been the quantitative differential proteomic expression analysis using MS/MS-based ICAT and ITRAQ techniques which has truly revolutionised the field of proteomics. It has led to the automation of the analysis pipeline and created a parallel to the high-throughput platform of genomic studies ( Gygi et al. 1999; Gygi et al. 2002; Zhou et al. 2002). 
However, the analysis of posttranslational modifications and the limited dynamic range of protein detection remain challenges that are yet to be optimally resolved. In large proteomes the dynamic range is generally limited, and only the most abundant proteins are detected. This has been improved by fractionating a proteome into smaller sub-proteomes. In addition, complex proteomes can be analysed in more depth by combining gel-based and non-gel-based separation techniques.

Mass Spectrometry

Mass spectrometry (MS) is a technique for “weighing” molecules. The mass measurement, however, is not made with a balance or scale but is based on the motion of a charged particle, called an ion, in an electric or magnetic field.

A mass spectrometer is an instrument that produces ions and separates them in the gas phase according to their mass-to-charge ratio using electric or magnetic fields.

The origins of mass spectrometry lie in the classic experiments carried out by J.J. Thomson at the University of Cambridge, England, more than 100 years ago. Thomson discovered that electric discharges in gases produced ions and that these rays of ions adopted different parabolic trajectories according to their mass when passed through electromagnetic fields. This separation of ions according to their mass (and charge) formed the foundation of modern mass spectrometry.


Francis William Aston, a student of J.J. Thomson, designed several mass spectrographs in which ions were dispersed by mass and focused by velocity. This improved mass resolving power and led to the discovery of isotopes of many common naturally occurring elements. Thomson and Aston were honoured for their achievements with the Nobel Prizes in Physics (1906) and Chemistry (1922), respectively.

The power of mass spectrometry lies in the accuracy of the instrument. Large biomolecules such as peptides can be measured to within 0.01 % of their molecular mass, which allows single amino acid differences or posttranslational modifications to be identified. Small organic molecules can be measured to an accuracy of 5 ppm, which is essential for isotope detection and for deducing molecular formulae. Structural information can be generated using special types of mass spectrometers with multiple analysers, known as “tandem mass spectrometers”, in which the sample is fragmented inside the instrument and the fragments are analysed sequentially. The data generated are useful for the structure elucidation of organic compounds and for peptide or oligonucleotide sequencing.
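To make the quoted accuracies concrete, the short Python sketch below converts a relative accuracy (in percent or ppm) into an absolute mass window and computes the ppm error of a measurement. The function names and the example masses are illustrative assumptions, not values from a particular instrument.

```python
def tolerance_from_percent(mass_da: float, percent: float) -> float:
    """Absolute mass window (Da) for a relative accuracy expressed in percent."""
    return mass_da * percent / 100.0


def tolerance_from_ppm(mass_da: float, ppm: float) -> float:
    """Absolute mass window (Da) for a relative accuracy expressed in parts per million."""
    return mass_da * ppm / 1e6


def ppm_error(measured_da: float, theoretical_da: float) -> float:
    """Signed deviation of a measured mass from its theoretical value, in ppm."""
    return (measured_da - theoretical_da) / theoretical_da * 1e6


if __name__ == "__main__":
    # Hypothetical 2,000 Da peptide measured with 0.01 % accuracy
    print(f"{tolerance_from_percent(2000.0, 0.01):.4f} Da")
    # Hypothetical 300 Da organic molecule measured with 5 ppm accuracy
    print(f"{tolerance_from_ppm(300.0, 5.0):.4f} Da")
    # A reading of 1000.005 Da for a 1000.000 Da compound
    print(f"{ppm_error(1000.005, 1000.0):.2f} ppm")
```

At 0.01 %, the mass of a 2,000 Da peptide is thus pinned down to within about 0.2 Da, far smaller than the +79.966 Da shift caused by phosphorylation, a common posttranslational modification.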

Mass spectrometry has a wide range of applications in academia, research and industry, which may be broadly summarised as follows:

  • Accurate molecular mass measurements: to confirm sample identity, to determine the purity of a sample, to verify amino acid substitutions and to detect posttranslational modifications

  • Reaction monitoring: to monitor enzyme reactions, chemical modification and protein digestion

  • Amino acid sequencing: sequence confirmation, de novo characterisation of peptides and identification of proteins by database searching with a sequence “tag” from a proteolytic fragment

  • Oligonucleotide sequencing: sequencing and quality control of oligonucleotides

  • Structure determination: organic compound structure, protein folding monitored by H/D exchange, protein–ligand complex formation under physiological conditions and macromolecular structure determination

Mass spectrometry is now used in almost every analytical laboratory, for applications ranging from the analysis of biopolymers to drug discovery, drug testing, environmental monitoring and geological data mining.

The Pipeline of Mass Spectrometry

A mass spectrometer consists of three modules: (1) an ion source that converts gas-phase sample molecules into ions (or, in the case of electrospray ionisation, moves ions that exist in solution into the gas phase); (2) a mass analyser, which sorts the ions by their masses by applying electromagnetic fields; and (3) a detector, which records the ion signal and generates the time-resolved data from which the abundance of each ion present is calculated. The inlet introduces the sample into the vacuum of the mass spectrometer. In the source region, neutral sample molecules are ionised and then accelerated into the mass analyser. The mass analyser is the core of any mass spectrometer: it separates the ions, either in space or in time, according to their mass-to-charge ratio. After the ions are separated, they are detected, and the signal is transferred to a data system for analysis. All mass spectrometers also have a vacuum system to maintain the low pressure (high vacuum) required for operation. High vacuum minimises ion–molecule reactions, scattering and neutralisation of the ions (Fig. 1).

Fig. 1 The pipeline of mass spectrometry

A mass spectrometer recognises only “mass” (strictly, the mass-to-charge ratio). It produces a spectrum in which the x-axis is m/z and the y-axis is the intensity, often expressed in cps (counts per second). Each molecule produces a characteristic spectrum that serves as its “mass fingerprint”, and an unknown compound is identified by comparing its spectrum with a library of reference mass spectra for a wide range of molecules. The sample can be in any state of matter (liquid, solid or gas), but it must be brought into the gas phase and ionised before it can be analysed. The instrument reads the charge imparted to the molecule, not the inherent charge of the molecule. Hence, the inlet and the source act as feeders for the mass analyser, which is actually responsible for mass determination.
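Library matching of this kind is often done by treating spectra as vectors and comparing them. The sketch below shows one simple, assumed approach (nominal-mass binning followed by cosine similarity); it is not the algorithm of any particular commercial library, and the peak lists in the usage example are invented.

```python
import math
from typing import Dict, List, Tuple

Peaks = List[Tuple[float, float]]  # list of (m/z, intensity) pairs


def bin_spectrum(peaks: Peaks, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (nominal-mass bins by default)."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(round(mz / bin_width))
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Normalised dot product of two binned spectra: 1.0 means identical relative patterns."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query: Peaks, library: Dict[str, Peaks]) -> List[Tuple[str, float]]:
    """Rank reference spectra by similarity to the query, best match first."""
    q = bin_spectrum(query)
    scores = [(name, cosine_similarity(q, bin_spectrum(ref))) for name, ref in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented reference spectra, for illustration only
    library = {
        "compound_A": [(43.0, 100.0), (58.0, 60.0), (71.0, 20.0)],
        "compound_B": [(77.0, 100.0), (105.0, 80.0)],
    }
    unknown = [(43.1, 95.0), (58.0, 55.0), (71.2, 25.0)]
    for name, score in search_library(unknown, library):
        print(f"{name}: {score:.3f}")
```

Binning makes small calibration differences (43.1 versus 43.0) irrelevant, at the cost of resolution; finer bins would suit high-resolution data.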

Ionisation Methods

Analysis in a mass spectrometer requires the sample to be introduced as gas-phase ions. For nonvolatile, polar or charged molecules, ionisation can occur by the loss or gain of an electron or of a charged particle (e.g. a proton), generating odd- or even-electron ions, respectively.

Various ionisation methods can be used, depending on the type of sample under investigation and the mass spectrometer available (“Ionization Methods in Organic Mass Spectrometry”, Alison E. Ashcroft, The Royal Society of Chemistry, UK, 1997).

Ionisation Methods Include:

  • Electron ionisation (EI)

  • Chemical ionisation (CI)

  • Electrospray ionisation (ESI)

  • Fast atom bombardment (FAB)

  • Field desorption/field ionisation (FD/FI)

  • Matrix-assisted laser desorption ionisation (MALDI)

  • Thermospray ionisation (TSP)

Most ionisation techniques excite the neutral analyte molecule, which then ejects an electron to form a radical cation (M+•). Other ionisation techniques involve ion–molecule reactions that produce adduct ions (MH+). The most important considerations are the physical state of the analyte and the ionisation energy. Electron ionisation and chemical ionisation are suitable only for gas-phase ionisation and are generally considered hard techniques, used for organic compounds or small molecules. Soft ionisation techniques such as electrospray and matrix-assisted laser desorption, by contrast, are used to ionise condensed-phase samples and in biomolecular investigations. The ionisation energy is important because it controls the amount of fragmentation observed in the mass spectrum. Although fragmentation complicates the mass spectrum, it provides structural information for the identification of unknown compounds. Soft ionisation techniques produce predominantly molecular ions, whilst more energetic techniques cause ions to undergo extensive fragmentation. Peptide identification and the analysis of thermally labile samples therefore typically use ESI or MALDI platforms.

ESI: Electrospray Ionisation

ESI (Fenn et al. 1989) is a soft ionisation technique that produces both singly and multiply charged ions. The sample solution is introduced through an ultrafine needle held in a strong electric field (typically ±3–5 kV), creating a spray of charged droplets. The droplets are desolvated by a countercurrent drying gas or by heat, which causes the solvent to evaporate. As the solvent evaporates, the droplet shrinks and the charge density at its surface increases. Eventually the Coulombic repulsion between the surface charges exceeds the Rayleigh stability limit: the electrostatic repulsion becomes greater than the surface tension holding the droplet together, and the droplet explodes, ultimately releasing multiply charged analyte ions. Because electrospray produces multiply charged ions, high-molecular-weight compounds are observed at lower m/z values. This effectively extends the mass range of the analyser, so that higher-molecular-weight compounds can be analysed with a low-resolution mass spectrometer (Fig. 2).
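The Rayleigh limit mentioned above is commonly written as q_R = 8π(ε₀γR³)^½, where γ is the surface tension of the solvent and R the droplet radius. The sketch below evaluates it for water-like droplets; the surface tension (0.072 N/m) and the radii are illustrative assumptions. It shows why evaporation triggers droplet fission: the maximum charge a droplet can hold falls as R^1.5, whereas the charge it carries stays roughly constant as solvent is lost.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def rayleigh_limit_charge(radius_m: float, surface_tension_n_per_m: float) -> float:
    """Maximum stable droplet charge (C): q_R = 8*pi*sqrt(eps0 * gamma * R**3)."""
    return 8.0 * math.pi * math.sqrt(EPSILON_0 * surface_tension_n_per_m * radius_m ** 3)


if __name__ == "__main__":
    gamma = 0.072  # N/m, assumed value, roughly that of water at room temperature
    for radius_um in (1.0, 0.5, 0.25):
        q = rayleigh_limit_charge(radius_um * 1e-6, gamma)
        print(f"R = {radius_um} um: limit of about {q / ELEMENTARY_CHARGE:,.0f} elementary charges")
```

Halving the radius lowers the limit to about 35 % of its previous value, so a shrinking droplet soon carries more charge than it can hold.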

Fig. 2 Schematic representation of the ESI process. The electrospray is created by applying a large potential (3–5 kV) between the metal inlet needle and the electrode plate located ~0.5–1.0 cm from the needle. The liquid droplets leave the nozzle, and the electric field induces a net charge on the small droplets. As the solvent evaporates, the droplet shrinks and the electrostatic repulsion increases. The droplet finally reaches a point where the Coulombic repulsion from this electric charge is greater than the surface tension holding it together. This causes the droplet to explode and produce multiply charged analyte ions.

ESI typically generates singly or doubly charged ions for peptides below 2,000 Da, whilst higher-molecular-weight peptides yield a series of multiply charged ions. These multiple ion series provide independent verification of the calculated precursor ion mass and allow the charge state of each ion in the series to be deduced. Most commercial mass analysers can now be fitted with an ESI source, depending on the type of sample.
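The independent verification mentioned above can be illustrated with adjacent peaks of a charge-state series. Assuming positive-mode ions of the form [M+zH]z+, each peak satisfies m/z = (M + z·m_H+)/z, so two neighbouring peaks are enough to solve for z and M, and every further peak gives an independent estimate of M. The function names and the 17,000 Da protein below are hypothetical.

```python
from typing import List, Tuple

PROTON_MASS = 1.007276  # Da, mass added per positive charge in [M+zH]z+ ions


def mz_for_charge(neutral_mass: float, z: int) -> float:
    """m/z of the [M+zH]z+ ion of a molecule with the given neutral mass."""
    return (neutral_mass + z * PROTON_MASS) / z


def charge_and_mass(mz_high: float, mz_low: float) -> Tuple[int, float]:
    """Charge of the higher-m/z peak and the neutral mass, from two adjacent peaks.

    mz_high carries charge z and mz_low carries z + 1; solving both m/z equations
    for the same neutral mass M gives z = (mz_low - H) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


def mass_estimates(series_mz: List[float]) -> List[float]:
    """Independent neutral-mass estimate from every peak of a series sorted high -> low m/z."""
    z0, _ = charge_and_mass(series_mz[0], series_mz[1])
    return [(z0 + i) * (mz - PROTON_MASS) for i, mz in enumerate(series_mz)]


if __name__ == "__main__":
    true_mass = 17000.0  # hypothetical protein, Da
    series = [mz_for_charge(true_mass, z) for z in range(10, 15)]  # high -> low m/z
    print([round(mz, 3) for mz in series])
    print(charge_and_mass(series[0], series[1]))
    print([round(m, 3) for m in mass_estimates(series)])
```

With real data the estimates scatter slightly, and their spread is a useful check on the assigned charge states.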

MALDI: Matrix-Assisted Laser Desorption/Ionisation

MALDI (Karas and Hillenkamp 1988; Beavis and Chait 1996) is classically used to analyse large thermolabile molecules such as peptides. The sample is mixed with an excess of a compound that absorbs at the laser wavelength, known as the matrix (e.g. α-cyano-4-hydroxycinnamic acid, which is typically used for peptides). The sample is ionised by exposure to short pulses of UV light from a nitrogen laser. This leads to ionisation, generally protonation, of both the matrix and the analyte: the matrix absorbs the primary energy and transfers it to the sample, so that the analyte is ionised indirectly. A high-potential electric field then accelerates the desorbed ions into the mass analyser. Because the sample is converted to gas-phase ions directly from the solid state, the method is suitable for thermolabile molecules such as peptides (Fig. 3).

Fig. 3 Schematic representation of MALDI. The sample is prepared by mixing the analyte with a matrix compound that absorbs UV laser light (λ = 337 nm). The mixture is placed on a probe tip or a sample plate, generally made of an inert metal such as gold, and dried. A laser beam is then focused on the dried mixture, and the energy from each laser pulse is absorbed by the matrix. This energy is transferred to the analyte, which is ionised and brought into the gas phase. These ions then pass into the mass analyser.

MALDI is often the method of choice for the analysis of synthetic and natural polymers, proteins and peptides. Compounds with molecular weights up to 200,000 Da can be analysed.

Mass Analysers

The ions are created in the source region by any of the above-described methods and are accelerated into the mass analyser by an electric field. The function of the mass analyser is to separate these ions according to their m/z value. Each analyser has very different operating characteristics, and the selection of an instrument depends upon the mass range, type of the analyte, time for analysis, resolution etc.

Mass analysers are typically described as:

  1. Continuous analysers

    • Quadrupole filters

    • Magnetic sectors

  2. Pulsed analysers

    • Time of flight (TOF)

    • Quadrupole ion trap mass spectrometers (QUISTOR)

    • Fourier transform-ion cyclotron resonance (FT-ICR)

Continuous analysers transmit only a single selected m/z to the detector at any one time, and the mass spectrum is obtained by scanning across the mass range so that ions of different mass-to-charge ratios are detected in turn. They can be compared to a filter or a monochromator in optical spectroscopy: within the mass window set by the instrument, a particular m/z is transmitted whilst all other ions are lost. These instruments are useful for single-ion monitoring, whereas complex mixtures may not be resolved well.

Pulsed mass analysers, on the other hand, record the entire mass spectrum from a single pulse of ions. These instruments have a distinct advantage with complex mixtures such as peptide digests, in which many m/z values are typically observed, and they have a higher signal-to-noise ratio than continuous analysers. Peptides are generally analysed with a TOF or QUISTOR analyser.
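The signal-to-noise advantage of detecting every ion can be made semi-quantitative. The sketch below assumes an idealised scanning analyser that dwells equally on n m/z windows, so each ion species is transmitted only 1/n of the time, and assumes counting-statistics-limited noise, for which signal-to-noise scales as the square root of the number of ions counted. Real pulsed instruments also lose some ions, so the result is an idealised upper bound for illustration only; the function names are hypothetical.

```python
import math


def scanning_duty_cycle(n_windows: int) -> float:
    """Fraction of time a scanning analyser transmits any one m/z window."""
    if n_windows < 1:
        raise ValueError("n_windows must be at least 1")
    return 1.0 / n_windows


def counting_snr(ions_counted: float) -> float:
    """Shot-noise-limited S/N: signal N divided by noise sqrt(N), i.e. sqrt(N)."""
    return math.sqrt(ions_counted)


def snr_gain_pulsed_vs_scanning(n_windows: int, pulsed_duty_cycle: float = 1.0) -> float:
    """S/N of a pulsed analyser relative to a scanning one for the same ion flux."""
    flux = 1.0  # arbitrary ion flux per m/z species; it cancels out in the ratio
    scanning = counting_snr(flux * scanning_duty_cycle(n_windows))
    pulsed = counting_snr(flux * pulsed_duty_cycle)
    return pulsed / scanning


if __name__ == "__main__":
    # Scanning 1,000 windows transmits each m/z only 0.1 % of the time
    print(scanning_duty_cycle(1000))
    # Idealised S/N advantage of detecting all ions
    print(round(snr_gain_pulsed_vs_scanning(1000), 1))
```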

Mass Analysis

TOF: Time of Flight

In a TOF instrument, mass is determined by measuring the time an ion takes to travel a known distance in vacuum after being accelerated by an electric field. The mass-to-charge ratio (m/z) can thus be determined as a function of time (Fig. 4):

Fig. 4 Time-of-flight mass analyser

$$
\begin{aligned}
F &= Q\,(E + v \times B) && \text{Lorentz force law}\\
F &= m a && \text{Newton's second law}\\
\frac{m}{Q} &= \frac{E + v \times B}{a}\\
\frac{m}{Q} &= \frac{E + v \times B}{2d}\,t^{2}\\
\frac{m}{Q} &= K t^{2}\\
\sqrt{m/z} &\propto t
\end{aligned}
$$

where F = force, m = mass, a = acceleration, E = electric field, v × B = vector cross product of ion velocity and magnetic field, Q = ion charge, d = distance, K = instrument constant; the fourth line uses a = 2d/t² for an ion accelerated uniformly from rest over the distance d.

The ions are accelerated into the flight tube under vacuum by an accelerating potential (typically 2–25 kV). Because every ion of a given charge acquires the same kinetic energy, lighter ions reach higher velocities: the velocity is inversely proportional to the square root of m/z, so the greater the mass of an ion, the longer it takes to reach the detector. The arrival of ions at the detector is recorded as a function of time, and the number of hits in each time interval gives the abundance of the corresponding m/z signal. The plot of signal intensity against flight time is then converted into a mass spectrum, from which the masses of the detected ions are calculated.
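The relation t ∝ √(m/z) can be checked numerically. The sketch below assumes an idealised linear TOF in which every ion gains the same kinetic energy, zeU, from an accelerating potential U and then drifts through a field-free tube of length L; the 20 kV and 1 m values are illustrative. The inverse function corresponds to the instrument constant K in the equations above, although in practice K is obtained by calibrating with compounds of known mass rather than computed from nominal geometry.

```python
import math

ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg per dalton
ELEMENTARY_CHARGE = 1.602176634e-19   # C


def flight_time(mass_da: float, z: int, accel_voltage_v: float, drift_length_m: float) -> float:
    """Drift time (s) of an ion after acceleration through accel_voltage_v.

    z*e*U = m*v**2 / 2 gives v = sqrt(2*z*e*U / m), and t = L / v.
    """
    m_kg = mass_da * ATOMIC_MASS_UNIT
    velocity = math.sqrt(2.0 * z * ELEMENTARY_CHARGE * accel_voltage_v / m_kg)
    return drift_length_m / velocity


def mz_from_time(t_s: float, accel_voltage_v: float, drift_length_m: float) -> float:
    """Inverse relation, m/z = K * t**2 with K = 2*e*U / L**2, returned in Da per charge."""
    k = 2.0 * ELEMENTARY_CHARGE * accel_voltage_v / drift_length_m ** 2
    return k * t_s ** 2 / ATOMIC_MASS_UNIT


if __name__ == "__main__":
    U, L = 20_000.0, 1.0  # assumed 20 kV acceleration and 1 m flight tube
    for mass in (1000.0, 4000.0):
        t = flight_time(mass, 1, U, L)
        print(f"m/z {mass:.0f}: {t * 1e6:.2f} us, back-calculated m/z {mz_from_time(t, U, L):.2f}")
```

Under these assumptions a singly charged 1,000 Da ion arrives after roughly 16 µs, and quadrupling the mass doubles the flight time.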

QUISTOR: Quadrupole Ion Trap

The quadrupole ion trap is a three-dimensional quadrupole that first traps ions and then sequentially analyses the masses of the entire population of ions from a single source. The signal-to-noise ratio is high because all the trapped ions are detected. The QUISTOR consists of a hyperbolic ring electrode and two end cap electrodes, which enclose a hollow centre, the ion trap itself. Apertures in the end caps allow ions to pass into and out of the trap. A combination of RF and DC voltages is applied to the electrodes to create a quadrupole electric field, which traps the ions in a potential energy well at the centre of the analyser. To obtain a mass spectrum, the electric field is varied so that ions of increasing m/z are brought sequentially into resonance with the applied frequency. This destabilises the ions, and the resulting change in their velocity and trajectory ejects them from the trap, after which they are detected. The precursor ion and the product ions produced by collision-induced dissociation (CID) can be separated in time to produce the entire spectrum of product ions. This is particularly useful for determining structures and for building up protein sequences de novo. Both ESI and MALDI sources can be used with an ion trap instrument.
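One common way of scanning an ion trap, the mass-selective instability scan, can be sketched with the Mathieu parameter q_z = 8eV/(m(r₀² + 2z₀²)Ω²), where V is the RF amplitude and Ω its angular frequency. With no DC component (a_z = 0), ions become unstable and are ejected when q_z reaches about 0.908, so ramping V ejects ions in order of increasing m/z. This is an idealised sketch: the trap dimensions (r₀ = 1 cm, r₀² = 2z₀²) and the 1.1 MHz drive frequency are assumed example values, and it models boundary ejection rather than the resonance ejection described in the text.

```python
import math

ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg per dalton
ELEMENTARY_CHARGE = 1.602176634e-19   # C
Q_EJECT = 0.908  # q_z at the edge of the stability diagram when a_z = 0 (no DC)


def q_z(mz: float, rf_amplitude_v: float, r0_m: float, z0_m: float, omega_rad_s: float) -> float:
    """Mathieu q_z = 8 e V / (m (r0**2 + 2 z0**2) Omega**2); mz in Da per charge."""
    m_kg = mz * ATOMIC_MASS_UNIT
    geometry = (r0_m ** 2 + 2.0 * z0_m ** 2) * omega_rad_s ** 2
    return 8.0 * ELEMENTARY_CHARGE * rf_amplitude_v / (m_kg * geometry)


def ejected_mz(rf_amplitude_v: float, r0_m: float, z0_m: float, omega_rad_s: float) -> float:
    """m/z whose q_z has just reached Q_EJECT at this RF amplitude."""
    geometry = (r0_m ** 2 + 2.0 * z0_m ** 2) * omega_rad_s ** 2
    return 8.0 * ELEMENTARY_CHARGE * rf_amplitude_v / (Q_EJECT * geometry) / ATOMIC_MASS_UNIT


if __name__ == "__main__":
    r0 = 0.01                        # assumed ring-electrode radius, m
    z0 = r0 / math.sqrt(2.0)         # assumed ideal geometry, r0**2 = 2 * z0**2
    omega = 2.0 * math.pi * 1.1e6    # assumed 1.1 MHz RF drive
    # Ramping the RF amplitude ejects ions in order of increasing m/z
    for volts in (1000.0, 2000.0, 4000.0):
        print(f"V = {volts:.0f} V -> ejecting m/z of about {ejected_mz(volts, r0, z0, omega):.0f}")
```

Because the ejected m/z is proportional to V, a linear RF ramp yields a linear mass scale.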

FT-ICR: Fourier Transform-Ion Cyclotron Resonance

An FT-ICR instrument (Marshall and Verdun 1990; Amster 1996) captures ions in a three-dimensional space defined by a magnetic field. The mass analyser consists of a reaction cell bounded by electrodes, known as the trapping, excite and detect plates, placed in the field of a superconducting magnet. The ICR cell traps ions in the magnetic field, which makes them travel in circular orbits. An ion’s cyclotron frequency (ω) is the angular frequency of this orbit; it is determined by the magnetic field strength (B) and the m/z value of the ion and is inversely proportional to m/z. After the ions are trapped in the cell, they are detected by measuring the signal at their cyclotron frequencies. This type of mass analyser has extremely high mass resolution and is useful for tandem mass spectrometry experiments.
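The inverse relation between cyclotron frequency and m/z follows from f = ω/2π = zeB/(2πm). The sketch below converts in both directions for an assumed 7 T magnet. It ignores the small frequency shifts caused by the trapping electric field and by space charge, which real instruments typically handle through calibration.

```python
import math

ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg per dalton
ELEMENTARY_CHARGE = 1.602176634e-19   # C


def cyclotron_frequency_hz(mz: float, b_tesla: float) -> float:
    """Unperturbed cyclotron frequency f = z e B / (2 pi m), with mz in Da per charge."""
    return ELEMENTARY_CHARGE * b_tesla / (2.0 * math.pi * mz * ATOMIC_MASS_UNIT)


def mz_from_frequency(freq_hz: float, b_tesla: float) -> float:
    """Inverse relation: m/z = z e B / (2 pi f), returned in Da per charge."""
    return ELEMENTARY_CHARGE * b_tesla / (2.0 * math.pi * freq_hz * ATOMIC_MASS_UNIT)


if __name__ == "__main__":
    B = 7.0  # assumed magnetic field, tesla
    for mz in (500.0, 1000.0, 2000.0):
        f = cyclotron_frequency_hz(mz, B)
        print(f"m/z {mz:.0f}: {f / 1e3:.1f} kHz")
```

Doubling m/z halves the frequency, which is why the Fourier-transformed signal maps directly onto an m/z axis.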

MS/MS: Tandem Mass Spectrometry

Tandem mass spectrometry involves at least two stages of mass analysis, with some form of fragmentation of the precursor (parent) ion occurring between them. The stages can be separated in space, using physically distinct elements linked in series, so that the precursor ion selected by one unit is fed into the next for fragmentation and analysis of the product ions. These elements can be sectors, transmission quadrupoles or time-of-flight instruments. Alternatively, the stages can be separated in time, by trapping the precursor ions in a three-dimensional space, inducing fragmentation and releasing the product ions at subsequent time intervals. QUISTOR and FT-ICR instruments are typically used for such tandem mass spectrometry experiments.

Protein Mass Spectrometry

One of the primary applications of mass spectrometry in proteomics is protein identification or sequencing. Proteins of interest are generally present as part of a complex proteome, so they first need to be fractionated and then subjected to mass analysis. Various fractionation platforms are employed before a protein is identified by MS/MS. The workflow of a typical proteomic analysis is described in Fig. 6. After the protein mixture has been isolated, it is resolved by gel-based or non-gel-based techniques. The protein is then cleaved with trypsin, which cuts at the C-terminal side of arginine or lysine residues unless they are followed by proline, to generate peptide fragments. The two major ionisation modes used are MALDI and ESI, both soft ionisation techniques that produce relatively little fragmentation. Peptide masses are then determined by MS, using either peptide mass fingerprinting or MS/MS platforms (Fig. 6).
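The trypsin rule stated above (cleave after K or R unless the next residue is P) is simple enough to apply in silico. The sketch below digests a sequence and computes monoisotopic peptide masses from standard residue masses; the example sequence is invented, and modifications such as cysteine alkylation are ignored. The `missed_cleavages` option is an illustrative addition, since real digests are often incomplete.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O added to each peptide
PROTON = 1.00728   # mass of a proton, added per positive charge


def trypsin_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """In silico trypsin digest: cut after K or R unless the next residue is P."""
    sites = [i + 1 for i in range(len(sequence) - 1)
             if sequence[i] in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Joining up to `missed_cleavages` extra pieces models incomplete digestion
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(bounds):
                peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def peptide_mz(peptide: str, z: int = 1) -> float:
    """m/z of the [M+zH]z+ ion."""
    return (peptide_mass(peptide) + z * PROTON) / z


if __name__ == "__main__":
    protein = "PEPTIDEKAAARPGGKLLR"  # invented sequence; note that the R-P bond is not cleaved
    for pep in trypsin_digest(protein):
        print(f"{pep:10s} [M+H]+ = {peptide_mz(pep):.4f}")
```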

Fig. 5 Quadrupole ion trap. (a) Ion trap formed by the ring and end cap electrodes. (b) Sequential ejection of ions to the detector according to m/z

Fig. 6 A typical workflow for a protein mass spectrometry experiment

Peptide Fragmentation

The types of fragment ions observed in an MS/MS spectrum depend on the primary sequence, the amount of internal energy, the ionisation method, the charge state, etc. (Fig. 7):

  • Fragments can only be detected if they carry at least one charge.

  • If this charge is retained on the N-terminal fragment, the ion is classed as either a, b or c.

  • If the charge is retained on the C-terminal fragment, the ion type is either x, y or z.

  • A subscript indicates the number of residues in the fragment.

The protein identification process is divided into two main parts: (1) in silico assimilation of the data generated by MS/MS and (2) interpretation of the data by searching against a database. The mass spectrometer gives only the masses of the individual fragment ions observed for a single precursor ion. Series of b ions and y ions are required to build up the peptide sequence, and multiple fragment sequences are overlapped to arrive at a consensus sequence. The set of fragments generated by MS/MS acts as a fingerprint for an individual peptide. Two or more peptides matching a given protein in a search of protein sequence databases establish the identity of the protein.

Fig. 7 Roepstorff–Fohlmann–Biemann nomenclature used when the peptide backbone is fragmented by imparting energy onto the molecule
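The b- and y-ion nomenclature can be turned into numbers directly. For singly charged ions, b_i is the sum of the first i residue masses plus a proton, and y_i is the sum of the last i residue masses plus water and a proton; consecutive ions in a series differ by exactly one residue mass, which is how a sequence is read from a spectrum. The sketch below assumes monoisotopic masses and singly charged, unmodified ions; the function names and the example peptide are hypothetical, and Leu and Ile share a mass and are reported together.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide: str) -> List[Tuple[str, float]]:
    """Singly charged b ions b1..b(n-1): N-terminal residues plus a proton."""
    ions, total = [], 0.0
    for i, aa in enumerate(peptide[:-1], start=1):
        total += RESIDUE_MASS[aa]
        ions.append((f"b{i}", total + PROTON))
    return ions


def y_ions(peptide: str) -> List[Tuple[str, float]]:
    """Singly charged y ions y1..y(n-1): C-terminal residues plus water and a proton."""
    ions, total = [], 0.0
    for i, aa in enumerate(reversed(peptide[1:]), start=1):
        total += RESIDUE_MASS[aa]
        ions.append((f"y{i}", total + WATER + PROTON))
    return ions


def read_ladder(masses: List[float], tol: float = 0.02) -> List[str]:
    """Infer residues from mass differences between consecutive ions of one series."""
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        diff = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        residues.append("/".join(hits) if hits else "?")
    return residues


if __name__ == "__main__":
    peptide = "PEPTIDEK"  # invented example peptide
    b = b_ions(peptide)
    print(b)
    print(y_ions(peptide))
    # Differences b(i+1) - b(i) recover residues 2..n-1
    print(read_ladder([mass for _, mass in b]))
```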

Protein Identification by Peptide Mass Fingerprinting

The protein is digested with trypsin to generate peptide fragments, and the mass of each peptide is determined by MS. Each protein thus yields a set of peptide masses. The entire set of peptide masses generated by a single protein is considered unique and serves as a fingerprint for identifying the protein. A peptide mass fingerprint database consists of theoretical digest patterns for all protein sequence entries, serving as a directory of the mass fingerprint of each protein, and the experimental data can be matched against this library. The success of PMF identification, however, depends on the sequence of the protein of interest being present in the database. Hence, its applications are limited to organisms with sequenced genomes.
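A bare-bones version of PMF matching is sketched below: every database sequence is digested in silico, its theoretical [M+H]+ masses are compared with the observed masses within a tolerance, and proteins are ranked by the number of matches. The two-protein database, the 0.1 Da tolerance and the count-based score are assumptions for illustration; search engines use more elaborate, probability-based scores.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def trypsin_digest(sequence: str) -> List[str]:
    """Cut after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_search(observed: List[float], database: Dict[str, str],
               tol_da: float = 0.1) -> List[Tuple[str, int]]:
    """Rank proteins by how many observed masses match their theoretical tryptic [M+H]+ list."""
    results = []
    for name, sequence in database.items():
        theoretical = [mh_plus(p) for p in trypsin_digest(sequence)]
        matches = sum(1 for obs in observed if any(abs(obs - t) <= tol_da for t in theoretical))
        results.append((name, matches))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented two-entry "database" and observed masses, for illustration only
    database = {"protein_1": "PEPTIDEKAAARPGGKLLR", "protein_2": "GGGKSSSRWWWK"}
    observed = [928.462, 401.287, 1500.0]  # the last mass matches neither entry
    print(pmf_search(observed, database))
```

The example also shows the limitation noted above: a protein whose sequence is absent from the database can never be matched, however good the spectrum.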

Protein Identification by Peptide Fragment Ion Searches

This protein identification method is a hierarchical process: first, peptide sequences are built up from the MS/MS data for peptide fragment ions, and then all the peptides identified from a single protein are assigned to that protein by searching against protein sequence databases. The reliability of a match is judged by the total score of the protein identification, which is based on the number of peptides matching the same protein, the length of the match, the percentage coverage of the protein, etc. Various software tools are available, of which PepIdent and MASCOT are popular (Fig. 8).
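Two of the score components listed above can be computed directly. The sketch below reports, for each candidate protein, the number of identified peptides found in its sequence and the percentage sequence coverage, and keeps only proteins supported by at least two peptides, echoing the two-peptide criterion mentioned earlier. It is an assumed, simplified ranking with hypothetical function names, not the scoring scheme of PepIdent or MASCOT.

```python
from typing import Dict, List, Tuple


def matched_peptides(protein: str, peptides: List[str]) -> List[str]:
    """Distinct identified peptides whose sequence occurs in the protein."""
    return sorted({p for p in peptides if p and p in protein})


def sequence_coverage(protein: str, peptides: List[str]) -> float:
    """Percentage of protein residues covered by at least one matched peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in matched_peptides(protein, peptides):
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


def rank_proteins(candidates: Dict[str, str], peptides: List[str],
                  min_peptides: int = 2) -> List[Tuple[str, int, float]]:
    """Keep proteins with at least min_peptides matches; rank by peptide count, then coverage."""
    rows = []
    for name, seq in candidates.items():
        n = len(matched_peptides(seq, peptides))
        if n >= min_peptides:
            rows.append((name, n, sequence_coverage(seq, peptides)))
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


if __name__ == "__main__":
    # Invented candidate sequences and peptide identifications
    candidates = {"protein_1": "PEPTIDEKAAARPGGKLLR", "protein_2": "GGGKSSSRWWWKLLR"}
    identified = ["PEPTIDEK", "LLR", "GGGK"]
    for name, n, cov in rank_proteins(candidates, identified):
        print(f"{name}: {n} peptides, {cov:.1f} % coverage")
```

Note that the shared peptide LLR supports both candidates; real tools apply protein-inference rules to handle such shared peptides.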

Fig. 8 Example of an annotated MS spectrum. The information about the peptide sequence can be inferred from the mass differences between the peaks (Jonscher 2005)

Advances in Plant Proteomics

Subcellular Proteomics

Organelle proteomics involves isolating the organelle of interest and producing a catalogue of the proteins present in that organelle by some form of separation of proteins or their proteolytic fragments followed by identification utilising mass spectrometry. The organelle preparation must be free from contamination from other organelle types to determine the specific localisation of a protein with high confidence. Recently, several high-throughput methods have emerged involving quantitative strategies, which have overcome the need to produce a pure organelle for analysis. Each of these methods relies on quantitative proteomics to characterise the distribution pattern of organelles amongst partially enriched fractions generated by various separation technologies and has the potential to discriminate between genuine organelle residents and contaminants without preparation of pure organelles.
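The quantitative strategies described above typically compare the distribution of each protein across fractions with that of known organelle markers. A minimal sketch of that idea is shown below, assuming normalised abundance profiles over five hypothetical fractions and using Pearson correlation as the similarity measure; the marker profiles and the 0.9 cut-off are invented for illustration and do not reproduce any specific published method.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def assign_compartment(profile: List[float], markers: Dict[str, List[float]],
                       min_r: float = 0.9) -> Tuple[str, float]:
    """Assign a protein to the marker profile it correlates with best, if above min_r."""
    best_name, best_r = "unassigned", -1.0
    for name, marker in markers.items():
        r = pearson(profile, marker)
        if r > best_r:
            best_name, best_r = name, r
    return (best_name, best_r) if best_r >= min_r else ("unassigned", best_r)


if __name__ == "__main__":
    # Invented marker distributions over five fractions of a density gradient
    markers = {
        "plastid":       [0.50, 0.30, 0.10, 0.05, 0.05],
        "mitochondrion": [0.05, 0.10, 0.60, 0.20, 0.05],
        "ER":            [0.05, 0.05, 0.10, 0.30, 0.50],
    }
    print(assign_compartment([0.06, 0.12, 0.55, 0.22, 0.05], markers))
    print(assign_compartment([0.20, 0.20, 0.20, 0.20, 0.20], markers))
```

A contaminant smeared evenly across fractions correlates with no marker and stays unassigned, which is how such methods separate genuine residents from contaminants without pure organelles.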

Secretome

The plant secretome refers to the set of proteins secreted from the plant cell into the surrounding extracellular space, commonly referred to as the apoplast. Secreted proteins maintain cell structure, act in signalling and are crucial for stress responses, in which they can interact with pathogen effectors and control the extracellular environment. Secretome studies have firmly established the presence of a substantial proportion of secreted proteins lacking signal peptides and have indicated a large degree of plant species specificity in the composition of secreted proteins. Plant secretomes have been studied under natural conditions (Soares et al. 2007), in different cultivars (Konozy et al. 2013), during nutritional deficiency (Tran and Plaxton 2008), after hormone treatment (Cheng et al. 2009a, b), during temperature change (Gupta and Deswal 2012), during salt stress (Song et al. 2011) and in the presence of pathogens and elicitors (Kim et al. 2009a, b). Martinez-Esteso et al. (2009) studied the grape secretome of SSC in response to methylated cyclodextrins and methyl jasmonate (MeJA) and showed that the expression levels of peroxidases, pathogenesis-related (PR) proteins, SGNH plant lipase-like proteins, xyloglucan endotransglycosylase and subtilisin-like protease were affected. In a similar study, application of the elicitors MeJA and cyclodextrins also led to the identification of chitinases and other PR proteins in tomato SSC (Briceno et al. 2012). Gupta et al. (2011) characterised the secretome from SSC of the legume chickpea and identified over 700 proteins by combining 1-D SDS-PAGE and HPLC-MS/MS. A sequence homology-based comparison of this secretome with previously published Arabidopsis, Medicago and rice data showed a large degree of species specificity in secreted proteins, hinting at differences in apoplast composition between species and between monocots and dicots. Several studies have targeted the rhizosphere. Over 100 secreted proteins were identified from rice roots grown in aseptic hydroculture (Shinano et al. 2011). These proteins are believed to play an important role in the rhizosphere, and a relatively high proportion (54 %) had predicted signal peptides.

Nuclear Proteome

The nuclear proteome has recently gained importance because the nucleus carries the information necessary for the controlled expression of proteins and thus plays an essential role in determining the plant’s response to any developmental process and to biotic or abiotic stress (Ref). Nuclear proteins are predicted to make up about 10–20 % of the total cellular proteins, suggesting the involvement of the nucleus in a number of diverse functions. Researchers have identified several hundred plant nuclear proteins, predominantly from the model plants Arabidopsis (Bae et al. 2003) and Medicago (Repetto et al. 2008, 2012) and from other plants such as rice (Choudhary et al. 2009), hot pepper (Lee et al. 2006), soybean (Cooper et al. 2011), Xerophyta (Abdalla et al. 2010; Abdalla and Rafudeen 2012) and chickpea (Pandey et al. 2006, 2008). These proteins were presumably associated with a variety of functions, viz. nucleoskeleton structure, development, DNA replication/repair, chromatin assembly/remodelling, signal transduction, mRNA processing, protein folding, transcription regulation, transport, metabolism, and cell defence and rescue. The identified proteins revealed the presence of complex regulatory networks that function in this organelle. Many proteins of unknown function have also been added to the database, besides novel proteins that were expected to be present in the nucleus. This clearly demonstrates the power of proteomics: finding a known protein in an unusual location may point to an entirely new function.

Mitochondrial Proteome

Plant mitochondrial genomes have features that distinguish them radically from their animal counterparts: high rates of rearrangement and of uptake and loss of DNA sequences, and an extremely low point mutation rate. Mitochondria act as the cellular powerhouse and also perform numerous other activities, such as nucleotide and vitamin synthesis, lipid and amino acid metabolism and participation in the photorespiratory pathway (Millar et al. 2005). Under stress, the mitochondrial electron transport chain becomes over-reduced, favouring the generation of superoxide (O2•−) and thus affecting plant growth and development (Purvis 1997). Kruft et al. (2001) first used two-dimensional polyacrylamide gel electrophoresis (2-DE) to study mitochondrial proteins. Since then, much attention has been paid to protein separation for analysing the plant mitochondrial proteome under stress conditions. Gel-free study of the mitochondrial proteome using nanoscale 1-D and 2-D liquid chromatography (LC) offers advantages over gel-based techniques (Kristensen et al. 2004; Brugiere et al. 2004; Heazlewood et al. 2003), as it allows separation of highly acidic or highly basic proteins, very high- and very low-molecular-weight proteins and low-abundance proteins. Mitochondria have been a target of subcellular proteomics because most stresses primarily impair the mitochondrial electron transport chain, resulting in excess ROS generation. Mitochondrial proteomes have been studied in soybean roots and hypocotyls under flooding stress (Komatsu et al. 2011), in two wheat cultivars contrasting in salinity tolerance at the whole-plant level (Jacoby et al. 2010), in Arabidopsis in relation to metal homeostasis (Tan et al. 2010), during salt stress-induced programmed cell death (PCD) in rice (Chen et al. 2009a, b) and in pea under drought, cold and herbicide stresses (Taylor et al. 2005).
The mitochondrial proteome is also of great importance to evolutionary biologists: because mitochondria are maternally inherited, they are an important tool for establishing evolutionary relationships. In addition, male sterility has been attributed to changes in mitochondrial gene expression.

Extracellular Matrix (Cell Wall) Proteome

Plants exposed to environmental stress try to change the composition of their cell walls to preserve cellular integrity and prevent mechanical damage. The cell wall, or extracellular matrix (ECM), is the first compartment that perceives extracellular signals, transmits them to the cell interior and eventually influences cell fate decisions. Although proteins account for only 10 % of the ECM mass, they comprise several hundred different molecules with diverse cellular functions. Increasing evidence suggests that there is continuous crosstalk between the ECM and the cytoskeletal network (Pandey et al. 2010). Combining ECM extraction with mass spectrometry appears to be a powerful strategy for identifying less abundant and previously unknown protein components involved in different stress responses. Various cell wall proteins have been characterised in Arabidopsis (Bayer et al. 2006; Minic et al. 2007; Jamet et al. 2006, 2008; Zhang et al. 2011), Medicago (Watson et al. 2004; Soares et al. 2007), chickpea (Bhushan et al. 2006), maize (Zhu et al. 2006), rice (Jung et al. 2008; Chen et al. 2009a, b; Cho et al. 2009) and potato (Lim et al. 2012). In addition, many types of stress-associated cell wall proteins have been identified in crops, including flooding stress-induced proteins in soybean (Komatsu et al. 2010) and wheat (Kong et al. 2009); drought stress-induced proteins in rice (Pandey et al. 2010), maize (Zhu et al. 2007) and chickpea (Bhushan et al. 2007); hydrogen peroxide-induced proteins in rice (Zhou et al. 2011); and pathogen-induced proteins in maize and tomato (Chivasa et al. 2005; Dahal et al. 2010). Cell wall proteins have also been studied in wounded Medicago (Soares et al. 2009). Although many proteomics studies of the primary cell wall have been conducted in Arabidopsis (Chivasa et al. 2002; Boudart et al. 2005; Jamet et al. 2006, 2008), correspondingly fewer have been devoted to systematically mapping the proteins of the secondary cell wall (Millar et al. 2009). The utility of plant secondary cell wall biomass for industrial and biofuel purposes depends on improving cellulose amount, availability and extractability. Engineering such biomass requires much more knowledge of the genes and proteins involved in the synthesis, modification and assembly of cellulose, lignin and xylans (Millar et al. 2009).

Chloroplast Proteome

The chloroplasts of plant and algal cells are believed to have descended from a cyanobacterial endosymbiont. Key steps in the evolution of this highly specialised organelle were the transfer of genes to the nucleus and the evolution of a protein-sorting machinery that still targets the encoded proteins back to the chloroplast. This history can only be unravelled by studying the distribution of proteins within the chloroplast, which explains the special interest in this organelle. The chloroplast is also the site of carbon fixation, and its physiology can best be studied through the corresponding temporal changes in its proteome. Salamini and Leister (2000) compared the Arabidopsis chloroplast proteome, by homology, with the total protein complement of the cyanobacterium Synechocystis, combined with a proteome-wide search for putative chloroplast transit peptides, and found the present-day chloroplast proteome to be smaller than that of the cyanobacterium. The chloroplast proteome has been studied extensively under abiotic stress. A proteome analysis of soybean chloroplasts responding to ozone identified 32 differentially expressed proteins and revealed downregulation of proteins involved in photosystems I and II and in carbon assimilation, which may partly explain the reduced photosynthetic activity under ozone (Ahsan et al. 2010). In contrast, proteins involved in antioxidant defence and carbon metabolism increased under stress. A 2D-DIGE analysis of the Arabidopsis chloroplast proteome revealed minimal changes in the plastid proteome under cold shock, whereas short-term cold acclimation caused major changes in the stroma but few in the lumen. Long-term acclimation, in contrast, modulated the proteomes of both compartments, with new proteins appearing in the lumen and further changes in protein abundance in the stroma (Goulas et al. 2006).
In total, 43 differentially displayed proteins were identified; they participate in photosynthesis, other plastid metabolic functions, phytohormone biosynthesis, and stress sensing and signal transduction, and presumably help the plant sense and acclimate to cold.

Membrane Proteome

The membranes of different organelles play important roles in maintaining homeostasis within organelles as well as at the whole-cell level. Approximately 30 % of the cellular proteome consists of membrane proteins (Schwacke et al. 2004). Membrane-associated proteins perform unique biological roles in development as well as in stress adaptation. The composition and dynamics of membrane proteins reflect their diverse functions, and their nature and relative amounts vary from one organellar membrane to another. They perform some of the most important cellular functions, such as regulating cell signalling, cell–cell interactions and intracellular compartmentalisation (Wu and Yates 2003). The plant membrane proteome is more complex than that of animal cells owing to highly specialised organelles such as plastids and vacuoles. Whilst much progress has been made in animal membrane proteomics, far fewer attempts have been made to characterise the plant membrane proteome (Jaiswal et al. 2012; Nouri and Komatsu 2010; Kawamura and Uemura 2003).

Jaiswal et al. (2012) developed a proteome reference map of chickpea membrane proteins to gain insight into their dynamic repertoire. Using two-dimensional gel electrophoresis, they identified 91 proteins by MALDI-TOF/TOF and LC-ESI-MS/MS. These proteins were involved in a variety of cellular functions, including bioenergy, stress response and signal transduction, metabolism, and protein synthesis and degradation. Notably, 70 % of the identified proteins were putative integral membrane proteins possessing transmembrane domains.
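Putative integral membrane proteins of this kind are usually flagged by sequence-based transmembrane prediction; the prediction tool used by Jaiswal et al. (2012) is not stated here. As a minimal illustration of the underlying idea only, the Python sketch below applies the classic Kyte-Doolittle hydropathy scan, assuming the commonly cited 19-residue window and a mean-hydropathy cutoff of 1.6. The two sequences and the function names are hypothetical, and modern predictors are considerably more accurate than this heuristic.

```python
# Kyte-Doolittle hydropathy index per residue (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean hydropathy exceeds threshold.

    The window size and threshold are assumed defaults, not values from the cited study.
    Unknown residue letters contribute 0.0 to the window score.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(KD.get(aa, 0.0) for aa in segment) / window
        if score > threshold:
            hits.append(start)
    return hits


def has_putative_tm_segment(seq, **kwargs):
    """True if at least one window looks like a membrane-spanning hydrophobic segment."""
    return bool(hydrophobic_windows(seq, **kwargs))


# Hypothetical sequences: a polar stretch, and the same kind of stretch with a hydrophobic core.
soluble = "MSDKEKNRTQGSEDKPRNESGKTDEQRSNKDEGST"
membrane = "MSDKEKNRT" + "LLAVILGLFAVVLIAGLVF" + "RSNKDEGST"

print("soluble:", has_putative_tm_segment(soluble))
print("membrane:", has_putative_tm_segment(membrane), hydrophobic_windows(membrane))
```

A sequence is reported as carrying a putative transmembrane segment if any window exceeds the cutoff; the 19-residue hydrophobic core inserted into `membrane` triggers this, whereas the charged, polar `soluble` sequence is not flagged.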

Nouri and Komatsu (2010) investigated the impact of polyethylene glycol-induced osmotic stress on the soybean plasma membrane proteome. Gel-based proteomics identified four upregulated and eight downregulated protein spots under polyethylene glycol treatment, whereas the nanoLC-MS/MS approach identified 11 upregulated and 75 downregulated proteins. Osmotic stress-responsive proteins such as transporters, proteins with many transmembrane helices and low-abundance proteins were identified by the gel-free approach. A mass spectrometric approach was also used to identify putative plasma membrane proteins of Arabidopsis leaves associated with cold acclimation (Kawamura and Uemura 2003), revealing a significant change in the protein profile after cold acclimation.

Comparative Proteomics for GM and Non-GM

Transfer of individual genes encoding specific desirable traits into a host, i.e. genetic modification, has become the most rapidly adopted technology in the history of modern agriculture. It has improved agronomic traits such as insect resistance, herbicide tolerance, productivity and quality, and has introduced traits not present before modification (Garcia-Canas et al. 2011). However, modifications of a plant genome might have unintended effects on human health or the environment (Ioset et al. 2007). With the commercialisation of GM crops, these unintended effects have become one of the most controversial issues in the debate over their biosafety. A systematic comparative analysis of the molecular features of GM crops and their comparators is needed to clarify such effects (Cellini et al. 2004; Garcia-Canas et al. 2011). Profiling techniques, which allow simultaneous characterisation and comparison of the genome, proteome and metabolome of an organism and thus increase the chances of detecting inadvertent effects, have emerged as useful approaches (Kuiper et al. 2003; Ruebelt et al. 2006). Comparative proteomic strategies combining 2-DE with MS, or using liquid chromatography tandem mass spectrometry (LC-MS/MS), have been used extensively to evaluate the effects of genetic modification on the proteomes of the leading GM crops: maize, pea, potato, rice, soybean, tobacco, tomato and wheat. These studies covered both the safety evaluation and the functional characterisation of GM crops (for review, see Gong and Wang 2013). Corpillo et al. (2004) were the first to use proteomics to assess the substantial equivalence of a GM tomato resistant to Tomato spotted wilt virus (TSWV), and found no qualitative or quantitative differences between the GM tomato and the non-GM control. Similarly, Di Carli et al. (2009) demonstrated that expressing scFv(G4), an antibody fragment against the Cucumber mosaic virus (CMV) coat protein, in tomato did not cause pleiotropic effects.
A proteomics study of GM bread wheat overexpressing a low-molecular-weight glutenin subunit (LMW-GS) revealed a series of changes, including overaccumulation of the LMW glutenin and downregulation of all other classes of storage proteins, which constituted a compensatory mechanism (Scossa et al. 2008). Horváth-Szanics et al. (2006) used proteomic methods to identify stress-induced proteins in herbicide-resistant GM wheat lines and found altered levels of LMW seed proteins and altered sensitivity to drought stress. Gong et al. (2012) evaluated proteome differences in seeds of two GM rice lines (Bar68-1 carrying bar, and 2036-1a carrying cry1Ac/sck) and their controls by two-dimensional difference gel electrophoresis (2D-DIGE). To obtain relatively objective data, the study also included other rice varieties to evaluate proteome variation arising from spontaneous genetic variation, genetic breeding and genetic modification. Compared with conventional breeding and natural genetic variation, the GM events did not substantially alter protein profiles (Gong et al. 2012). Agrawal et al. (2013) used 2-DE to compare the proteomes of wild-type and AmA1 transgenic potato lines across the entire tuber life cycle, revealing a role for the seed storage protein AmA1 in cellular growth, development and nutrient accumulation.

Comparative Proteomics Under Abiotic Stress

Changes in protein accumulation under stress are closely linked to the plant's phenotypic response, which determines its stress tolerance. Studies of plant responses to stress at the protein level can therefore contribute significantly to our understanding of the physiological mechanisms underlying stress tolerance. Proteomics studies could thus identify potential protein markers whose changes in abundance are associated with quantitative changes in the physiological parameters used to describe a genotype's level of stress tolerance. In plant abiotic stress research, the most common design compares the proteome of non-stressed (control) plants with that of plants under stress. Other designs compare the proteomes of two genotypes or species with contrasting tolerance to a given stress factor.

Studies comparing several proteomes are dominated by 2-DE followed by protein identification via MS, although MS techniques are sometimes used not only for protein identification but also for quantitation (e.g. Patterson et al. 2007 used iTRAQ to quantify proteins in two barley cultivars with different sensitivities to elevated boron concentrations). The differential expression proteomics approach describes sets of proteomes that differ in both protein quality and quantity, and aims at protein identification and relative quantitation. However, this approach cannot by itself give any information on protein function, since a given protein can have very diverse functions depending on its subcellular localisation, post-translational modifications or interacting partners.
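Relative quantitation in differential expression proteomics is often summarised as a log2 fold change between conditions. The Python sketch below illustrates this for a control-versus-stress comparison, assuming a symmetric 1.5-fold cutoff. The spot names and normalised intensities are hypothetical, and the replicate-based statistical testing that real studies apply is deliberately omitted.

```python
import math

# Hypothetical normalised spot intensities (e.g. 2-DE spot volumes), not real data.
control = {"spot_1": 100.0, "spot_2": 50.0, "spot_3": 400.0}
stressed = {"spot_1": 210.0, "spot_2": 48.0, "spot_3": 180.0}


def log2_fold_changes(ctrl, treat):
    """Return log2(treated / control) for every spot quantified in both samples."""
    return {spot: math.log2(treat[spot] / ctrl[spot]) for spot in ctrl if spot in treat}


def classify(lfc, min_fold=1.5):
    """Call each spot up, down or unchanged using a symmetric fold-change cutoff."""
    cutoff = math.log2(min_fold)
    calls = {}
    for spot, value in lfc.items():
        if value >= cutoff:
            calls[spot] = "up"
        elif value <= -cutoff:
            calls[spot] = "down"
        else:
            calls[spot] = "unchanged"
    return calls


lfc = log2_fold_changes(control, stressed)
for spot, call in classify(lfc).items():
    print(f"{spot}: log2FC = {lfc[spot]:+.2f} ({call})")
```

Working on the log2 scale makes a doubling and a halving symmetric (+1 and -1), which is why a single cutoff can be applied in both directions.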

Low-Temperature Stress

Proteome analyses under cold stress have employed leaf tissues from A. thaliana, its cold- and salt-tolerant relative Thellungiella halophila and poplar (Amme et al. 2006; Gao et al. 2009a, b), root tissue from rice (Hashimoto and Komatsu 2007; Lee et al. 2009), trinucleate pollen in rice anthers (Imin et al. 2004), embryos from germinated soybean seeds (Cheng et al. 2010) and rice seedlings (Cui et al. 2005). Analyses have been carried out at the whole-cell level as well as at the organellar level, e.g. the A. thaliana nuclear proteome (Bae et al. 2003) and the pea mitochondrial proteome (Taylor et al. 2005) under cold. Most studies have also indicated changes in the abundance of enzymes involved in carbohydrate metabolism. Enhanced accumulation of specific dehydration-inducible LEA-II proteins, the dehydrins, has been reported repeatedly (Kawamura and Uemura 2003; Amme et al. 2006; Degand et al. 2009; Cheng et al. 2010; Vítámvás and Prášil 2008; Vítámvás et al. 2007). Increased levels of the RNA-binding protein cp29 have also been reported repeatedly (Amme et al. 2006; Gao et al. 2009a, b), consistent with the strong effect of cold on protein synthesis. This protein is localised in the chloroplast stroma and is involved in plastid mRNA processing, and its activity may be regulated by phosphorylation (Reiland et al. 2009).

Heat

Heat stress is associated with an increased risk of improper protein folding and denaturation of several intracellular protein and membrane complexes. The heat-stress response at the proteome level has been studied predominantly in rice (Lee et al. 2007), in wheat grain during the grain-filling period (Skylas et al. 2002; Majoul et al. 2004), in the heat- and drought-tolerant poplar Populus euphratica (Ferreira et al. 2006) and in the wild plant Carissa spinarum, which inhabits hot and dry valleys in central China (Zhang et al. 2010). In all cases, heat increased the levels of several HSPs, including members of the HSP100, HSP70 and small HSP (sHSP) families. Cytoplasmic sHSPs as well as mitochondrion- and chloroplast-targeted sHSPs were detected. In heat-treated grains of two common wheat genotypes with contrasting tolerance to high temperature, Skylas et al. (2002) detected seven sHSPs unique to the tolerant genotype, which have been proposed as biomarkers of heat tolerance and dough strength. Oxidative damage is another characteristic feature of heat stress: upregulation of several enzymes involved in redox homeostasis, such as glutathione S-transferase (GST), dehydroascorbate reductase (DHAR), h-type thioredoxin (Trx h) and chloroplast precursors of superoxide dismutase (SOD), was reported (Lee et al. 2007). Heat stress also induces profound changes in cytoskeleton composition, indicating cytoskeletal reorganisation (Ferreira et al. 2006). In addition, increased accumulation of some eukaryotic translation initiation factors (eIF4F, eIF5A-3) indicates profound cellular reorganisation leading to programmed cell death (PCD) under long-term heat treatment (Majoul et al. 2004; Zhang et al. 2010).

Drought

Drought stress is associated with reduced water availability and cellular dehydration, so changes in cellular metabolism associated with osmotic adjustment can be expected. Proteome changes upon drought have been studied intensively in poplar (Bogeat-Triboulot et al. 2007; Bonhomme et al. 2009), maize roots (Zhu et al. 2007), soybean roots (Alam et al. 2010a, b) and sugar beet (Hajheidari et al. 2005). In maize roots, increased levels were found of several apoplastic ROS-scavenging enzymes, namely peroxidases involved in enhanced cell wall loosening, and of proteins involved in pathogenesis and stress defence, such as polygalacturonase inhibitor proteins, chitinases, and osmotin and nodulin precursors (Zhu et al. 2007). Dehydration-induced changes in the nuclear proteomes of chickpea (Cicer arietinum) and rice (Oryza sativa) (Pandey et al. 2008; Choudhary et al. 2009) and in the ECM proteome of chickpea (Bhushan et al. 2007) have also been studied extensively. Changes were observed in proteins involved in carbohydrate and nitrogen metabolism, cell wall modification, signal transduction, cell defence and PCD, as well as in proteins involved in redox regulation, oxidative stress, chaperone function and photosynthesis (Rubisco).

Waterlogging

Ahsan et al. (2007) and Alam et al. (2010a, b) analysed proteome changes, together with changes in in vivo hydrogen peroxide (H2O2) content and lipid peroxidation, in waterlogged tomato leaves and soybean roots, respectively. Interestingly, waterlogging increased H2O2 levels and lipid peroxidation, indicating that this stress has an oxidative component. At the proteome level, waterlogging changes the abundance of proteins involved in several processes, namely photosynthesis, energy metabolism, redox homeostasis, signal transduction, PCD, RNA processing, protein biosynthesis, disease resistance, and stress and defence mechanisms.

Salinity

The main crops in which the effects of salt stress on proteome composition have been studied are rice (Abbasi and Komatsu 2004; Kim et al. 2005; Yan et al. 2005; Dooki et al. 2006; Cheng et al. 2009a, b), soybean (Sobhanian et al. 2010) and common and durum wheat (Caruso et al. 2008; Wang et al. 2008), all of which are glycophytes. Proteomics experiments on glycophytes also include the model plants A. thaliana (Ndimba et al. 2005) and tobacco (Dani et al. 2005). In glycophytic crops, increased accumulation of enzymes involved in glycolysis and carbohydrate metabolism (fructose-bisphosphate aldolase, enolase (ENO)) is regularly observed, indicating an enhanced need for energy (Abbasi and Komatsu 2004; Yan et al. 2005; Sobhanian et al. 2010; Ndimba et al. 2005). Another major group of increased proteins is ROS-scavenging enzymes (ascorbate peroxidase (APX), DHAR, Trx h, peroxiredoxin, SOD), suggesting oxidative stress (Abbasi and Komatsu 2004; Kim et al. 2005; Dooki et al. 2006; Ndimba et al. 2005). Proteins involved in nucleotide metabolism (nucleoside diphosphate kinase (NDPK), guanine nucleotide-binding protein) and fatty acid metabolism (enoyl-ACP reductase) were also upregulated (Dooki et al. 2006).

Comparative Proteomics for Biotic Stress

Plant pathogens include viruses, bacteria, fungi, oomycetes, protozoans and nematodes. Of these, fungi and oomycetes are the most numerous and the most destructive (Latijnhouwers et al. 2003). Although the lifestyles and strategies of pathogens are diverse, all must colonise the host and overcome its immune system to survive and propagate. Conversely, the host must overcome the virulence of the pathogen if it is to remain healthy. Consequently, the coevolution of host–pathogen systems has resulted in a complex interplay of pathogen- and host-derived molecules, producing systems with a remarkable degree of conservation (Ronald and Beutler 2010).

Crop–Pathogen Interactions

Proteomics is a logical investigative tool because the molecular dialogue of any plant–pathogen interaction is conducted largely through proteins. Two-dimensional gel electrophoresis was initially used to rapidly identify major proteome differences between healthy and inoculated plants. Interactions studied through proteomics include Triticum aestivum and Fusarium graminearum (Fusarium head blight) (Zhou et al. 2006; Wang et al. 2005), wheat and Puccinia triticina (leaf rust) (Rampitsch et al. 2006a, b), rice and Magnaporthe grisea (rice blast) (Kim et al. 2004a, b), Brassica napus (canola) and Leptosphaeria maculans (blackleg) (Sharma et al. 2008), Brassica oleracea and Xanthomonas campestris pv. campestris (black rot) (Villeth et al. 2009), Pisum sativum (pea) and Peronospora viciae (downy mildew) (Amey et al. 2008), rice and rice yellow mottle virus (RYMV) (Ventelon-Debout et al. 2004), and grapevine and flavescence dorée phytoplasma (Margaria and Palmano 2011). In most cases, 2-DE revealed only gross changes in the proteome, with results common to diverse pathosystems. Metabolic enzymes increased in abundance in all of the pathosystems; in particular, glyceraldehyde-3-phosphate dehydrogenase increased in most of them. Antioxidant enzymes (Zhou et al. 2006; Wang et al. 2005; Rampitsch et al. 2006a, b; Ventelon-Debout et al. 2004; Margaria and Palmano 2011), especially ascorbate peroxidases and thioredoxin (Zhou et al. 2006; Villeth et al. 2009; Amey et al. 2008), enzymes that degrade fungal cell walls (chitinases and β-glucanases) and other pathogenesis-related proteins (Zhou et al. 2006; Wang et al. 2005; Rampitsch et al. 2006a, b; Amey et al. 2008; Margaria and Palmano 2011) also increased in abundance, helping to combat the pathogens.
In a study of the grapevine–Erysiphe necator (powdery mildew) interaction, iTRAQ was used to compare protein expression levels in a susceptible grapevine, Vitis vinifera cv. Cabernet Sauvignon, with those in mock-inoculated controls. The results support the hypothesis that Cabernet Sauvignon can initiate a basal defence response but lacks the R-protein(s) needed to recognise pathogen Avr gene product(s) and therefore succumbs to disease (Marsh et al. 2010).

Interaction with Bacterial Pathogens and Elicitors

Jones et al. (2006a, b) reported early changes to the defence proteome in three subcellular fractions (total soluble protein, chloroplast-enriched and mitochondria-enriched) after inoculation with three different strains of Pseudomonas syringae pv. tomato (Pst) DC3000, and provided evidence for rapid communication between organelles and regulation of primary metabolism through redox-mediated signalling. Using phosphoprotein affinity enrichment coupled to relative quantification with iTRAQ, Jones et al. (2006a, b) also identified six differentially phosphorylated proteins that changed robustly between a mock-inoculated control, a hypersensitive response (HR) and a basal defence response in soluble A. thaliana leaf extracts following bacterial challenge. This study highlights the reproducibility, utility and problems of quantitatively analysing changes in the complex phosphoproteome of intact green leaf tissue. Using 2-D DIGE, Casasoli et al. (2008) analysed apoplastic proteins of A. thaliana seedlings elicited by oligogalacturonides (OGs), which accumulate upon challenge by pathogenic microorganisms, and identified many differentially expressed or post-translationally modified apoplastic proteins with either putative defensive roles or structural features typical of proteins involved in recognition. These findings confirm the role of the cell wall as the first line of defence against pathogens and as a source of molecules important for plant protection and pathogen perception. A study of rhizosphere communication between the roots of two plants, Medicago sativa and A. thaliana, and the microbes P. syringae pv. tomato DC3000 or Sinorhizobium meliloti strain Rm1021 revealed a specific, protein-level crosstalk between roots and microbes, and suggested that secreted proteins may be a critical component of the signalling and recognition processes that occur in both compatible and incompatible interactions (De-la-Pena et al. 2008).
Signalling processes and phosphoproteins at the plasma membrane have been addressed in large-scale global analyses of protein phosphorylation in model systems treated with elicitors. As a strategy for large-scale phosphoproteomics of the plasma membrane from A. thaliana suspension cells stimulated with flg22, Nühse and co-authors digested cytoplasmic face-out vesicles with trypsin, enriched phosphopeptides by strong anion exchange (SAX) plus immobilised metal ion affinity chromatography (IMAC) and analysed them by nanoLC-MS/MS. This identified over 300 phosphorylation sites on approximately 200 putative plasma membrane proteins (Nühse et al. 2003, 2004). In addition, more than 50 sites were mapped on receptor-like kinases, revealing an unexpected complexity in the characteristics and regulation of phosphorylation sites. Four-plex iTRAQ labelling of peptides was later used to quantify dynamic protein phosphorylation in the same model system of A. thaliana cells challenged with flg22 (Nühse et al. 2007).
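In 4-plex iTRAQ, each of up to four samples is labelled with an isobaric tag that releases a distinct reporter ion (nominal m/z 114, 115, 116 or 117) on fragmentation, so the relative reporter intensities within one MS/MS spectrum reflect the relative peptide amounts. The Python sketch below shows one simple way to turn such intensities into ratios against a reference channel, assuming equal total loading per channel for normalisation. The peptide names, intensities and channel assignment are hypothetical, and this is not the processing pipeline of Nühse et al. (2007).

```python
from typing import Dict, List

# Hypothetical reporter-ion intensities; each position corresponds to one iTRAQ channel/sample.
CHANNELS = ["114", "115", "116", "117"]
peptides: Dict[str, List[float]] = {
    "pep_1": [1000.0, 1200.0, 2500.0, 2600.0],
    "pep_2": [800.0, 780.0, 820.0, 790.0],
    "pep_3": [500.0, 450.0, 200.0, 210.0],
}


def channel_factors(data: Dict[str, List[float]]) -> List[float]:
    """Scale factors that equalise the summed intensity of every channel (assumes equal loading)."""
    totals = [sum(values[i] for values in data.values()) for i in range(len(CHANNELS))]
    mean_total = sum(totals) / len(totals)
    return [mean_total / t for t in totals]


def ratios_vs_reference(data: Dict[str, List[float]], reference: str = "114") -> Dict[str, Dict[str, float]]:
    """Normalised ratio of each channel to the reference channel, computed per peptide."""
    factors = channel_factors(data)
    ref = CHANNELS.index(reference)
    result = {}
    for name, values in data.items():
        norm = [v * f for v, f in zip(values, factors)]
        result[name] = {ch: norm[i] / norm[ref] for i, ch in enumerate(CHANNELS)}
    return result


for name, r in ratios_vs_reference(peptides).items():
    print(name, {ch: round(x, 2) for ch, x in r.items()})
```

Normalising to equal channel totals corrects for unequal sample loading, but it assumes that most peptides do not change; a strongly regulated subset, such as `pep_1` here, shifts the factors somewhat, which is why median-based normalisation is a common alternative.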

Plant–Fungus Interaction

Over the past decade, proteomics studies have contributed new knowledge of the M. grisea–rice interaction. Indeed, the first descriptive proteomics study of a pathogen-infected host plant focused on this interaction (Konishi et al. 2001). More recent evidence, based on differential display analysis of the elicitor-responsive proteomes of two rice near-isogenic lines treated with an M. grisea glycoprotein elicitor, suggested that the incompatible rice line may possess a more sensitive recognition system that identifies and reacts to specific chemical, biological or physical triggers more efficiently, thus eliciting an early and fast defence response (Liao et al. 2009). In A. thaliana cell suspension cultures treated with chitosan, extracellular phosphorylation was examined and phosphoproteins were identified in both the cell wall (a putative lectin receptor-like kinase and an endochitinase) and the culture filtrate (xyloglucan endo-1,4-β-D-glucanases), supporting the view that plants may possess extracellular kinase activity and that an extracellular phosphorylation network could be involved in intercellular communication (Ndimba et al. 2003). Proteomic analysis of chitosan-treated V. vinifera cv. Barbera cell suspensions revealed upregulation of both the stilbene and flavonoid pathways, resulting in the production of a wide spectrum of polyphenol antioxidant compounds (Ferri et al. 2009). Proteome changes in suspension-cultured cells of the model legume M. truncatula treated with a pathogen-derived yeast invertase elicitor (YE) or with a suppressor, Sinorhizobium meliloti lipopolysaccharide (LPS), were studied using 2-DE and LC-MS/MS; defence-related proteins were upregulated only after YE treatment, not after LPS treatment (Gokulakannan and Niehaus 2010). A study of an incompatible plant–fungal interaction, the A. thaliana–Alternaria brassicicola host–pathogen pair, found at least 11 proteins with reproducible differences in abundance on 2-DE gels, increasing or decreasing as infection progressed, and demonstrated that the leaf can limit pathogen infection whilst keeping its overall activity largely intact (Kaschani et al. 2009). A differential proteomics study of elicitor-induced sanguinarine biosynthesis in opium poppy cell cultures treated with a Botrytis cinerea fungal homogenate under controlled conditions provided a platform to characterise the induction of antimicrobial alkaloid biosynthesis and other plant defence pathways (Zulak et al. 2009). The abundance of chaperones, heat shock proteins, protein degradation factors and pathogenesis-related proteins provided a comprehensive proteomic view of the coordination of plant defence responses. Elicitor-induced metabolic enzymes represented the largest category of proteins and included S-adenosylmethionine synthetase, several glycolytic enzymes, a nearly complete set of TCA cycle enzymes, one alkaloid biosynthetic enzyme and several other enzymes of secondary metabolism.

Comparative Proteomics for Plant Development

Proteomics is an important tool for analysing proteins at the level of organs, cell populations and subcellular compartments under diverse developmental conditions, and the number of plant developmental studies using proteomics approaches is growing steadily. Considerable experimental effort has been devoted to the proteomic investigation of hormonal pathways regulating plant development, such as brassinosteroid signalling (Tang et al. 2010), auxin signalling (Shi et al. 2008) and cytokinin regulation (Xu et al. 2010; Lochmanová et al. 2008), as well as of developmental processes such as cell proliferation and elongation, cell differentiation and the development of leaves, roots, shoots and other organs.

Cell Proliferation and Elongation

Proteomics studies in Medicago truncatula showed protein expression changes primarily in cell division-related processes such as metabolism, energy housekeeping and the control of protein synthesis. Furthermore, stress-related proteins preferentially accumulate in dividing tissues such as the root meristem (Holmes et al. 2006) and proliferating protoplasts (De Jong et al. 2007); in both cases, mainly pathogenesis-related proteins such as PR-10 and heat shock proteins were more abundant in dividing tissues. In another proteomics study, of an Arabidopsis mutant line of the transcription factor NTM1 (NAC with transmembrane motif 1) that exhibits a reduced cell division rate, beta-glucosidase homolog 1 and annexin were found at elevated levels (Lee et al. 2008). Other proteomics studies showed that differential regulation of annexins is also linked to other plant developmental processes, including pollen germination (Dai et al. 2007), cotton fibre elongation (Zhao et al. 2010) and somatic embryogenesis (Gómez et al. 2009). These findings also confirm a functional role in the regulation of cell division for some ROS-related proteins, such as ascorbate peroxidase (Holmes et al. 2006), dehydroascorbate reductase and glutathione transferase (Lee et al. 2008), and mitochondrial manganese superoxide dismutase (Shi et al. 2008). The role of dynamic remodelling of the actin and microtubule cytoskeleton in cell expansion is also reflected in proteomics studies (Chan et al. 2007). Two independent comparative studies showed downregulation of alpha-tubulin, beta-tubulin, tubulin-folding cofactor A and profilin in mutant cotton fibres with inhibited elongation (Zhao et al. 2010; Pang et al. 2010). Consistent with this, five actin and two beta-tubulin isoforms were upregulated during fibre elongation (Yang et al. 2008).
A high-throughput proteomics study of Lilium longiflorum pollen grain membrane proteins made a valuable contribution to elucidating the polar growth of pollen tubes (Pertl et al. 2009). Remarkably, during the germination of pollen grains, the levels of proteins involved in membrane/protein trafficking (Rab11b GTPase, V-type ATPase and the H+-pyrophosphatase) rose simultaneously with those of proteins involved in signal transduction, stress response, and protein biosynthesis and folding. In contrast, proteins involved in the cytoskeleton, carbohydrate and energy metabolism, and ion transport were upregulated earlier, when the pollen had just started to germinate (Pertl et al. 2009).

Cell Differentiation

In a study of cell differentiation, protein profiling of seed-derived calli on regeneration media with different relative concentrations of cytokinin and auxin showed differences mainly in carbohydrate and energy metabolism and in stress- and defence-related proteins (Yin et al. 2008). Interestingly, these protein groups were also activated in Vanilla planifolia calli directed towards shoot organogenesis (Palama et al. 2010). Beyond driving differentiation, particular external hormone compositions can reprogram differentiated cells to regain the competence for cell division and organ regeneration. Kinetin and 2,4-D induced dedifferentiation of Arabidopsis cotyledon cells, which was accompanied by protein phosphorylation (Chitteti and Peng 2007a, b). This hormonal treatment also induced protein synthesis, changes in chromatin structure, cytoskeleton reorganisation and widespread downregulation of chloroplast proteins (Chitteti et al. 2008). Various proteomics approaches have been applied to study somatic embryogenesis in diverse plant species such as cassava (Manihot esculenta; Baba et al. 2008; Li et al. 2010), oak (Quercus suber; Gómez et al. 2009), Valencia sweet orange (Citrus sinensis; Pan et al. 2009), grapevine (Vitis vinifera; Marsoni et al. 2008) and cowpea (Vigna unguiculata; Nogueira et al. 2007). These reports included studies of protein expression changes during somatic embryogenesis and comparisons between embryogenic and nonembryogenic calli, as well as between gametic and somatic embryogenesis.

Seed Germination

Extensive effort has also been dedicated to the proteomic investigation of seed germination. A comparison of the endosperm cap proteomes of ABA-inhibited and non-inhibited germinating cress (Lepidium sativum) seeds revealed specific, ABA-responsive early germination processes, such as lipid mobilisation, energy production, proteolysis and an increase in the abundance of antioxidant enzymes (Müller et al. 2010). These data suggested that the cress endosperm cap is not a storage tissue like the cereal endosperm but a metabolically very active tissue that regulates the rate of radicle protrusion. Changes in the rice embryo proteome during germination (Kim et al. 2009a, b) revealed that enzymes detoxifying reactive oxygen species, protein degradation proteins and cytoskeleton-associated proteins play important roles during seed germination. Several studies suggest that protein phosphorylation also plays an important role in seed germination. One study of protein phosphorylation during maize seed germination revealed 39 protein kinases, 16 phosphatases and 33 phosphoproteins containing 36 phosphorylation sites (Lu et al. 2008). At least one-third of these phosphoproteins were key components of biological processes related to seed germination, such as DNA repair, gene transcription, RNA splicing and protein translation.

Seed Development

Seed development studies in Brazilian pine highlighted active oxidative stress metabolism (ascorbate peroxidase and peroxiredoxin) in early seed development, along with a higher abundance of enzymes involved in cell wall expansion (alpha-xylosidase and the type IIIa membrane protein cp-wap13) (Balbuena et al. 2009). Storage proteins (e.g. a vicilin-like storage protein) and proteins involved in respiration (triosephosphate isomerase, fructose-bisphosphate aldolase and isocitrate dehydrogenase) accumulated in the later stages of seed development. Upregulation of glutamine synthetase during the early cotyledonary stage indicated active glutamine biosynthesis from glutamic acid (Balbuena et al. 2009). Similarly, comparison of the endosperm and embryo proteomes of dry Jatropha curcas seeds indicated some similarities in metabolic pathways between the two tissues. However, embryos generally contain proteins mainly involved in anabolic processes and accumulate stress-related proteins, implying an increased need for protection against stress (Liu et al. 2009).

Plant Organ Development

Nozu et al. (2006) studied developmental changes in the root, stem and leaf proteomes of rice during the first 10 weeks after budding. They showed that 19 proteins were present at all developmental stages in all tissues. These included metabolic proteins as well as oxidative stress-related proteins such as catalase isozyme A, superoxide dismutase, ascorbate peroxidase and peroxiredoxin. Another study in soybean showed that regulators of protein transport, especially those involved in the import of nuclear-encoded proteins into chloroplasts, were presumably involved in leaf development and maturation (Ahsan and Komatsu 2009).

The mechanisms of lateral and seminal root formation in maize have been studied extensively by comparative proteomics of maize mutant lines. The rum1 (rootless with undetectable meristems 1) mutant is affected in both seminal and lateral root formation (Saleem et al. 2009). The rtcs (rootless concerning crown and seminal roots) mutant does not form crown or seminal roots (Muthreich et al. 2010). A comparison of rtcs and wild-type maize embryos revealed changes in the expression of protein disulphide isomerase, which is involved in protein folding, and of embryonic protein DC-8. This suggests that these proteins play a role in pathways essential for the formation of different root types (Muthreich et al. 2010). In addition, proteomic analysis of the rum1 mutant revealed that proteins related to pyridoxine biosynthesis are involved in the rum1-dependent pathway of root formation (Saleem et al. 2009). Proteome changes during bud development were elucidated in Pinus sylvestris L. var. mongolica to study the mechanisms of bud dormancy induction and release (Bi et al. 2010). Stress-induced ascorbate peroxidase, pathogenesis-related proteins and heat shock proteins were associated with bud dormancy induction. Proteins involved in protein synthesis, cell wall biogenesis and the cytoskeleton were upregulated during dormancy release. In a separate study, a comparison of flower and bud proteomes suggested a role for sucrose generation, derived from upregulated phosphoglucomutase and downregulated glycoprotein. This sucrose could induce flavonoid- and anthocyanin-related genes that are important for petal growth and colour development in the mature flower.

Proteomic approaches have proved powerful for investigating tuber formation in potato (Solanum tuberosum) (Agrawal et al. 2008; Lehesranta et al. 2006; Fischer et al. 2008), as well as root, leaf and flower development. Proteome changes during tuber initiation and growth mainly reflect processes connected to the accumulation of storage reserves and starch synthesis. Accordingly, storage proteins, protease inhibitors and proteins involved in secondary metabolism were upregulated during tuber growth. In addition, some isoforms of patatin, a large family of primary storage proteins, accumulated in non-swelling stolons, which possibly indicates a role in tuber initiation (Agrawal et al. 2008; Lehesranta et al. 2006).

A proteomic comparison of the maize rachis, which delivers essential nutrients to the developing kernels, at two stages of maturation (25 vs. 50 days after silking) revealed significantly increased expression (2.4- to 14.5-fold) of many stress- and defence-related proteins in the mature rachis. These included PRm3 (class III chitinase), PR-1, PR-10, beta-1,3-glucanase, endo-1,3-beta-glucanase, germin-like protein subfamily 1 member 17, permatin and Asr protein (Pechanova et al. 2011). Profilin, an actin-binding protein that regulates actin polymerisation (Staiger et al. 2010), was also upregulated during rachis development and maturation. A previous proteomics study showed that inhibition of pollen tube tip growth by latrunculin B (an inhibitor of actin polymerisation) correlated well with downregulation of profilin (Chen et al. 2006). More recently, proteomics and cell biology approaches identified profilin2 as a new cytoskeletal protein that modulates vesicular trafficking in Arabidopsis roots (Takáč et al. 2011).

Fruit Ripening

Fruit ripening is a complex developmental process in higher plants that proceeds through a number of stages from immature to mature fruit, depending on the plant species and environmental conditions. Because of the large number of metabolic changes that take place during ripening, new high-throughput methods that provide a global evaluation of this process are desirable. Differential proteomics of immature and mature fruits is a useful tool for gaining information on the molecular changes that occur during ripening. Investigating fruits at several ripening stages can also provide a dynamic picture of the whole transformation.

2-DE profiles of tomato fruits at different ripening stages were analysed for changes in proteome composition. During ripening, an overall increase in intensity was detected in 26 spots and a decrease in 27 spots, whereas two spots reached their maximum at the breaker or light-red stage (Kok et al. 2008). One important ripening-related protein, acid beta-fructofuranosidase, was upregulated at the breaker stage and downregulated in the subsequent turning and light-red stages. It was then upregulated again at the red stage (Kok et al. 2008). Parallel studies of three ripening stages of tomato (unripe, medium ripened and fully ripened) identified pectin esterase and a heterotrimeric GTP-binding protein fragment homologous to that of tobacco. These proteins might be implicated in cell wall softening and changes in firmness. Because their levels increased during ripening, they have been proposed as ripening-specific markers in tomato (Schuch et al. 1989). However, most of the characterised proteins corresponded to genes already known to be regulated during tomato fruit development.

Proteome maps obtained at three stages of ripening were also compared to assess the extent to which protein distribution in grape skin changes during ripening. The comparative analysis showed that numerous soluble skin proteins changed in abundance during ripening, with specific distributions at different stages. Proteins involved in photosynthesis (Rubisco), carbohydrate metabolism (aconitate hydratase, transketolase, phosphoenolpyruvate carboxylase, oxalyl-CoA decarboxylase and aldehyde dehydrogenase) and stress response (HSP17.7) were more abundant at the beginning of colour change (Deytieux et al. 2007). At harvest, the dominant proteins were involved in defence mechanisms. In particular, the abundance of several chitinase and β-1,3-glucanase isoforms increased as the berry ripened, which could be correlated with the increase in the activities of both enzymes during skin ripening. These differences in the proteome maps show that significant metabolic changes occur in grape skin during this crucial phase of ripening (Deytieux et al. 2007).
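Sorting spots into "increase", "decrease" or "maximum at an intermediate stage" amounts to classifying each spot's intensity profile across ordered ripening stages. The sketch below shows one simple way to do this. The stage list, including a "mature green" starting stage, the 1.5-fold threshold and all intensities are hypothetical assumptions. They are not the criteria used by Kok et al. (2008).

```python
"""Classify 2-DE spot intensity profiles across ordered ripening stages.

Hypothetical sketch: stage names, intensities and the fold threshold are
illustrative assumptions, not values from any published study.
"""

# Assumed ordering of ripening stages (earliest first).
STAGES = ["mature green", "breaker", "turning", "light red", "red"]


def classify_profile(intensities, fold=1.5):
    """Return 'increase', 'decrease', 'transient peak' or 'unchanged'."""
    if len(intensities) != len(STAGES):
        raise ValueError("one intensity value per stage is required")
    first, last = intensities[0], intensities[-1]
    peak = max(intensities)
    peak_idx = intensities.index(peak)
    # Transient peak: the maximum lies at an intermediate stage and clearly
    # exceeds both the earliest and the latest stage.
    if 0 < peak_idx < len(intensities) - 1 and peak >= fold * max(first, last):
        return "transient peak"
    if last >= fold * first:
        return "increase"
    if first >= fold * last:
        return "decrease"
    return "unchanged"


# Hypothetical normalised spot volumes, one value per stage.
spots = {
    "spot_01": [1.0, 1.4, 2.0, 2.6, 3.1],
    "spot_02": [4.0, 3.2, 2.1, 1.5, 1.2],
    "spot_03": [1.0, 1.1, 2.9, 1.3, 1.2],
    "spot_04": [2.0, 2.1, 1.9, 2.2, 2.0],
}

for name, values in spots.items():
    print(name, classify_profile(values))
```

In practice, replicate gels and a statistical test would normally replace a bare fold threshold. The rule above only shows the shape of the comparison.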

Posttranslational Modifications

Modified proteins are often present at low abundance, and many PTMs are reversible and labile. Together, these properties make the analysis of PTMs such as phosphorylation, glycosylation and cysteine oxidation a multifaceted challenge. To improve detection and site localisation, researchers have used strategies such as selective enrichment, alternative MS/MS fragmentation methods (electron capture/transfer dissociation, ECD/ETD) and derivatisation/labelling. Changes to the analytical setup have also been employed, such as negative ion mode and the use of nonstandard, occasionally basic, sample solutions. All of these are important for quantitative plant proteomics, in which PTMs play a substantial role.

Phosphorylation

Protein phosphorylation is an important PTM in plants as well as in animals and is involved in many cellular processes. Around 1,000 and 500 protein kinases are present in Arabidopsis and humans, respectively. Together with the recent identification of numerous phosphopeptides in large-scale plant phosphoproteomic studies, this underlines the importance of phosphorylation (Huang et al. 2009; Van Bentem et al. 2008; Reiland et al. 2009). Quantitation of phosphoproteomes is therefore essential to unravel the molecular mechanisms behind cellular processes such as signalling pathways. Phosphorylation and dephosphorylation may be the initial signalling events, triggering downstream signalling cascades that ultimately culminate in the differential expression of genes.

Acidic phosphate groups lower the pI of proteins, so phosphorylated and non-phosphorylated isoforms can be resolved by 2-DE. This makes 2-DE well suited for studying phosphorylation changes. Phosphorylated proteins can be specifically detected with nonradioactive stains such as Pro-Q Diamond (Gerber et al. 2006, 2008; Chitteti and Peng 2007a, b; Boudsocq et al. 2007) or, alternatively, through incorporated radiolabelled 32P (Rampitsch et al. 2006a, b). Pro-Q Diamond has been used to quantify and identify differentially regulated phosphoprotein isoforms in elicitor-treated tobacco (Gerber et al. 2006) and during cellular dedifferentiation in Arabidopsis (Chitteti and Peng 2007a, b). It has also been applied to Arabidopsis cells under osmotic or ABA-dependent stress (Boudsocq et al. 2007). Pro-Q Diamond can also serve as a purification tool for the enrichment and quantification of phosphoproteins (Ito et al. 2009). Nonetheless, the phosphorylation site often cannot be established from 2-DE protein spots. This limitation applies especially to membrane proteins, a class that includes many phosphoproteins.
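The pI shift caused by phosphorylation can be illustrated with a Henderson-Hasselbalch estimate of net charge, solved for the pH at which the charge is zero. This is a minimal sketch, and its inputs are assumptions for illustration. The pKa set is one common textbook-style choice, the phosphate is treated as a diprotic acid with pKa values of about 1.2 and 6.0, and the peptide is hypothetical. Published pKa scales differ and give somewhat different absolute pI values.

```python
"""Estimate how phosphate groups lower the isoelectric point (pI) of a peptide.

Sketch only: all pKa values below are assumptions, and the peptide is
hypothetical. Loss of the Tyr hydroxyl ionisation on phospho-Tyr is ignored.
"""

# Assumed pKa values for positively and negatively ionisable groups.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
# Assumed pKa values for the two ionisations of a phosphomonoester.
PKA_PHOSPHATE = (1.2, 6.0)


def net_charge(seq, ph, n_phospho=0):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POS["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEG["Cterm"] - ph))
    for aa in seq:
        if aa in "KRH":
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in "DECY":
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    # Each phosphate contributes up to two negative charges.
    for pka in PKA_PHOSPHATE:
        charge -= n_phospho / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, n_phospho=0, tol=1e-4):
    """Bisection for the pH at which the net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Net charge falls monotonically with pH, so move towards the root.
        if net_charge(seq, mid, n_phospho) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


peptide = "GSHMKRSPTEAK"  # hypothetical sequence
for n in range(3):
    print(f"{n} phosphate(s): pI ~ {isoelectric_point(peptide, n):.2f}")
```

Each added phosphate makes the net charge more negative at every pH, so the zero-charge point, and with it the spot position in the first dimension, shifts towards the acidic end.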

Phosphorylation often has low stoichiometry, and co-occurring non-phosphorylated peptides compete for ionisation. Both factors often make enrichment of phosphopeptides necessary for MS-based phosphoproteomic analysis. Several techniques are available for selective enrichment (Dunn et al. 2010). In plant proteomics, these include immobilised metal affinity chromatography (IMAC) (Grimsrud et al. 2010) and metal oxide affinity chromatography (MOAC) (Hsu et al. 2009), used alone or in combination (Sugiyama et al. 2008).
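In MS-based phosphoproteomics, a phosphopeptide is recognised by the mass increment of the phosphate group (HPO3, +79.966 Da monoisotopic). Phosphoserine and phosphothreonine peptides often also show a neutral loss of H3PO4 (about 98 Da) on collision-induced dissociation. The minimal sketch below shows the corresponding mass arithmetic. It assumes unmodified cysteines and uses a hypothetical example peptide.

```python
"""Monoisotopic peptide masses with and without phosphorylation (sketch)."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free peptide termini
PROTON = 1.007276   # proton mass, used for m/z
HPO3 = 79.966331    # mass added by one phosphate group
H3PO4 = 97.976896   # common neutral loss from pSer/pThr under CID


def peptide_mass(seq, n_phospho=0):
    """Neutral monoisotopic mass of a peptide carrying n phosphate groups."""
    sites = sum(seq.count(aa) for aa in "STY")
    if n_phospho > sites:
        raise ValueError("more phosphate groups than S/T/Y residues")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER + n_phospho * HPO3


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


seq = "PEPTIDE"  # hypothetical example peptide
unmodified = peptide_mass(seq)
phospho = peptide_mass(seq, n_phospho=1)
print(f"[M+2H]2+ unmodified: {mz(unmodified, 2):.4f}")
print(f"[M+2H]2+ phospho:    {mz(phospho, 2):.4f}")
print(f"after H3PO4 loss:    {mz(phospho - H3PO4, 2):.4f}")
```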

Phosphoprotein/phosphopeptide enrichment steps can add substantially to the technical bias in quantitative analysis. Labelling is therefore best performed before any selective enrichment, so that the samples to be compared can be combined and enriched together. A recent large-scale phosphoproteomic SILAC study of mouse liver likewise indicated the preference for metabolic labelling before phosphoprotein/phosphopeptide enrichment steps (Pan et al. 2008). Similarly, metabolic labelling with 15N salts (Oda et al. 1999) appears to be the most appropriate quantitation technique for plant phosphoproteomics. This was the approach taken with Arabidopsis cells treated with the flagellin-derived bacterial elicitor flg22 and the fungal elicitor xylanase (Benschop et al. 2007). More than 1,000 phosphopeptides from the plasma membrane fraction were quantified in this study. Of these, 76 and 9 phosphopeptides were differentially regulated following flg22 and xylanase elicitation, respectively. In a very similar study, Arabidopsis cells were treated with flg22, but iTRAQ was chosen over 15N metabolic labelling for quantitation because of its multiplexing capabilities. Only ratios with at least a twofold change were considered, so fewer differentially phosphorylated peptides were reported: only 12 phosphopeptides were induced (Nuhse et al. 2007). Nevertheless, the data from the two studies were consistent, which suggests that the relevant phosphorylation sites had been identified.
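In SILAC, the label adds a fixed mass per labelled residue. In 15N metabolic labelling, by contrast, each peptide is shifted by an amount that depends on its number of nitrogen atoms, so light and heavy partners must be paired peptide by peptide. The sketch below computes that shift and then applies a twofold ratio cut-off like the one described for the iTRAQ study. The function names and ratio values are hypothetical and are not taken from any published pipeline.

```python
"""15N label mass shift and a twofold ratio filter (illustrative sketch)."""
import math

# Nitrogen atoms per amino-acid residue (backbone amide N plus side chain).
N_ATOMS = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1,
    "H": 3, "I": 1, "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1,
    "T": 1, "W": 2, "Y": 1, "V": 1,
}
DELTA_15N = 0.997035  # mass difference between 15N and 14N (Da)


def nitrogen_count(seq):
    """Total nitrogen atoms in a peptide (termini add no extra N)."""
    return sum(N_ATOMS[aa] for aa in seq)


def heavy_light_shift(seq):
    """Mass difference between fully 15N-labelled and unlabelled peptide."""
    return nitrogen_count(seq) * DELTA_15N


def differentially_regulated(ratios, min_fold=2.0):
    """Keep peptides whose treated/control ratio changes at least min_fold."""
    cutoff = math.log2(min_fold)
    return {pep: r for pep, r in ratios.items() if abs(math.log2(r)) >= cutoff}


print(f"PEPTIDE shift: {heavy_light_shift('PEPTIDE'):.4f} Da")

# Hypothetical treated/control ratios for four phosphopeptides.
ratios = {"pepA": 2.5, "pepB": 0.45, "pepC": 1.3, "pepD": 0.9}
print(sorted(differentially_regulated(ratios)))
```

Using a symmetric log2 cut-off treats a twofold increase and a twofold decrease alike.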

Redox Proteomics

Environmental stimuli, particularly biotic and abiotic stresses, strongly influence the redox status of proteins, mainly because they induce an oxidative burst associated with the production of reactive oxygen species (ROS) (Jaspers and Kangasjarvi 2010; Torres 2010). Under optimal conditions, the cytoplasm is a reducing environment that keeps sulfhydryl groups reduced. Other residues can also be oxidised, but cysteines are the most affected by ROS. ROS convert cysteines into disulphide bonds, unstable sulfenic acids, or sulfinic or sulfonic acids, which are generally irreversible. Chemical labels that target cysteinyl groups can therefore be used to study the redox status of proteins and to quantify reduced cysteines in cysteine-containing peptides. Reduced cysteines can be quantified with fluorescent labels such as monobromobimane (mBBr), cyanine-5 maleimide (Cy5m) or CyDyes (DIGE), followed by separation of protein isoforms on conventional 2-D gels. Diagonal 2-D SDS-PAGE is an alternative (Yano and Kuroda 2008; Stroher and Dietz 2006). Native samples, in which only the reduced cysteinyl groups are labelled, can be compared with samples fully reduced by DTT or tris(2-carboxyethyl)phosphine (TCEP) before labelling (Fu et al. 2008; Hurd et al. 2009). Labelling of free SH groups with ICAT reagents allows a gel-free quantitative study of the redox proteome in plants (Stroher and Dietz 2006; Hagglund et al. 2008, 2010). Redox proteomics has also been undertaken without labels, using sequential nonreducing/reducing 2-D SDS-PAGE (Cumming et al. 2004). Several thioredoxin (Trx) and glutaredoxin (Grx) targets have been investigated in plant redox proteome studies (Rinalducci et al. 2008). Both Trx and Grx are involved in the reduction of disulphide bonds in proteins (Montrichard et al. 2009; Rouhier 2010). Studies of Trx-linked redox changes in the proteome during seed germination are a notable example (Yano and Kuroda 2006; Alkhalfioui et al. 2007).
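Comparing a native and a fully reduced sample gives an estimate of the oxidised fraction of a cysteine-containing spot or peptide. The minimal sketch below assumes equal protein loading, complete labelling of all free thiols, and hypothetical signal values. The function name is illustrative.

```python
"""Estimate cysteine oxidation from native vs. fully reduced thiol labelling."""


def percent_oxidised(native_signal, fully_reduced_signal):
    """Percentage of cysteine thiols that were oxidised in the native sample.

    native_signal: label intensity when only in vivo reduced thiols are labelled.
    fully_reduced_signal: label intensity after full reduction (e.g. DTT or TCEP),
    i.e. the total labelable thiol pool.
    """
    if fully_reduced_signal <= 0:
        raise ValueError("fully reduced signal must be positive")
    # Clamp at 1.0 so measurement noise cannot yield negative oxidation.
    fraction_reduced = min(native_signal / fully_reduced_signal, 1.0)
    return 100.0 * (1.0 - fraction_reduced)


# Hypothetical spot intensities: (native, fully reduced).
spots = {"spot_12": (300.0, 1000.0), "spot_27": (950.0, 1000.0)}
for spot, (native, reduced) in spots.items():
    print(f"{spot}: {percent_oxidised(native, reduced):.1f}% oxidised")
```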

During oxidative stress, one of the most common PTMs is protein carbonylation, the formation of aldehyde or ketone groups on Lys, Arg, Pro or Thr side chains (Rinalducci et al. 2008). Carbonyl groups can be quantified by derivatising them with 2,4-dinitrophenylhydrazine (DNPH) and detecting the resulting DNP adducts with anti-DNP monoclonal antibodies (Tanou et al. 2009). Alternatively, a hydrazide biotin-streptavidin enrichment method allows high-throughput identification of carbonylated proteins by MS (Soreghan et al. 2003). Detailed proteomic studies revealed a surge of carbonylation events in the proteomes of citrus plants subjected to salinity stress and of apple during senescence (Qin et al. 2009). These studies used non-MS-based affinity detection and quantitation techniques coupled with 2-DE.
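The anti-DNP signal also scales with the amount of protein loaded, so carbonylation levels are commonly normalised to a total-protein stain before treatments are compared. The sketch below assumes densitometric values for an anti-DNP immunoblot and a total-protein stain of the same sample. The numbers and function names are hypothetical.

```python
"""Normalise anti-DNP (carbonyl) signals to total protein (sketch)."""


def carbonyl_index(dnp_signal, protein_signal):
    """Carbonylation per unit protein: anti-DNP signal / total-protein stain."""
    if protein_signal <= 0:
        raise ValueError("total-protein signal must be positive")
    return dnp_signal / protein_signal


def relative_carbonylation(treated, control):
    """Fold change of the normalised carbonyl index, treated vs. control."""
    return carbonyl_index(*treated) / carbonyl_index(*control)


# Hypothetical densitometry: (anti-DNP signal, total-protein stain).
control_lane = (1200.0, 4000.0)
salt_lane = (2700.0, 4500.0)
print(f"fold change: {relative_carbonylation(salt_lane, control_lane):.2f}")
```

Without the normalisation step, the raw anti-DNP signals in this example would suggest a 2.25-fold increase instead of the 2.0-fold increase per unit protein.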

Nitrosylation

Nitric oxide (NO) is a well-established signalling molecule involved in plant stress responses and development. To some extent, it acts similarly to ROS (Lindermayr and Durner 2009; Qiao and Fan 2008). Nitrosative stress leads to the formation of S-nitrosylated cysteines or nitrated tyrosines. Methods such as the “biotin switch” and affinity purification have been devised to specifically target and enrich proteins containing nitrosylated cysteines (Lindermayr and Durner 2009; Torta et al. 2008). The “biotin switch” enrichment method coupled with 2-DE and MS has been used for quantitative proteomic analysis of nitrosylated cysteines based on straightforward SYPRO Ruby staining intensity values (Romero-Puertas et al. 2008). That study investigated proteins differentially nitrosylated during the hypersensitive response (HR) and the subsequent programmed cell death (PCD) in Arabidopsis infected with an incompatible bacterial pathogen. For tyrosine nitration, methods have been developed to detect o-nitrotyrosine using iTRAQ reagents, allowing straightforward quantitation of this PTM (Chiappetta et al. 2009). Chaki et al. (2009) used nitrotyrosine antibodies to detect and quantify nitrotyrosine in sunflower hypocotyls. Cecconi et al. (2009) used 1-D and 2-D Western blots to detect differential nitration following a treatment that leads to HR in tomato cells. Tanou et al. observed an increase in nitrosylated proteins following salinity stress in citrus plants. However, no nitrated proteins were identified by MS in these studies.

S-Glutathionylation

S-Glutathionylation, the formation of a mixed disulphide between glutathione and a protein cysteine residue, is a prominent cysteine modification. It is a consequence of oxidative or nitrosative stress and may be involved in cellular signalling (Dalle-Donne et al. 2007). Several methods are now available for detecting this PTM. For instance, 35S-glutathione labelling with 2-DE separation and biotin-glutathione affinity purification have been used to detect induced glutathionylation of Arabidopsis proteins under oxidative stress (Dixon et al. 2005; Gao et al. 2009a, b).

Unravelling Signal Transduction Cascades Using Proteomics Approaches

Signalling processes usually involve direct physical contacts between the components of a pathway in order to transfer a “signal” from receptors to transcription factors or other intracellular effector proteins. Combinatorial interactions between signalling proteins can be crucial in determining their cell type-specific functions, subcellular localisation and stability. The identification of protein complexes and of posttranslational modifications of signalling proteins is therefore essential for understanding signal transduction cascades. The signal is often transmitted from receptors via phosphorylation of intermediate and effector proteins, which ensures a fast and reversible response to different stimuli. Proteomics approaches are being used to study changes in phosphorylation in response to the following stimuli:
- variation in light or temperature (Bonardi et al. 2005; El-Khatib et al. 2007)
- pathogen invasion (for review, see Quirino et al. 2010)
- hormones (El-Khatib et al. 2007; Li et al. 2009; Chen et al. 2010)
- salt stress (Chitteti and Peng 2007a, b)
Another commonly used mechanism of signal transduction is the targeting of repressor proteins for degradation via ubiquitylation (for review, see Vierstra 2009).

Tandem affinity purification (TAP) approaches, Strep-tags and biotin tags have been used successfully to isolate protein complexes in plants. Alternatively, protein fusions to green fluorescent protein (GFP) are used, which also allow direct visualisation of protein expression and subcellular localisation in plants (Karlova et al. 2006). Combining affinity purification with separation by size exclusion and/or blue native PAGE potentially enables the detection of distinct complexes formed by one protein (Remmerie et al. 2009). Recently, the first systematic proteomics efforts to unravel the “interactomes” of specific signalling processes have been accomplished. Proteins of the 14-3-3 family are components of many signalling pathways and bind a wide variety of client proteins in a phosphorylation-dependent manner. Chang et al. (2009) performed TAP-tag purification of a generic 14-3-3 subunit expressed from a constitutive promoter. They identified complex partners with a quantitative strategy based on MudPIT (multidimensional protein identification technology). This approach revealed 101 new potential 14-3-3 clients, which suggests that 14-3-3s are among the most connected nodes in the emerging protein–protein interaction network of plants. Another recent proteomics study characterised the core cell cycle interactome in Arabidopsis cell cultures. In that study, complex partners of 102 cell cycle-associated proteins, constitutively expressed as fusions to an improved version of the TAP tag (GS-tag), were isolated (Van Leene et al. 2010).
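Affinity purification recovers both genuine complex partners and proteins that bind non-specifically to the matrix. Telling them apart usually requires comparison with a control purification. The sketch below ranks candidates by spectral-count enrichment over such a control. The thresholds, pseudocount and protein names are all hypothetical. This is a generic filter and not the scoring used by Chang et al. (2009) or Van Leene et al. (2010).

```python
"""Rank affinity-purified proteins by enrichment over a control (sketch)."""


def enriched_partners(bait_counts, control_counts, bait_id=None,
                      min_fold=5.0, min_spectra=3, pseudocount=1.0):
    """Return {protein: fold enrichment} for candidates passing both filters.

    bait_counts / control_counts map protein IDs to spectral counts from the
    tagged-bait purification and from a control (e.g. empty-tag) purification.
    """
    partners = {}
    for protein, n_bait in bait_counts.items():
        if protein == bait_id:
            continue  # the tagged bait itself is not a partner
        n_control = control_counts.get(protein, 0)
        # Pseudocount avoids division by zero for proteins absent in the control.
        fold = (n_bait + pseudocount) / (n_control + pseudocount)
        if n_bait >= min_spectra and fold >= min_fold:
            partners[protein] = fold
    # Most strongly enriched candidates first.
    return dict(sorted(partners.items(), key=lambda item: item[1], reverse=True))


# Hypothetical spectral counts.
tap = {"bait_14-3-3": 50, "client_A": 12, "client_B": 4, "HSP70": 20, "client_C": 2}
control = {"HSP70": 18, "client_B": 1}
print(enriched_partners(tap, control, bait_id="bait_14-3-3"))
```

In this toy example, HSP70 is abundant but appears almost equally in the control, so it is discarded as a likely non-specific binder.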

Plant Proteomics: Challenges and New Frontiers

Plant proteomics as a discipline has grown substantially since the release of the genome sequences of the model dicot Arabidopsis (Arabidopsis Genome Initiative 2000) and the monocot rice (Goff et al. 2002). Plant-specific protocols have improved significantly, from sample extraction to mass spectrometric analysis. A significant challenge in proteomics, in plants as in any complex biological system, is the inability to measure the entire proteome (Ahn et al. 2007). A number of approaches have been used to partially overcome this limitation. These include sample fractionation and the enrichment of protein subpopulations or compartments before analysis by mass spectrometry (Eubel et al. 2008; Huang et al. 2009; Hynek et al. 2009; Ferro et al. 2010).

Quantitation by mass spectrometry is now increasingly led by next-generation label-free techniques (Schulze and Usadel 2010). Unlabelled targeted approaches such as selected reaction monitoring (SRM) have greatly improved the sensitivity and reproducibility of mass spectrometric measurements (Lange et al. 2008). Advances in MS instrumentation and in approaches such as sequential window acquisition of all theoretical fragment ion spectra (SWATH; Gillet et al. 2012) have enabled researchers to employ a wider range of methodological approaches. These label-free techniques rely on the uniqueness of the monitored peptide sequence. That uniqueness can only be established reliably in species with well-characterised genomes such as Arabidopsis or rice (Rost et al. 2012). In other plants, this remains a challenge, because without complete genome sequences, confidence in a peptide’s uniqueness is limited.
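Whether a monitored peptide is unique can be checked by in silico digestion of the predicted proteome. The minimal sketch below makes several assumptions: full tryptic cleavage after K or R except before P, a length window of 6 to 25 residues, and a toy, hypothetical proteome. Isoleucine is mapped to leucine because the two residues are isobaric and cannot be distinguished by mass.

```python
"""Find proteotypic (protein-unique) tryptic peptides in a proteome (sketch)."""
import re
from collections import defaultdict

# Trypsin cleaves C-terminal to K or R, but not when the next residue is P.
CLEAVAGE_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence, min_len=6, max_len=25):
    """Fully cleaved tryptic peptides within a length window (I mapped to L)."""
    normalised = sequence.upper().replace("I", "L")
    pieces = CLEAVAGE_SITE.split(normalised)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def proteotypic_peptides(proteome, min_len=6, max_len=25):
    """Map each protein ID to the peptides that occur in no other protein."""
    owners = defaultdict(set)
    for protein_id, sequence in proteome.items():
        for peptide in tryptic_peptides(sequence, min_len, max_len):
            owners[peptide].add(protein_id)
    unique = defaultdict(set)
    for peptide, protein_ids in owners.items():
        if len(protein_ids) == 1:
            unique[next(iter(protein_ids))].add(peptide)
    return dict(unique)


# Hypothetical toy proteome; P1 shares all of its peptides with P2.
proteome = {
    "P1": "AAGLVDEKSSTPWINR",
    "P2": "AAGLVDEKSSTPWLNRYYNGFTQR",
    "P3": "MMHSTVPAKPLLEEQR",
}
for protein_id, peptides in sorted(proteotypic_peptides(proteome).items()):
    print(protein_id, sorted(peptides))
```

P1 has no unique peptide here, which mirrors the problem described above. If the predicted proteome is incomplete, a peptide that looks unique may in fact be shared with an unannotated protein.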

Another challenge is to integrate the available proteomic data sources into community plant resources that form a web of interlinked repositories. Plant research has significantly advanced the field of proteomics by overcoming plant-specific challenges and by contributing to the development of plant-specific technologies and analyses. Recently, an aggregation portal was introduced that summarises the varied Arabidopsis proteomic resources in a single interface (Joshi et al. 2011). This resource is significant because it is the first example of proteomic data unification by a variety of specialty research groups. Such integrated approaches should be fostered for future data management and analysis. Proteomics research in plants will be greatly supported by general advances in the field, but many specific problems remain that will ultimately require solutions tailored to plant research.

Summary

The primary objectives of plant proteomics remain:
(1) to gain insight into the physiology of different plant species and varieties and their performance with respect to developmental parameters, yield indices, pathogen response, abiotic stress management, fruiting, etc.;
(2) to develop improved and safe crops to meet the goals of food security;
(3) to develop sustainable agricultural practices and reduce the impact of agriculture on the environment.
Proteomics research is urgently needed to link genomic information to functional applications. Proteome analysis technology has developed tremendously, from basic gel-based tools to the current automated, quantitative MS/MS-based platforms. In the post-genomic era, proteomic studies have risen remarkably, with applications ranging from crop improvement and posttranslational modifications to understanding natural processes. However, proteomic applications need to be integrated with systems biology approaches. The genome has limited meaning without its proteome complement, and the proteome in turn can only be fully understood through functional characterisation and study of the metabolome. A broader, interdisciplinary global network should combine multiple strategies to integrate advances in plant biotechnology and reach the larger objectives of food security and sustainable development.