1 Introduction

Reproduction is one of the most important processes performed by living organisms on Earth, as it enables the survival of species. Plant reproduction has several important aspects, some of which differ from animal reproduction. In plants (Archaeplastida), there are two alternating generations that differ in their ploidy. The diploid sporophyte produces haploid spores by meiosis, and the spores give rise to a haploid gametophyte, in which gametes are produced by mitosis (unlike in animals, whose gametes are formed by meiosis). Two haploid gametes fuse, forming a diploid zygote, from which a new sporophyte generation develops.

The ancestors of higher plants, Streptophyta, spent a vast majority of their lives as haploid gametophytes (reviewed in Qiu et al. 2012). Their zygote that was formed by two fusing gametes immediately underwent meiosis and gave rise to four haploid spores. In their descendants, meiosis was delayed, and a multicellular diploid generation was formed by several rounds of zygote mitosis. During subsequent evolution, one of the generations tended to be reduced. In bryophytes (Bryophyta sensu lato)—i.e. mosses, hornworts and liverworts—gametophyte dominates, and sporophyte is dependent on it. On the other hand, sporophyte is dominant in ferns (Monilophyta) and lycophytes (Lycopodiophyta), but they still form gametophytes as an independent generation. Later, in gymnosperms (Gymnospermae) and angiosperms (Angiospermae), gametophytes were notably reduced. In gymnosperms, the male gametophyte contains 4–40 cells, whilst the female gametophyte comprises several thousands of cells. Angiosperm gametophytes are even more reduced, having male gametophyte composed of 2–3 cells, whereas female gametophyte comprises typically 7 cells with 8 nuclei (although there exist several other alternative arrangements with different number of cells in the embryo sac; Reiser and Fischer 1993).

The mature angiosperm plants belong to the diploid sporophyte generation. Angiosperms produce spores, gametophytes and gametes of separate sexes, male and female. The male gametophyte is formed inside the anthers of a flower; two initials differentiate from the sporophytic tissue in the anther: the tapetal initial (which gives rise to the tapetum) and the pollen mother cell (microsporocyte; see McCormick 1993). A pollen mother cell divides by meiosis into four microspores. The microspore tetrad is first held together by callose, which is later digested by callases (enzymes digesting callose) released by the tapetum. The freed microspores increase in size and vacuolise, and their nuclei migrate to the periphery (McCormick 1993; Borg and Twell 2010). The microspores then undergo the highly asymmetric pollen mitosis I (PMI), which produces a large vegetative cell and a small generative cell. The generative cell, the origin of the male germline, is then engulfed by the vegetative cell and migrates into its cytoplasm (Berger and Twell 2011). The asymmetry of PMI is of key importance, as demonstrated by the serious phenotypic defects of the Arabidopsis thaliana mutants gemini pollen 1 (Park et al. 1998) and two-in-one (Oh et al. 2005). The generative cell undergoes one more round of cell division, pollen mitosis II (PMII), which gives rise to two sperm cells and occurs either before or after the mature pollen is shed. Consequently, the mature pollen can be shed as bi-cellular or tri-cellular (Brewbaker 1967).

Initially, the ancestral state of angiosperm pollen grains was inferred as bi-cellular, because all ancient woody Magnoliales shed this pollen type. Moreover, this bi-cellular pollen is phylogenetically widespread, and it was believed that tri-cellular pollen was restricted to aquatics, grasses and some herbs (Elfving 1879; Strasburger 1884). More recent investigations revealed that pollen tri-cellularity is not irreversible and that tri-cellular lineages diversify slowly and sometimes reverse to bi-cellular lineages. This reflects a linkage between the evolution of sporophyte lifestyle and developmental lability of male gametophyte (Williams et al. 2014a). The ancient aquatic plant Ceratophyllum and several monocot lineages, such as Araceae, Alismataceae and Nymphaeaceae, support the tri-cellular ancestry of pollen, which in this case represents a selective advantage over bi-cellular pollen. Williams et al. (2014a) proposed that bi-cellular pollen evolved secondarily from tri-cellular ancestors during shifts away from rapid life cycle or from limited reproduction. In total, thirteen orders of angiosperms shed prevalently tri-cellular pollen (Fig. 10.1). In the orders with bi-cellular pollen, about 2–44% species produce also tri-cellular pollen.

Fig. 10.1 Distribution of tri-cellular and bi-cellular pollen within angiosperms. Interrelationships are based on APG IV (APG IV, 2016). Percentages summarise all taxa with tree species matches from Williams et al. (2014a). Tri-cellular pollen is dominant in blue-coloured orders. Boxes indicate that some of the omics is already done for the order

The simultaneous presence of both pollen types in one species is very uncommon. One example is the early-divergent angiosperm Annona cherimola (Magnoliales, Annonaceae), which sheds bi- and tri-cellular pollen grains at the same time. There, the pollen type produced depends on environmental factors such as temperature and humidity during pollen maturation (Lora et al. 2009).

Upon reaching the stylar papillary cells, the pollen grain is rehydrated and activated (Vogler et al. 2015). Later on, the pollen tube grows through the female connective tissues, a process accompanied by reciprocal communication between the pollen tube and pistil tissues (Hafidh et al. 2014, 2016b; Higashiyama 2015, see also Chap. 8). The pollen tube delivers two sperm cells (male haploid gametes) to the embryo sac. One sperm cell fuses with the egg cell (female haploid gamete), giving rise to a diploid zygote, which represents the start of a new diploid sporophyte generation. The zygote subsequently undergoes several rounds of mitotic divisions, giving rise to an embryo and a new plant. The second sperm cell fuses with the central nucleus of the embryo sac, giving rise to the triploid endosperm. Double fertilisation is typical of angiosperms and was reviewed in more detail by Raghavan (2003).

In this chapter, we discuss how various -omic techniques have notably broadened the wealth of information about male gametophyte development. More than one hundred -omic studies of the male gametophyte have been published so far (Table 10.1). Of these, transcriptomics was the dominant experimental approach, accounting for 51% of all -omics experiments, followed by proteomics (26%). The remaining 23% of experiments were shared among analyses of the translatome, miRNAome, methylome, phosphoproteome, allergome, secretome and metabolome. Phylogenetically, most main orders (15 altogether) representing all major groups of seed plants are covered, but to a different extent according to the distribution of model species. Consequently, the majority of information (75% of experiments) was gathered on four well-distributed orders: Poales (Monocots, Commelinids, key models Oryza sativa and Zea mays), Brassicales (Rosids, key model species Arabidopsis thaliana), Solanales (Asterids, key models Nicotiana tabacum and Solanum lycopersicum) and Liliales (Monocots, key model Lilium longiflorum). Moreover, these model species were subjected to a combination of several -omic approaches, which enabled the mutual comparison of -omic datasets of different origins.

Table 10.1 Summary of published male gametophyte -omics studies

2 Transcriptomics

Transcriptomics was the first -omics technique applied to the male gametophyte (Becker et al. 2003; Honys and Twell 2003; Lee and Lee 2003) and so far transcriptomic profiles of at least one male gametophyte developmental stage were published for 22 seed plant species, 21 of which were angiosperms (Table 10.2).

Table 10.2 Overview of published male gametophyte -omics studies. Analysed species are put in the appropriate orders

Transcriptomic studies developed from the sequencing of cDNA or EST libraries through serial analysis of gene expression (SAGE) and microarray-based studies to the most recent deep sequencing technologies (RNAseq). Sanger sequencing of cDNA or EST libraries is a relatively low-throughput method, expensive and generally not quantitative. To overcome these limitations, tag-based methods were developed, including SAGE, cap analysis of gene expression (CAGE) and massively parallel signature sequencing (MPSS). These approaches are high throughput and measure precise gene expression levels. However, most of them are based on Sanger sequencing, and a significant portion of the short tags cannot be uniquely mapped to the reference genome. Moreover, only a portion of the transcripts can be analysed, and gene isoforms are generally indistinguishable from each other. These disadvantages limited the application of traditional sequencing technology in annotating the structure of transcriptomes (Wang et al. 2009). DNA microarrays became widespread during the late 1990s, although the first article had already been published by Schena et al. (1995). The gene chip technique brought a quantum leap in gene expression studies and became standard because of its well-established sample preparation and data analysis protocols, rapid turnaround time, wealth of archived data and data-mining methodologies. However, the use of expression microarrays is limited, as they require fabrication, and alongside their rigidity, they also depend on prior knowledge of genes and gene sequences (Loraine et al. 2013). Recently, novel deep sequencing technologies such as RNAseq have started to dominate, mainly because RNAseq requires only a little a priori knowledge of the genome and therefore enables transcriptome studies in non-model plant species. It allows both mapping and quantifying transcriptomes.

The majority of male gametophytic transcriptomics studies are still based on microarray analyses, representing 33 experiments in 11 species (Table 10.2). Affymetrix has been the most commonly used platform, but a significant share of experiments used the alternative Agilent (Zea mays, Nicotiana tabacum) and Roche NimbleGen (Cryptomeria japonica, Arabidopsis thaliana, Vitis vinifera) platforms. However, the number of model systems investigated by RNAseq is growing rapidly; it has currently reached 23 experiments in 15 species. Since all transcriptomic studies employing RNAseq were published in the last few years, they are responsible for the recent massive increase in the number of plant species with analysed pollen transcriptomes, especially among models whose genomes have not yet been sequenced and annotated. For comparison, Rutley and Twell (2015) mentioned only 10 angiosperm species in their review. There is a limited overlap of species whose male gametophyte has been analysed on both platforms, microarrays and RNAseq: only five key models, Arabidopsis thaliana, Oryza sativa, Zea mays, Solanum lycopersicum and Lilium longiflorum (Table 10.2). Of them, four provided a sufficient wealth of information, because the microarray transcriptome profiling of L. longiflorum pollen was achieved on a custom cDNA microarray. However, such overlap was sufficient for the comparison of both platforms. Not surprisingly, RNAseq enabled the identification of a larger transcriptome fraction, mainly due to the absence of probes for numerous genes on microarrays and as a result of continuously refined genome annotations leading to further reduction of reliable gene models. However, such an increase in the number of identified genes was not dramatic. The number of genes expressed in Arabidopsis mature pollen was calculated as 6044 (Rutley and Twell 2015), an average value from the range of 3954–7235 genes published in the original mature pollen microarray-based datasets (Borges et al. 2008; Honys and Twell 2004; Pina et al. 2005; Qin et al. 2009; Schmid et al. 2005; Wang et al. 2008). By comparison, of the 5525 annotated protein-coding loci that have no corresponding probe set on the Affymetrix ATH1 microarray, 451 genes were identified as expressed in pollen by RNAseq with normalised expression values of 5 reads per million (RPM) or greater (Loraine et al. 2013). Similarly, RNAseq transcriptomes of Zea mays mature pollen comprised 13,418 (Davidson et al. 2011) or 14,591 genes (Chettoor et al. 2014) in comparison to 10,539 genes previously identified using the Agilent 44K maize microarray (Ma et al. 2008). Higher sensitivity is not the only advantage of RNAseq: the sequencing of non-exonic transcripts allowed, for the first time, a broad view of alternative splicing in Arabidopsis pollen, including the discovery of novel pollen-specific splicing patterns (Loraine et al. 2013). For the same reason, the over-representation of transposable element-related transcripts was observed in maize pollen, although to a lesser extent than in embryo sac transcriptomes sequenced in parallel (Chettoor et al. 2014). A similar pattern was detected for transcripts encoding small signalling peptides of the DEFENSIN/LURE (DEFL) family, since probes for small peptide genes were often omitted from earlier microarray studies (Chettoor et al. 2014).
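The size of the RNAseq gain over microarrays can be made explicit with a little arithmetic. The following is a minimal Python sketch that uses only the gene counts quoted in the preceding paragraph; the function name `percent_gain` and all variable names are illustrative choices, not taken from the cited studies.

```python
def percent_gain(new_count, reference_count):
    """Relative increase of new_count over reference_count, in percent."""
    return 100.0 * (new_count - reference_count) / reference_count

# Zea mays mature pollen, gene counts as quoted in the text
maize_microarray = 10539              # Agilent 44K microarray (Ma et al. 2008)
maize_rnaseq = {
    "Davidson et al. 2011": 13418,
    "Chettoor et al. 2014": 14591,
}
maize_gain = {study: round(percent_gain(n, maize_microarray), 1)
              for study, n in maize_rnaseq.items()}

# Arabidopsis loci without an ATH1 probe set that RNAseq detected in pollen
loci_without_probe = 5525
detected_by_rnaseq = 451
share_detected = round(100.0 * detected_by_rnaseq / loci_without_probe, 1)

print(maize_gain)       # percentage gain per RNAseq study
print(share_detected)   # percentage of probe-less loci detected in pollen
```

With these figures, RNAseq adds roughly 27 to 38% more genes in maize, and about 8% of the Arabidopsis loci lacking a probe set were detected in pollen, which is in line with an increase that is real but not dramatic.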

Not surprisingly, 49 out of 63 datasets (77%) related to mature pollen (Table 10.2). Only a fraction of experiments also included pollen developmental stages. Full pollen development including at least four developmental stages is available for three species: Arabidopsis thaliana, Oryza sativa and Nicotiana tabacum (Bokvaj et al. 2015; Honys and Twell 2004; Wei et al. 2010). Accepting fewer developmental stages as sufficient for the evaluation of transcriptome dynamics throughout pollen development would add four more species: Zea mays (meiocytes and pollen; Chettoor et al. 2014; Dukowic-Schulze et al. 2014; Xu et al. 2012), Lilium longiflorum (microspores and mature pollen; Okada et al. 2007), Brassica napus (microspores and mature pollen; Whittle et al. 2010) and Fragaria vesca (microspores and mature pollen; Hollender et al. 2014). The inclusion of the progamic phase changes the list of analysed species. More than one time point of pollen germination and in vitro pollen tube growth was analysed in four species: Arabidopsis thaliana (germinating pollen and pollen tubes; Wang et al. 2008), Lilium longiflorum (hydrated pollen, germinating pollen and pollen tubes; Lang et al. 2015; Obermeyer et al. 2013), Nicotiana tabacum (4h and 24h pollen tubes; Hafidh et al. 2012a, b) and Pyrus bretschneideri (hydrated pollen and pollen tubes; Zhou et al. 2016). Only one time point of the progamic phase (germinating pollen) was analysed in Oryza sativa (Wei et al. 2010). Finally, only the in vitro pollen tube transcriptome without the reference mature pollen sample is available for Camellia sinensis (Wang et al. 2016). The quantification of transcriptome dynamics showed a similar expression pattern throughout pollen development and the progamic phase in all species analysed. In general, the complexity of the male gametophyte transcriptome was lower than that of any sporophytic tissue analysed. It reached its maximum in early developmental stages and was subsequently reduced until pollen maturity, reaching only 61% of the maximum value in the male gametophyte in A. thaliana and N. tabacum and as little as 46% in O. sativa (Honys and Twell 2004; Wei et al. 2010; Peng et al. 2012; Bokvaj et al. 2015; Rutley and Twell 2015). During the progamic phase, the size of pollen tube transcriptomes remained relatively stable, similar to that of mature pollen or slightly larger, usually increasing by only 0.1–3% in comparison to mature pollen in Pyrus bretschneideri, Arabidopsis thaliana, Oryza sativa and Nicotiana tabacum (Qin et al. 2009; Wei et al. 2010; Hafidh et al. 2012a, b; Zhou et al. 2016). Therefore, there was no apparent difference between fast growing tri-cellular pollen tubes and less advanced but metabolically more active bi-cellular pollen tubes. The only exception was another Arabidopsis study, in which the transcriptome complexity in 4h pollen tubes increased by 24% (Wang et al. 2008).

Pollen development is tightly regulated primarily at the level of transcription; it is under the control of at least two successive global gene expression programmes, early and late. The switch point between both developmental programmes occurs after pollen mitosis I in both tri-cellular (A. thaliana, Twell et al. 2006) and bi-cellular (N. tabacum, Honys et al., unpublished data) pollen. The initiation of the late programme therefore more likely reflects the progress of pollen maturation rather than the timing of pollen mitosis II (Hafidh et al. 2012a; Rutley and Twell 2015), supporting the uniqueness of the late male gametophytic transcriptome (Honys and Twell 2004) as shown also by principal component analyses in several species (Tang et al. 2010; Russell et al. 2012; Bokvaj et al. 2015; Rutley and Twell 2015). Generally, genes involved in cell cycle control and transcription regulation were expressed in both early and late male gametophyte transcriptomes, however, with variable expression of individual transcription factor (TF) genes and gene families, like MYB/MYB related, AP2-EREBP, C2H2, bHLH, MADS, bZIP, WRKY and TCP. On the contrary, the gene ontology (GO) category of protein synthesis/translation was over-represented in early developmental stages, whereas cell wall synthesis, cytoskeleton, signalling, protein turnover and localisation were upregulated closer to pollen maturation and in growing pollen tubes (Twell et al. 2006; Wei et al. 2010; Hafidh et al. 2012a; Costa et al. 2013, Zhou et al. 2016).

Comparative and developmental transcriptomic studies served as an information background for follow-up research including reverse genetic screens for male gametophytic transcription factors (Reňák et al. 2012), F-box proteins (Ikram et al. 2014), signalling proteins (Chen et al. 2014) and numerous functional studies. Unlike them, transcriptomic studies comparing wild-type and mutant pollen were rare, and besides the search for genetic interactions in pollen tubes deficient in two arabinogalactan protein-coding genes, agp6 and agp11 (Costa et al. 2013), they aimed to identify the transcriptional networks that regulate cell differentiation and define cell-specific functions during pollen development (Verelst et al. 2007b; Gibalová et al. 2009).

The comparison of wild-type and agp6/agp11 double-mutant pollen tubes revealed 1022 differentially expressed genes (14.7% of the pollen tube transcriptome), almost equally distributed between the upregulated and downregulated sets. GO categorisation of these genes was similar to that of other late pollen transcriptomes; however, the over-representation of several protein groups (F-box proteins, receptor-like protein kinases, protein chaperones and proteins involved in calcium signalling) highlighted the interactions of AGP6 and AGP11 with members of the pollen tube endosome machinery enabling the recycling of AGPs to perform their signalling role (Costa et al. 2013).

MADS-domain transcription factors, which function as higher-order complexes, play key roles in the development of higher eukaryotes. Five members of the MIKC* subgroup of the MADS-box family (AGL30, AGL65, AGL66, AGL94 and AGL104) were strongly upregulated in late stages of Arabidopsis pollen development (Pina et al. 2005) and were shown to form several heterodimeric complexes preferentially binding MEF2-type CArG-box sequence motifs (consensus CTA(A/T)4TAG), which are also over-represented in promoters of late pollen-expressed genes (Verelst et al. 2007a). Transcription profiling of double, triple (Verelst et al. 2007b) and even quadruple (Adamczyk and Fernandez 2009) mutants deficient in several combinations of MIKC* genes revealed the intriguing complexity of the MADS-box TF network directing cellular differentiation during pollen maturation, a process that is essential for male reproductive fitness in flowering plants (Verelst et al. 2007b; Adamczyk and Fernandez 2009). Interestingly, the importance of the MIKC* MADS-box TF ZmMADS2 for maize pollen development had been demonstrated even earlier (Schreiber et al. 2004). The functional conservation of MIKC* MADS-box complexes in Arabidopsis and rice indicated that the function of heterodimeric MIKC* protein complexes in pollen development has been conserved since the divergence of monocots and eudicots, roughly 150 million years ago (Liu et al. 2013).
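To illustrate what a search for the MEF2-type CArG-box consensus CTA(A/T)4TAG in a promoter could look like, the following minimal Python sketch translates the consensus into a regular expression. It is not the method used in the cited studies; the function name `find_carg_boxes` and the promoter fragment are hypothetical and were invented for this example only.

```python
import re

# MEF2-type CArG-box consensus quoted in the text: CTA(A/T)4TAG,
# i.e. CTA, then four A/T positions, then TAG.
# The lookahead makes the scan report overlapping sites as well.
CARG_MEF2 = re.compile(r"(?=(CTA[AT]{4}TAG))")

def find_carg_boxes(sequence):
    """Return (0-based start, matched motif) for each MEF2-type CArG box."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CARG_MEF2.finditer(seq)]

# Hypothetical promoter fragment, invented for illustration only
promoter = "GGCTAAATTTAGCCATCTATATATAGTT"

for start, motif in find_carg_boxes(promoter):
    print(start, motif)
```

Because the reverse complement of CTA(A/T)4TAG is again CTA(A/T)4TAG, scanning one strand is sufficient for this particular consensus. Counting such hits in promoters of late pollen-expressed genes against a background promoter set would be one simple way to approach the over-representation mentioned above.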

Basic leucine zipper (bZIP) transcription factors act as homo- or heterodimers and these effector-type TFs control many aspects of plant development including reproduction. Pollen-expressed TF AtbZIP34 is active during late stages of male reproductive development with a complex sporophytic and gametophytic mode of action. Transcription profiling of atbzip34 mutant pollen led to the finding that AtbZIP34 regulon comprises membrane-associated transporters and proteins involved in lipid metabolism and cell wall synthesis (Gibalová et al. 2009).

Transcription profiling was also used to unravel the regulon of the male germline-specific R2R3-MYB transcription factor DUO POLLEN1 (DUO1; Durbarry et al. 2005; Rotman et al. 2005), which plays an important role in sperm cell differentiation. Here, however, Borg et al. (2011) adopted a different strategy: they analysed the transcriptomes of seedlings ectopically expressing DUO1 in an estradiol-inducible manner and identified 63 candidate targets. Moreover, DUO1 was shown to directly regulate its target promoters through binding to the canonical MYB sites (Borg et al. 2011). The role of two DUO1 target genes, DAZ1 and DAZ2, has been characterised; they both encode EAR motif-containing C2H2-type zinc finger proteins that are important for both generative cell division and DUO1-dependent germ cell differentiation (Borg et al. 2014).

In connection with bi- and tri-cellular pollen types in angiosperms, DUO1 may have been involved in the control of the timing of generative-cell division during the evolution of the pollen type (Hafidh et al. 2012b; Rotman et al. 2005). In order to illustrate complicated processes and interrelationships in plant protein families, we adopted the DUO1 gene as an example for the demonstration of frequently used genomic tools for studying the roles of specific proteins in the cell and to briefly summarise current knowledge and line out the distribution of different orthologs within angiosperms. Up to 2015, the lack of knowledge has hindered the comprehension of the origin and evolutionary history of MYB gene family across plants. It was reported that the intron patterns of R2R3-MYB transcription factors were greatly conserved in model higher plants (Matus et al. 2008; Du et al. 2012). However, the prevalence of R2R3-MYBs was quite different indicating that their introns were established in the common ancestor of land plants. Moreover, intron patterns in algae were variable and different from land plants. These findings suggested that algae and land plant lineages used different splicing patterns. Du et al. (2015) confirmed that R2R3-MYBs were older than 3R-MYBs which may be evolutionarily derived from R2R3-MYBs via intragenic domain duplication. The interesting feature is that intron patterns of land plant R2R3-MYBs were exclusively conserved within each subfamily. Phylogenetic relationships of DUO1 and related MYB-family transcription factors expressed in pollen from selected plant species including basal angiosperms are shown in Fig. 10.2. Members of different orders tend to cluster together within a given clade indicating that clades could have been expanded after divergence from their common ancestor (Du et al. 2015). This finding suggests the common origin of family members. 
All dicots clustered in their own clades and are separated from monocots and basal angiosperms, which exhibit lineage-specific expansion. The resolution within main orders is species dependent and possibly suffers from missing data (i.e. representatives from many groups across the phylogenetic tree that are not available yet) and long-branch attraction artefacts. Alternatively, the species-specific R2R3-MYBs may represent genomic relics that evolved independently, as Du et al. (2015) outlined. These results show that there may be more lineage-specific subfamilies of the DUO1 gene, and their evolutionary history could be resolved in the future by using a more representative set of species and by combination with expression -omics studies.

Fig. 10.2 Phylogenetic relationships of pollen-specific DUO POLLEN 1 gene. The evolutionary history was inferred by using the maximum likelihood method based on the JTT matrix-based model. The tree with the highest log likelihood (−2637.8067) is shown. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The tree was divided into five main phylogenetic subgroups with bootstrap support >80% shown in different colours

It has been established that pollen tube growth in vitro and in vivo differs and that directional pollen tube growth is greatly influenced by pistil tissues (Palanivelu and Preuss 2006; Palanivelu and Tsukamoto 2012). Therefore, pollen tubes growing in vitro and in vivo were compared in Lilium longiflorum (Huang et al. 2011). A similar analysis, also including mature pollen, was performed in Olea europaea (Carmona et al. 2015; Iaria et al. 2016). To characterise the changes in gene expression of A. thaliana pollen tubes growing in vivo and pistil-activated pollen tubes, the semi-in vivo approach was applied (Qin et al. 2009). It was shown that pistil activation induced the expression of 1254 genes (18% of the overall transcriptome), many of which were pollen specific. None of these genes were active in pollen tubes growing in vitro. A similar analysis was performed on A. thaliana semi-in vivo pollen tubes growing under the influence of isolated ovules, which induced the expression of 719 genes (Chen et al. 2014). In both studies, genes with potential function in signalling, pollen tube growth/cell extension and transcription were over-represented among the pistil-activated genes, including three MYB transcription factors (MYB97, MYB101, MYB120; Qin et al. 2009) that were later shown to play a crucial role during pollen tube differentiation required for sperm release (Leydon et al. 2013; Liang et al. 2013). Eighteen candidate genes for pollen tube guidance, including TIR-NBS-LRR proteins, DEFL proteins, protein kinases and receptor-like protein kinases, were selected for a reverse genetic screen, which had only limited success due to the functional redundancy in these large gene families (Chen et al. 2014).

Both cell types forming mature pollen–‘somatic’ vegetative cell and generative cell/sperm cells as the male germline–have very different cell fates that are reflected in their gene expression. Therefore, several experiments compared the transcriptomes of isolated male gametes with either whole mature pollen or even with isolated vegetative nuclei. In plants dispersing tri-cellular pollen, mature pollen grains were used. The first, although limited sperm cell transcriptome obtained by EST library sequencing, was published for Zea mays (Engel et al. 2003). In Plumbago zeylanica, producing dimorphic sperm cells, their individual transcriptomes were compared to that of vegetative nuclei (Gou et al. 2009), whereas in Arabidopsis thaliana (Borges et al. 2008) and Oryza sativa (Russell et al. 2012; Anderson et al. 2013), sperm cell transcriptomes were compared to intact mature pollen grains. Similar analysis was performed in Lilium longiflorum that releases bi-cellular pollen and therefore the transcriptome of generative cell was studied (Okada et al. 2006, 2007). To isolate sperm cells of plants producing bi-cellular pollen, Nicotiana tabacum pollen was germinated semi-in vivo, sperm cells were collected from pollen tubes emerging from cut pistils and their transcriptome was obtained by conventional EST sequencing (Xin et al. 2011). Seven studies of sperm cells from six plant species covered both dicots and monocots as well as plants producing bi- and tri-cellular pollen, providing us with an ample material for comparisons. The identified fractions of individual sperm cell transcriptomes varied according to the method used, from around 600 (Plumbago zeylanica, EST sequencing, Gou et al. 2009) to tens of thousands genes (Oryza sativa, RNAseq, Anderson et al. 2013). Such numbers clearly prove that sperm cells were definitely not transcriptionally and metabolically inactive entities. 
In all species, sperm cell transcriptomes were different not only from sporophytic control datasets but also from the transcriptomes of whole pollen/vegetative nuclei (e.g. Russell et al. 2012), as visualised also by principal component analyses (Borges et al. 2008; Russell et al. 2012). Moreover, there were striking differences between the transcriptomes of both types of dimorphic Plumbago zeylanica sperm cells, destined to fuse with the egg cell and the central cell, respectively (Gou et al. 2009). Interesting similarities were found in the presumed function of sperm cell-expressed proteins, including the significant fraction of proteins of unknown function. Sperm cell transcriptomes were enriched in GO categories of cell cycle proteins, membrane-associated proteins, proteins involved in signal transduction, protein destination, ubiquitin-mediated proteolysis and epigenetic modifications (Engel et al. 2003; Okada et al. 2006, 2007; Borges et al. 2008; Xin et al. 2011; Russell et al. 2012; Anderson et al. 2013). In contrast, genes for RNAi machinery were downregulated (Russell et al. 2012; Anderson et al. 2013). The fact that sperm cell transcriptomes of A. thaliana, N. tabacum and Z. mays shared only 0.3% of genes (7.6% of genes were shared by at least two of the three datasets; Xin et al. 2011) pointed to the possibility that still only limited transcriptome fractions were identified and/or that there were more significant differences between sperm cells of dicots and monocots, as well as between those formed in bi- and tri-cellular pollen, than originally expected. In any case, this finding should stimulate further research.

Finally, transcriptome profiles of gametophyte generation are available also for a few species of bryophytes and ferns—Physcomitrella patens (O’Donoghue et al. 2013; Ortiz-Ramirez et al. 2016; Xiao et al. 2011), Tortula ruralis (Oliver et al. 2004), Marchantia polymorpha (Higo et al. 2016) and Pteridium aquilinum (Der et al. 2011). However, since we concentrate solely on seed plants, gymnosperms and angiosperms, these datasets are not present in Table 10.2.

3 Proteomics

Transcriptomic analyses have provided, and continue to provide, valuable information about global and specific gene expression and its dynamics. Transcriptomic data, however, do not provide complete information about gene expression since the proteome does not fully reflect the transcriptome (de Groot et al. 2007). This is especially true for systems with a high level of translational regulation, such as the male gametophyte. The principal reason is that transcriptomics cannot technically take into account the possible contribution of post-transcriptional regulatory levels of gene expression (Keene 2007). The effect of splicing (Collins 2011; Lorkovic and Barta 2004; Reddy et al. 2012), including the identification of potentially alternatively spliced transcripts (Kazan 2003; Sanchez et al. 2011; Xing and Li 2011), can be studied with the complete tiling gene chip (Whole-Genome ChIP Tiling Array ATH6, Roche-NimbleGen Systems, Inc.) or, ideally, by RNAseq (Loraine et al. 2013). However, further post-transcriptional regulatory levels, especially translation and mRNA storage, are active during male gametophyte development (Hafidh et al. 2011, 2016a). For all the above reasons, it remains necessary to complement transcriptomics with proteomic data to gain a more realistic insight.

Soon after the first pollen transcriptomic analyses appeared in the last decade, the initial studies of pollen proteome were published. Surprisingly, the first plant species with published pollen proteome, though very incomplete, was a gymnosperm, Pinus strobus (Fernando 2005). Since then, numerous angiosperm species followed, for example, Arabidopsis thaliana (Grobei et al. 2009; Holmes-Davis et al. 2005; Noir et al. 2005; Sheoran et al. 2006; Zou et al. 2009), Oryza sativa (Dai et al. 2006, 2007), Solanum lycopersicum (Lopez-Casado et al. 2012; Sheoran et al. 2007), Lilium longiflorum (Pertl et al. 2009), Brassica napus (Sheoran et al. 2009a), Quercus ilex (Valero Galvan et al. 2012), and Helianthus annuus (Ghosh et al. 2015).

The pioneering proteomic studies were based on the excision of protein spots from gels after 2-D gel electrophoresis (2-DE); the excised proteins were then protease-treated and analysed by mass spectrometry. This approach captured only a very limited fraction of the total proteome, usually several hundred proteins and almost invariably fewer than 1000. Consequently, 2-D gel-based proteomics has much lower coverage than transcriptomics and shows little overlap between individual experiments. For example, the first three published Arabidopsis pollen proteomic datasets identified 135 (Holmes-Davis et al. 2005), 121 (Noir et al. 2005) and 96 (Sheoran et al. 2006) proteins, respectively (Fig. 10.3). Taking the overlaps between these datasets into account, 2-DE proteomics enabled the identification of 267 mature pollen proteins. The Affymetrix ATH1 gene chip harboured probes for 237 of these proteins, representing only a very limited fraction of the 13,977 genes active during pollen development (Honys and Twell 2004), of which 6044 were identified in mature pollen (Rutley and Twell 2015). Of the 267 identified mature pollen proteins, 200 (75%) were found only in a single study, whilst only 18 proteins (7%) were identified by all three groups. This very small coverage of the pollen proteome is not surprising. One can assume that proteins found by several groups were encoded by the most strongly expressed genes. Indeed, all 18 genes encoding these proteins were among the most abundant in the pollen transcriptome. They also belonged to functional categories that usually contain only a limited number of typically very highly expressed genes: energy metabolism (8), stress response (4), synthesis and metabolism of the cell walls (2), cytoskeleton (2), protein synthesis (1), and metabolism (1). 
The result was also influenced by the protein extraction protocols used by the different groups, since the protocol drastically affects the composition of the purified proteome fraction, as independently demonstrated in tobacco pollen (Fíla et al. 2011).
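
The overlap figures quoted for the three 2-DE datasets can be cross-checked by simple counting. The following minimal sketch uses only the numbers given in the text; the count of proteins found in exactly two studies is not stated in the text and is derived here as an assumption-free remainder. The variable names are illustrative only.

```python
# Per-study identification counts quoted in the text
per_study = {"Holmes-Davis et al. 2005": 135,
             "Noir et al. 2005": 121,
             "Sheoran et al. 2006": 96}

unique_total = 267   # non-redundant mature pollen proteins
in_one_study = 200   # found in a single study only
in_all_three = 18    # found by all three groups

# Derived value (not stated in the text): proteins found in exactly two studies
in_exactly_two = unique_total - in_one_study - in_all_three

# Each protein contributes one identification per study in which it was found,
# so the per-study counts must add up to the weighted sum below.
total_identifications = sum(per_study.values())
reconstructed = in_one_study + 2 * in_exactly_two + 3 * in_all_three


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


print(in_exactly_two)                        # 49
print(total_identifications, reconstructed)  # 352 352
print(percent(in_one_study, unique_total))   # 75
print(percent(in_all_three, unique_total))   # 7
```

The two totals agree, so the quoted counts are internally consistent: 49 proteins must have been identified by exactly two of the three groups.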

Fig. 10.3 Quantification of Arabidopsis thaliana transcriptomic and proteomic studies

The characterisation of even a limited part of the pollen proteome enabled its functional categorisation and a comparison of the pollen transcriptome and proteome. Again, functional categories grouping abundant proteins make up a significantly higher proportion of the proteomic datasets. On the contrary, the variability in functional categorisation between individual 2-DE-analysed pollen proteomes is not as high as might be expected from the small overlap between the identified proteins. As an exception, stress-related proteins and proteins of unknown function differ significantly. However, especially here, the possible influence of the downstream processing of the biological material cannot be excluded. In comparison to the transcriptome, there is a higher proportion of proteins involved in massive processes of general and energy metabolism and of protein synthesis and metabolism. Similar trends can be seen in the reference datasets characterising the pollen proteome of Oryza sativa (Dai et al. 2006, 2007), another model species with tri-cellular pollen. In this respect, the original Pinus strobus pollen proteome (Fernando 2005) is significantly different, not only because pine is a gymnosperm, in which male gametophyte structure and development differ from those of angiosperms, but also, at least partly, because of the significantly smaller size of the identified proteome fraction and the lack of a known genomic sequence of any gymnosperm at the time of publication. Therefore, there was a particularly high fraction of unknown proteins in the Pinus strobus proteome.

A fundamental breakthrough, not only in pollen research, was the introduction of gel-free proteomic techniques, which increased the efficiency of peptide and protein identification by an order of magnitude. Gel-free techniques also enabled more accurate quantification of protein abundance. However, only a limited number of plant species were used for gel-free pollen proteome characterisation (Table 10.2). The first shotgun proteomic study identified 3465 proteins in Arabidopsis pollen (Grobei et al. 2009), which represented an almost 13-fold enlargement of the known fraction of the mature pollen proteome and included the vast majority of proteins previously identified by 2-DE techniques. This extension was also reflected in the functional categories to which the identified proteins belonged; these were closer to those found in the transcriptomic studies, including the significant representation of stable structural proteins (Fig. 10.3). Comparison of proteomic and transcriptomic datasets showed that the vast majority of the 2928 proteins (85% of the described proteome) identified in pollen by both proteomic and transcriptomic approaches were encoded by transcripts already present in early stages of male gametophyte development (Grobei et al. 2009; Honys and Twell 2004), thus representing an independent confirmation of the extent of translational regulation of gene expression in pollen (Honys et al. 2000, 2009; Honys and Twell 2004). Finally, 537 pollen proteins (15% of the pollen proteome) had not been described in any transcriptomic study known at that time, which made proteomics an attractive method for gene expression studies, complementary to transcriptomics.
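
The proportions quoted for the shotgun dataset follow directly from the reported counts. The short sketch below recomputes them from the numbers given in the text (3465 shotgun-identified proteins, 267 proteins from the combined 2-DE studies, 2928 proteins with transcriptomic support and 537 without); the variable names are illustrative only.

```python
shotgun_proteins = 3465    # Grobei et al. 2009
gel_based_proteins = 267   # combined 2-DE datasets
with_transcript = 2928     # also detected by transcriptomics
without_transcript = 537   # not described in any transcriptomic study at the time

# The two subsets partition the shotgun proteome
assert with_transcript + without_transcript == shotgun_proteins

fold_enlargement = shotgun_proteins / gel_based_proteins
share_with = 100 * with_transcript / shotgun_proteins
share_without = 100 * without_transcript / shotgun_proteins

print(f"{fold_enlargement:.1f}-fold")  # 13.0-fold
print(f"{share_with:.1f}%")            # 84.5%
print(f"{share_without:.1f}%")         # 15.5%
```

Rounded to whole numbers, these values correspond to the "almost 13-fold", 85% and 15% figures given above.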

Similarly to transcriptomics, proteomic studies also aimed at the characterisation of the proteome dynamics during the pollen development and pollen tube growth. The list of species is shorter containing only Solanum lycopersicum (Chaturvedi et al. 2013), Nicotiana tabacum (Ischebeck et al. 2014) representing angiosperms and Picea wilsonii (Chen et al. 2012) as a representative of gymnosperms. Chen et al. (2012) investigated the influence of the limited nutrient supply on pollen tube growth, namely, the deficiency of sucrose, calcium and boron. In total, 166 proteins and 42 phosphoproteins (see also next section) were identified by LC-MS/MS as differentially regulated. Such number, low for gel-free approach, was mainly caused by the lack of conifer genome sequence data at that time. The identified proteins were involved in a variety of signalling pathways, providing new insights into the multifaceted mechanism of nutrient function including indicated nutrient-specific effects (Chen et al. 2012). Remaining two studies provided the most comprehensive male gametophytic proteomic datasets at the moment; they covered numerous stages of pollen development of related Solanaceae species providing evidence for developmentally controlled processes that might help to prepare the cells for specific developmental programmes and environmental stresses. In Solanum lycopersicum, five pollen developmental stages were compared—microsporocytes, tetrads, microspores, polarised microspores and mature pollen (Chaturvedi et al. 2013). In Nicotiana tabacum, the covered period was even broader—diploid microsporocytes, meiosis, tetrads, microspores, polarised microspores, bi-cellular pollen, mature pollen and pollen tubes, altogether eight stages (Ischebeck et al. 2014). In tomato, 1821 proteins were identified (Chaturvedi et al. 2013), whereas the tobacco analysis led to the identification of 3817 protein groups (Ischebeck et al. 2014). 
In both species, principal component analyses (Chaturvedi et al. 2013; Ischebeck et al. 2014) provided a picture similar to that obtained from transcriptomic studies (Bokvaj et al. 2015; Rutley and Twell 2015) and demonstrated that pollen development is a highly controlled sequential process also at the proteome level. According to the predicted functions, energy-related proteins are upregulated during the later stages of tomato pollen development, which indicates that pollen germination depends upon presynthesised proteins in mature pollen. In contrast, heat stress-related proteins are highly abundant in very early developmental stages, suggesting a dominant role in stress protection (Chaturvedi et al. 2013). Similar observations were made in tobacco, where the early developmental stages were also enriched with ribosomal and other translation-related proteins (Ischebeck et al. 2014).

In several cases, the individual studies targeted specific cell types, namely the generative or sperm cells in Lilium davidii (Zhao et al. 2013), and Oryza sativa (Abiko et al. 2013). In rice, sperm cell gel-free proteome was compared to the whole mature pollen grain. Of 2179 sperm cell-expressed proteins, 77 were preferentially present in the male gametes (Abiko et al. 2013). The comprehensive study by Zhao et al. (2013) employed 2-D DIGE followed by MALDI-TOF/TOF mass spectrometry to identify 101 proteins differentially expressed in lily generative and sperm cells. These proteins are involved in diverse cellular and metabolic processes, with preferential involvement in the metabolism, cell cycle, signalling, the ubiquitin/proteasome pathway, and chromatin remodelling, i.e. similar categories as revealed by transcriptomics. Impressively, almost all proteins in ubiquitin-mediated proteolysis and the cell cycle were upregulated in sperm cells, whereas those in chromatin remodelling and stress response were downregulated (Zhao et al. 2013).

Other studies were devoted to specialised cellular compartments, including membranes (Lilium longiflorum, Pertl et al. (2009); Lilium davidii, Han et al. (2010); and Solanum lycopersicum; Paul et al. (2016)), nuclei (Yang et al. 2016) and messenger ribonucleoprotein (mRNP) complexes (Honys et al. 2009). Other studies focused on specific protein groups (e.g. allergens) or proteins characterised by specific post-translational modifications, notably phosphorylation (Fíla et al. 2012, 2016; Mayank et al. 2012, see below). In all species analysed, membrane proteomes confirmed the presence of expected membrane-associated proteins on/in plasma membrane as well as endomembranes (Pertl et al. 2009; Han et al. 2010; Paul et al. 2016). In Lilium longiflorum, the differences in abundance of various protein types were observed in both membrane fractions in mature pollen and in several time points of pollen tube growth. For example, increase in the abundance of proteins involved in cytoskeleton, carbohydrate, energy metabolism, as well as ion transport was observed before pollen germination (10–30 min), whereas proteins involved in membrane/protein trafficking, signal transduction, stress response and protein biosynthesis decreased in abundance during this time (Pertl et al. 2009). Similar proteins were identified in membrane proteomes of two tomato cultivars, and the presence of proteins corresponding to energy-related pathways (glycolysis and Krebs cycle) enabled to present a hypothetical model of energy reservoir of the male gametophyte (Paul et al. 2016). Lilium davidii pollen and pollen tubes plasma membrane proteome fraction comprised also proteins of translational apparatus and DNA/RNA-binding proteins with preferential occurrence of ribosomal proteins. The identification of these proteins probably resulted from the presence of cytoskeleton-binding polysomes anchored to the plasma membrane via actin filaments or targeted to lipid rafts (Han et al. 2010). 
The association of translation apparatus and RNA-storage particles with the actin cytoskeleton was observed also in tobacco pollen and pollen tubes (Honys et al. 2009) where the protein composition of large ribonucleoprotein particles (EPPs) was studied. EPP complexes are formed in immature pollen where they contain translationally silent mRNAs. Although massively activated at the early progamic phase, they also serve as a long-term storage of mRNA transported along with the translational machinery to the tip region. Since EPPs contain ribosomal subunits, rRNAs and a set of mRNAs, they were hypothesised to represent well-organised machinery devoted to mRNA storage, transport and subsequent controlled activation resulting in protein synthesis, processing and localisation, extremely useful in fast tip-growing pollen tube. Expression of vast majority of the closest orthologues of EPP proteins also in Arabidopsis male gametophyte further extended this concept from tobacco to Arabidopsis, the model species with advanced tri-cellular pollen (Honys et al. 2009). The last cellular compartments analysed for the proteomic perspective were the vegetative, generative and sperm cell nuclei of Lilium davidii (Yang et al. 2016). The profiling of histone variants of all five histone families in all three cell types revealed 92 identities representing 32 histone variants. Generative and sperm cells had almost identical histone profiles and similar histone H3 modification patterns, significantly different from those of vegetative nuclei. These results suggested that differential histone programmes, important for the identity establishment and differentiation of the male germline, may be established following the asymmetric division (Yang et al. 2016).

To summarise, proteomics has covered a broader spectrum of species (including plants that are not classical models) than microarray transcriptomics, since proteomics does not depend on a sequenced genome of the particular species; EST sequences or protein sequences from related species can be used instead. Proteomics also highlighted post-transcriptional levels of gene expression that could not be addressed by transcriptomics.

4 Phosphoproteomics

Pollen rehydration and activation is accompanied by two main regulatory mechanisms of gene expression. The first one is represented by translation regulation. A notable part of stored, translationally regulated transcripts are localised in EDTA/puromycin-resistant particles (EPPs) in tobacco (Nicotiana tabacum, Honys et al. 2000, 2009). The transcripts inside these complexes are stored in a translationally silent form in mature pollen, whereas upon pollen rehydration, the mRNAs are being de-repressed and translated. Since most growing processes inside the pollen tube are localised to the tip, EPP complexes are transported towards the pollen tube tip (Honys et al. 2009). The second mechanism of gene expression regulation is phosphorylation, which is one of the most dynamic post-translational modifications of proteins. Protein phosphorylation in reaction to rehydration was revealed not only in Nicotiana tabacum male gametophyte (Fíla et al. 2016) but also in rehydrated plants of the xerophyte Craterostigma plantagineum (Röhrig et al. 2008) and in the cells in Zea mays leaf growing zone (Bonhomme et al. 2012). Large-scale studies of protein phosphorylation usually employ various enrichment protocols to enable the identification of phosphorylated peptides in the total complex protein crude extract. The enrichment can be either carried out at the level of intact proteins or alternatively from the peptide mixture acquired after cleavage of the complex total protein crude extract by a specific protease (Fíla and Honys 2012). Both these phosphoproteomic approaches showed their advantages as well as limitations.

The first phosphoproteomic study performed on male gametophyte was that of Arabidopsis thaliana mature pollen (Mayank et al. 2012, Table 10.3). This study applied a combination of three phosphopeptide-enriching protocols: immobilised metal affinity chromatography (IMAC), metal oxide/hydroxide affinity chromatography (MOAC) and sequential elution from IMAC (SIMAC). The study presented collectively 962 phosphopeptides carrying 609 phosphorylation sites that belonged to 598 phosphoproteins. From the functional point of view, most identified phosphoproteins took part in the regulation of protein metabolism and function, metabolism, protein fate, protein binding, signal transduction and cellular transport. Several kinases were also among the identified phosphoproteins, particularly AGC protein kinases, Ca2+-dependent protein kinases and sucrose non-fermenting protein kinases 1.

Table 10.3 Comparison of published angiosperm male gametophyte phosphoproteome studies in particular from Arabidopsis thaliana, tobacco and maize

The next male gametophyte of angiosperms, which was subjected to phosphoproteomic techniques, was tobacco (Nicotiana tabacum; Fíla et al. 2012, 2016, Table 10.3). The former study identified 139 phosphoprotein candidates from mature pollen and pollen grains activated in vitro for 30 min. In order to improve the number of unambiguously positioned phosphorylation sites, titanium dioxide phosphopeptide enrichment was performed on trypsin-digested mature pollen crude extract, which led to the identification of 51 more phosphorylation sites localised in the phosphoproteins already identified in mature pollen giving a total of 52 unambiguous phosphorylation sites. In order to understand the processes during pollen grain activation and the start of pollen tube growth, pollen grains activated in vitro for 5 min were also taken into consideration in the subsequent study (Fíla et al. 2016). To increase the probability of phosphoprotein identification, phosphopeptide-enriching MOAC with titanium dioxide matrix was applied exclusively. In the mentioned three stages of tobacco male gametophyte, 471 phosphopeptides were identified, which carried 432 unambiguously determined phosphorylation sites. These phosphorylated peptides were assigned to 301 phosphoproteins. The phosphopeptide enrichment of the three stages increased notably the number of identified phosphorylation sites. The dominant functions were transcription, protein synthesis, protein destination and storage and signal transduction. It is also worth mentioning that almost one fifth of identified phosphopeptides was put into categories with unknown function or unclear classification. These results are in agreement with tobacco male gametophyte proteome, where approx. fifteen percent of male gametophyte-specific proteins were of unknown classification. 
The unknown phosphoproteins are likely to represent male gametophyte-specific or male gametophyte-enriched proteins, the function of which will probably be important for the regulation of pollen activation and pollen tube growth. A notable part of the identified phosphopeptides showed a significant regulatory trend in the progamic phase of the male gametophyte. Most of the regulated peptides were shown to be exclusive to mature pollen grains. The only other study that considered stages other than mature pollen was performed with pollen tubes from Picea wilsonii (Chen et al. 2012). However, it differed from the above studies in two ways: (1) a gymnosperm was studied instead of an angiosperm, and (2) the proteome and phosphoproteome of Picea wilsonii pollen tubes were studied not from the developmental point of view but as a reaction to growth media lacking sucrose or Ca2+ ions (which serve as important nutrients for pollen tube growth). The Picea wilsonii study thus revealed 166 proteins and 42 phosphoproteins that play a role in signalling the nutrient deficiency of the media.

Most recently, Zea mays became the first monocot with a published mature pollen phosphoproteome (Chao et al. 2016), but again, no other gametophyte stages were studied (Table 10.3). Despite this, the maize pollen phosphoproteomic dataset became the largest one, since it presented 4638 phosphopeptides, which belonged to 2257 phosphoproteins. These phosphorylated peptides led to the identification of 5291 phosphorylation sites, with many phosphopeptides carrying more than one phosphorylation site. The dominant molecular functions were ion binding, kinase activity, transmembrane transporter activity, oxidoreductase activity, and DNA binding, whilst the enriched biological processes were represented by protein posttranslational modification, cell organisation, signalling G-proteins, calcium signalling, abiotic stress, protein targeting, and RNA–RNA binding.

The functional categories of the identified phosphopeptides were quite similar in male gametophytes of all studied species. Since pollen tube tip growth requires several cellular mechanisms, such as small GTPase signalling, ion gradient formation, cytoskeleton organisation and transport of secretory vesicles (Palanivelu and Preuss 2000; Šamaj et al. 2006), dominant phosphoproteins common to all datasets belonged to at least some of these categories. Moreover, in tobacco, the protein synthesis category included proteins likely responsible for translation regulation, including EPP particles (Honys et al. 2009). The highest number of proteins responsible for the actual regulatory processes was identified in the maize phosphoproteome, including male sterility-associated proteins together with proteins influencing maize productivity (Chao et al. 2016). The number of proteins identified in tobacco could be influenced by the fact that the identifications of the second tobacco male gametophyte phosphoproteome relied on expressed sequence tags (EST sequences) that were acquired mainly from sporophyte tissues and thus could lack gametophyte-specific proteins.

The phosphoproteomic datasets are usually analysed whether they contain any over-represented sequence context surrounding the phosphorylation site. However, such a motif over-representation compared to the background dataset remains speculative, and the rare sequence motifs could remain undetectable. Moreover, the link between a particular kinase and target protein still remains to be experimentally proven. In Arabidopsis thaliana pollen phosphoproteome (Mayank et al. 2012), only serine-phosphorylated peptides were subjected to motif search, and only two motifs were identified: xxxxxxS*Pxxxxx and xxxRxxS*xxxxxx (phosphorylated amino acid is indicated by an asterisk behind the one-letter code). The prolyl-directed phosphorylation is usually mediated by mitogen-activated protein kinases and/or cyclin-dependent protein kinases, whereas the latter basic motif is recognised by Ca2+-dependent protein kinases (Lee et al. 2011). Later, the motif search in the second tobacco male gametophyte phosphoproteome (Fíla et al. 2016) enabled the identification of five motifs with central phosphoserine (particularly xxxxxxS*Pxxxxx, xxxRxxS*xxxxxx, xxxKxxS*xxxxxx, xxxxxxS*DxExxx, and xxxxxxS*xDDxxx) but also one with a phosphothreonine in the middle (xxxxxxT*Pxxxxx). The prolyl-directed phosphorylation (regardless whether a serine or a threonine occupied the middle position of the motif) was mediated by mitogen-activated protein kinases and/or cyclin-dependent protein kinases (Lee et al. 2011). Then, there were two alkaline and two acidic motifs. The former motifs were represented by an arginine or a lysine on the third position before the actual phosphorylated serine, which were recognised by Ca2+-dependent protein kinases and Ca2+-dependent protein kinases–sucrose-non-fermenting protein kinases (CDPK–SnRK) (Lee et al. 2011). 
The latter, acidic motifs can in principle be merged into one phosphorylation motif, xxxxxxS*(D/E)(D/E)(D/E)xxx, which was reported to be targeted by casein kinase 2 (CK2) (Lee et al. 2011). The broadest spectrum of kinase motifs was identified in the maize mature pollen phosphoproteome (Chao et al. 2016), but several motifs were in principle shared with tobacco and Arabidopsis. However, these shared motifs were specified more narrowly in maize by other amino acid positions around the phosphorylation site, so that one motif was actually split into several similar motifs differing in the specified position(s). Thus, although more phosphorylation motifs were identified in the most recent study, it still had phosphorylation motifs in common with the tobacco and Arabidopsis pollen phosphoproteomes. It is likely that pollen activation bears similarities across various angiosperm species. Chao and colleagues identified 23 phosphoserine motifs and 4 phosphothreonine motifs, which were further sorted into 4 groups: 8 phosphorylation motifs were considered as prolyl-directed phosphorylation, 5 motifs were alkaline and 4 were acidic. The remaining ten phosphorylation motifs were collected in the group ‘others’. Several phosphorylation motifs (mainly from the group ‘others’) were considered novel, and these motifs may represent male gametophyte-specific regulatory pathways. Finally, it should be noted that all the phosphorylation motifs mentioned were acquired by in silico data search, and additional experiments will be required in order to link a particular protein kinase with its target protein(s).
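
To illustrate how motifs written in this notation can be applied to sequence data, the following minimal sketch matches 13-residue windows centred on the phosphorylated residue against the six motifs reported for the second tobacco phosphoproteome. It is not the motif search procedure used in any of the cited studies, which tested for over-representation against a background dataset; it only shows simple pattern matching. The kinase class labels follow the assignments given in the text (after Lee et al. 2011), and the example windows are hypothetical sequences invented for illustration.

```python
import re

# Motifs as written in the text: 'x' = any residue, '*' marks the phosphosite.
# Labels follow the kinase classes named in the text (Lee et al. 2011).
MOTIFS = {
    "xxxxxxS*Pxxxxx": "prolyl-directed (MAPK/CDK)",
    "xxxxxxT*Pxxxxx": "prolyl-directed (MAPK/CDK)",
    "xxxRxxS*xxxxxx": "basic (CDPK, CDPK-SnRK)",
    "xxxKxxS*xxxxxx": "basic (CDPK, CDPK-SnRK)",
    "xxxxxxS*DxExxx": "acidic (CK2)",
    "xxxxxxS*xDDxxx": "acidic (CK2)",
}


def motif_to_regex(motif):
    """Turn the text notation into an anchored regular expression."""
    pattern = motif.replace("*", "").replace("x", ".")
    return re.compile("^" + pattern + "$")


COMPILED = [(motif, motif_to_regex(motif), label)
            for motif, label in MOTIFS.items()]


def classify(window):
    """Return all (motif, class) pairs matching a 13-residue window.

    The phosphorylated residue is expected at the central position (index 6).
    """
    if len(window) != 13:
        raise ValueError("window must contain exactly 13 residues")
    return [(motif, label) for motif, rx, label in COMPILED if rx.match(window)]


# Hypothetical windows, invented for illustration only
examples = ["AGLVKESPEKAAG", "GAARLNSVEGAAG", "AAGLQESDDEAAG"]
for window in examples:
    print(window, classify(window))
```

A window may match several motifs or none; in a real analysis, unmatched sites would simply remain unclassified, and any match would still need experimental confirmation of the kinase involved.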

5 Specialised Pollen -Omics

To make the list of pollen -omics complete, we cannot leave out more specialised studies covering generally only limited number of model species. Of them, translatomics, methylomics and miRNAomics are based on transcriptomic approaches since they characterised specialised RNA populations functionally related to mRNA fate in the cytoplasm and translation. Likewise, pollen allergome and secretome were identified by proteomic techniques, whereas metabolomics employs a different set of techniques which will be discussed elsewhere in this book (see Chap. 12).

The study of Lin et al. (2014) represents the first and so far the only attempt to identify and characterise the subset of actively translated transcripts in in vivo-growing pollen tubes of Arabidopsis thaliana. The authors adopted an elegant solution for the isolation of polysome-RNA complexes from pollen tubes growing through tiny Arabidopsis pistils; the complexes were affinity purified via the HIS6-FLAG dual-epitope tagged ribosomal protein RPL18 expressed under the pollen vegetative cell-specific promoter LAT52. The comparison with in vitro-cultivated pollen tubes revealed over 500 transcripts specifically enriched in in vivo-elongating pollen tubes, including transcripts encoding proteins involved in micropylar guidance, pollen tube burst and repulsion of multiple pollen tubes in the embryo sac (Lin et al. 2014). Despite the similar functional categorisation of genes upregulated in the in vivo translatome (Lin et al. 2014), the semi-in vivo transcriptome (Qin et al. 2009) and the pollen–pistil interaction-induced transcriptome (Boavida et al. 2011), there was only very little overlap at the level of individual genes. However, it is difficult to conclude whether such differences reflect a different nature of de novo transcription during in vivo and semi-in vivo pollen tube growth or whether only a subset of induced transcripts is actively translated.

MicroRNAs (miRNAs) represent only a small portion of transcriptome, but they play an important role in post-transcriptional regulation of gene expression, mRNA cleavage, mRNA destabilisation through poly(A) tail shortening and translation inhibition (Brodersen et al. 2008; Carthew and Sontheimer 2009). Therefore, the identification of pollen miRNAs and their targets and especially the dynamics of miRNAome is of key importance for understanding the fine modulation of gene expression in the male gametophyte and in the process of the male germline differentiation. Of the seven studies published so far, one was devoted to gymnosperm Pinus taeda (Quinn et al. 2014) and the remaining six to three angiosperm model plants Arabidopsis thaliana (Chambers and Shuai 2009; Grant-Downton et al. 2009b; Borges et al. 2011), Oryza sativa (Peng et al. 2012; Wei et al. 2011), and Zea mays (Li et al. 2013). In these studies, 24-nt miRNAs represented the most abundant class. Alongside the identification of known and novel miRNA families, these studies also identified their putative mRNA targets, and in few cases, they even demonstrated the regulatory function of the respective miRNAs.

Chambers and Shuai (2009) profiled the expression of 70 known miRNAs in Arabidopsis mature pollen using miRCURY microarray and the comparison of their expression with transcriptomic profiles of their putative targets indicated the activity of several candidate miRNAs in pollen. The first large-scale study employing de novo sequencing was, as usual, conducted on Arabidopsis mature pollen and pollen miRNAome was compared to that of leaves (Grant-Downton et al. 2009b). Out of 33 miRNA families identified in pollen, expression of 17 was validated by RT-PCR, and most of them were found to be enriched in the male gametophyte with three (miR157, ath-MIR2939, and miR845) being putatively pollen specific. Moreover, the study reported, for the first time, the presence of trans-acting siRNAs in pollen (Grant-Downton et al. 2009b). Borges et al. (2011) analysed miRNA populations sequenced by Slotkin et al. (2009) with a special interest in miRNAs active in the male germline. They confirmed most of the previously identified pollen-expressed miRNAs (with the exception of miR776, Grant-Downton et al. 2009b) and found even higher representation of miRNA families both in mature pollen (75 families) and in sperm cells (83) including 25 potentially novel miRNAs processed in sperm cells and pollen. Of them, miR159 was particularly interesting, since it was highly enriched in sperm cells and was predicted to be involved in the regulation of DUO1 (Palatnik et al. 2007, Grant-Downton et al. 2009a). miR156 and miR158 represented other candidates for the role in the male germline as they were enriched in sperm cells and likely to associate with sperm cell-enriched ARGONAUTE 5 (see Borges et al. 2011). Rice studies (Peng et al. 2012; Wei et al. 2011) described the miRNAome dynamics during male gametophyte development in stages previously used for the transcriptome analysis (Wei et al. 2010). 
The authors identified numerous known and novel miRNAs, often pollen-enriched, and showed the correlation of their expression profiles with those of their potential targets (Wei et al. 2011). In maize, the comparison of miRNA populations isolated from mature pollen, in vitro-cultivated pollen tubes and non-pollinated as well as pollinated silks identified 56 miRNAs (40 conserved and 16 novel) differentially expressed between pollen and pollen tubes and 38 miRNAs (30 conserved and 8 novel) showing a differential expression pattern between mature non-pollinated and pollinated silks. The analyses of these miRNAs and their potential targets (predominantly auxin signal transduction and transcription regulation) also showed the participation of the miRNA pathway in the regulation of pollen–pistil interactions (Li et al. 2013). In loblolly pine, miRNA populations were compared between mature and germinating pollen, and 47 miRNAs (23% of the 208 identified in total) representing 22 families were upregulated or downregulated (14 and 8 families, respectively) in germinated pollen. Together with other similarities to and differences from Arabidopsis and rice pollen miRNA populations, this highlighted that the microRNA pathway is also active during pollen germination in gymnosperms (Quinn et al. 2014).

The methylome sequencing of haploid cell types during male gametogenesis highlighted the differential methylation patterns in vegetative and sperm cells (Calarco et al. 2012; Ibarra et al. 2012). The plant male germline retains symmetric DNA methylation, whereas asymmetric methylation is lost there. On the contrary, asymmetric DNA methylation is restored in vegetative cells and during post-fertilisation embryo development. This de novo CHH methylation results from the activity of DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) and employs 24-nt siRNAs. Differential genome reprogramming in pollen contributes to epigenetic inheritance, imprinting and transposon silencing (Calarco et al. 2012).

Of the two specialised proteomics-based techniques, allergomics was applied to a wider selection of plant species covering five orders of monocots and dicots (Table 10.2). Because of the obvious relation to pollen allergenicity, the selection lacks the usual models and concentrates mainly on wind-pollinated plants producing large amounts of pollen, predominantly grasses (Abou Chakra et al. 2012; Campbell et al. 2015; Kao et al. 2005; Schmidt et al. 2010; Schulten et al. 2013) and ragweed (Ambrosia spp.; Bordas-Le Floch et al. 2015; Zhao et al. 2016). Accordingly, all studies were performed on mature pollen and were gel based. The purified proteins were separated by 2-DE, and human IgE-binding proteins were identified, excised and analysed by mass spectrometry. Most of the major pollen allergen families were found in both dicot and monocot pollen: the profilin, expansin, berberine bridge enzyme, pectate lyase, Ole e 1, cytochrome c and group 5/6 grass allergen (ribonuclease) families, followed by the enolase, EF hand, polygalacturonase, pathogenesis-related and prolamin families (Abou Chakra et al. 2012; Bordas-Le Floch et al. 2015; Campbell et al. 2015; Kao et al. 2005; Schmidt et al. 2010; Schulten et al. 2013; Zhao et al. 2016). In Phleum pratense, the major grass allergens were also confirmed in pollen cytoplasmic granules released from pollen grains, which may represent respirable vectors of allergens (Abou Chakra et al. 2012).

Sexual reproduction in plants requires extensive cell–cell communication at many stages and at many levels, including the pollen–pistil interaction, which involves female sporophytic and gametophytic tissues and ends with the direct communication between male and female gametes preceding fertilisation. The extreme compactness of flower tissues as well as the need to separate individual cells have made research in this area immensely difficult, and so far, most of the information has come from the female side (reviewed by Kessler and Grossniklaus 2011; see also Chap. 8). The first attempts to characterise the male–female crosstalk at the global scale were transcriptomic and proteomic studies of complex female tissues, typically before and after pollination. Most of these studies also addressed the phenomenon of pollen (in)compatibility. However, before that, it was of interest to analyse the pollination interface. Sang et al. (2012) analysed the differences in the proteomes of wet and dry stigmas in Nicotiana tabacum (wet) and Zea mays (dry) and compared them with the exudates from wet tobacco stigmas. With 177 identified proteins, tobacco stigmatic exudates were richer in proteins than the stigmatic exudates of Lilium longiflorum and Olea europaea, which comprised 51 and 57 proteins, respectively (Rejon et al. 2013). Similarly, Nazemof et al. (2014) identified proteins involved in Triticale stigma development. However, these studies had only limited coverage because of the use of gel-based proteomics. Most recently, the protein composition of ovular secretions (pollination drops) on female cones of two closely related gymnosperm species, Cephalotaxus koreana and C. sinensis, was found to be similar to that of other gymnosperms including gnetophytes and to contain mainly defence-related proteins and carbohydrate-modifying enzymes (Pirone-Davies et al. 2016).
Therefore, a deeper insight was achieved through transcriptomics, when the transcriptomes of stigmatic papillary cells in three Brassicaceae species, Arabidopsis thaliana, A. halleri and Brassica rapa, showed a great degree of similarity. Fifty-eight percent of papilla-expressed genes were shared by all three species (Osaka et al. 2013), and only a minor fraction of expressed genes was species specific. Interestingly, gene expression in Arabidopsis papillary cells does not seem to be much influenced by pollination, since 77% of genes were active in non-pollinated papillary cells as well as in these cells after pollination with compatible and incompatible pollen (Matsuda et al. 2014).

Another step forward was the comparison of non-pollinated and pollinated pistils at both the transcriptomic and proteomic levels. Proteome differences caused by pistil pollination were studied in Glycine max (Li et al. 2012), Oryza sativa (Li et al. 2016), Prunus armeniaca (Feng et al. 2006), Solanum pennellii (Chalivendra et al. 2013) and, interestingly, also in the basal angiosperm Liriodendron chinense (Li et al. 2014). Transcriptomic studies comprised Arabidopsis thaliana (Boavida et al. 2011), Olea europaea (Carmona et al. 2015; Iaria et al. 2016), Oryza sativa (Li et al. 2016), Citrus clementina (Caruso et al. 2012) and the direct comparison of self-compatible Solanum pimpinellifolium and self-incompatible Solanum chilense (Zhao et al. 2015). These studies confirmed that not only pollination but also cross- versus self-pollination induced novel gene expression and protein synthesis.

For the characterisation of proteins directly involved in male–female interactions, it became necessary to analyse proteins secreted by both main players (see also Chap. 8). The comparison of apoplastic proteins isolated from Arabidopsis thaliana mature pollen and pollen tubes cultivated in vitro for 6 h enabled the identification of 71 novel proteins expressed after pollen germination. Of them, 50 proteins were secreted; they were involved in cell wall modification and remodelling, protein metabolism and signal transduction (Ge et al. 2011). This study provided the first insight into pollen-secreted proteins functioning in pollen germination and pollen tube growth. However, the use of DIGE limited the number of identified proteins. More importantly, it had already been shown that contact with female tissues significantly changed and enriched pollen tube gene expression (Qin et al. 2009; Lin et al. 2014). Therefore, it became necessary to evaluate the situation in vivo or semi-in vivo. Recently, such secretomes were published for two related species. Proteomic analysis of Solanum chacoense ovule exudates isolated by a tissue-free gravity-extraction method enabled the identification of 305 ovule-secreted proteins, 58% of which appeared to be ovule specific (Liu et al. 2015). Similarly, gel-free proteomics was used to characterise the secretome of Nicotiana tabacum semi-in vivo cultivated pistil-activated pollen tubes (Hafidh et al. 2016b). Here, 801 proteins were identified, with a high frequency of small proteins <20 kDa. Interestingly, the majority (57%) of pollen tube-secreted proteins lacked a signal peptide and were shown to be secreted unconventionally. This study not only highlighted a potential mechanism for unconventional secretion of pollen tube proteins but also indicated their potential functions in pollen tube guidance towards ovules for sexual reproduction. Hafidh et al.
(2016b) demonstrated that the knockdown of the unconventionally secreted translationally controlled tumour protein (TCTP) in Arabidopsis thaliana pollen tubes caused their poor navigation to the target ovule and low transmission of the mutant allele through the male. Unconventional protein secretion was also described in the ovules (Liu et al. 2015) but to a much smaller extent. The combination of both datasets, although obtained from different species, represents a significant contribution to current efforts to dissect possible mechanisms of cell–cell communication between the pollen tube and female reproductive cells.

6 Conclusion and Perspective

The introduction of -omics techniques brought a very notable insight into the research of male gametophyte development, its dynamics and regulation. The past two decades, and the last few years in particular, witnessed an outburst of novel -omics techniques. However, transcriptomics still represents the main source of information and is usually the first choice, mainly due to its robustness, established infrastructure and data processing pipelines. Both microarrays and RNAseq offer broad gene coverage, but microarrays depend on a priori knowledge of sequence information and cannot display, for instance, splicing variants and/or shortened or altered transcripts. Although these issues are circumvented by deep sequencing technologies, the majority of data was obtained with microarrays. That is why mostly male gametophytes of model plants were studied and only a limited number of alternative species have a known pollen transcriptome. Nevertheless, no transcriptomic technique has solved the issue of how transcript abundance correlates with the abundance of the corresponding proteins. Therefore, proteomics is considered to reflect gene regulation more faithfully, since it proves the presence of a protein. The original in-gel techniques showed limited gene coverage, since they usually identified no more than 1000 proteins with little overlap between independent experiments. On the other hand, because proteomics does not rely on the knowledge of genomic sequences, male gametophyte proteomes of non-model plants were studied. The versatility of proteomic approaches also enabled specialised studies of specific sets of proteins (membrane proteome, secretome, allergome) as well as of functional and regulatory post-translational modifications, usually on limited samples only. Of them, the phosphoproteome was studied in the male gametophyte of three angiosperms: Arabidopsis thaliana, tobacco and maize. In tobacco, phosphoproteome dynamics during pollen activation was studied.
To compare the regulation of the early phases of pollen activation in bi- and tri-cellular species, it would be interesting to perform similar experiments on tri-cellular species. Acquiring properly activated pollen from them is challenging, but the novel information brought by these approaches is definitely worth the effort.

In spite of the growing list of proteins with known function, pollen -omics revealed many proteins with unknown function or unclear classification that deserve further investigation in subsequent functional studies. These unknowns likely represent pollen-specific proteins or proteins with a notable function related to the male gametophyte. Integrating phylogenetic comparative methods with the analysis of genomic data makes it possible to design experiments and generate new insights concerning the origin and structure of genomes. The phylogenomic approach based on sequence similarities can identify gene duplications (orthologs vs. paralogs), infer evolutionary rate variation among taxa and separate sequence convergences from shared origins. The prediction of gene function can be improved by incorporating the evolutionary history of the genes themselves and reconstructing their historical sequence and function within a phylogenetic framework. However, the majority of available plant genome sequences originate from crop plants, which makes deeper phylogenomic analyses still unfeasible. In the future, with more data available, the comparison of gene families among species from phylogenetically different groups of plants will allow a comprehensive study of pollen-specific genes and gene family evolution in plants.

Perhaps the most interesting trend in modern work has been a move towards synthesis. During the past two decades, high-throughput plant -omic studies have boomed owing to technological advances. Numerous articles focused not only on model plants but also on other organisms. However, some of these analyses were carried out with expertise in bioinformatics but with minimal biological relevance. The most interesting part of the research started with the integration of data from various resources as well as with the combination of these two fields.