Introduction

Pollen production is a complex process that requires the coordinated participation of various cell and tissue types. Pollen mother cells produce tetrads of haploid microspores through meiosis, and the microspores (male gametophytes) released from these tetrads become mature pollen via pollen mitosis I (PMI) and pollen mitosis II (PMII) and can then be used for fertilization (Bedinger 1992). The microspore corresponds to a precise stage in the development of the male gamete in higher plants. Microspore development is closely linked with the sporophytic tissue of the male organ known as the tapetum, which contributes to the development of the external wall of microspores, a process that begins after the microspores are released from tetrads. The contribution of the external part has been well characterized through genetic studies of many male sterility genes. However, the internal contribution to microspore development is poorly understood because separating the pollen during this stage is very difficult. High-throughput transcriptome data using microarray technology have provided a global understanding of pollen development in rice and Arabidopsis (Becker et al. 2003; Honys and Twell 2003, 2004; Suwabe et al. 2008; Fujita et al. 2010; Wei et al. 2010; Aya et al. 2011). In previous studies, Honys and Twell (2004) and Wei et al. (2010) demonstrated dynamic changes in the pollen transcriptomes of Arabidopsis and rice as development progresses. Interestingly, the profile of uni-nucleated microspores was more similar to that of bi-nucleated pollen than to those of the other stages, while the profiles of mature pollen grains (MPG) and germinating pollen grains (GPG) were more conserved with each other than with those of the other stages (Wei et al. 2010). Furthermore, a microarray technique combined with laser microdissection (44 K LM-microarray) was used to independently identify the transcriptomes of the microspore/pollen and tapetum in rice (Suwabe et al. 2008). Subsequently, 140 male gamete- and tapetum-specific transcripts were identified. Although the identified transcripts were highly specific, their number was quite small because LM microarray technology uses a tiny amount of mRNA as a template and multiple rounds of amplification, which can restrict the recovery of endogenous mRNA profiles, as shown in root LM microarray experiments (Suwabe et al. 2008; Takehisa et al. 2012).

A public microarray database, the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/gds), has been collecting microarray data from samples of developing anthers and pollen in rice (Fujita et al. 2010; Deveshwar et al. 2011; Wei et al. 2010). However, it is not yet known how well the transcriptome data for similar anatomical samples from different sources are conserved or diversified. Systematic analysis of similar anatomical transcriptome data available in public databases might provide new insights that individual reports have not. In particular, transcriptome data for male gamete development that are conserved between japonica and indica cultivars are very important for expanding the general understanding of the developmental process.

Combining meta-expression data with the GUS reporter system is a very effective way to identify novel promoters for manipulating tissue-specific or stress-related traits (Thilmony et al. 2009). Recently, we reported the activity of the promoters of 11 mature pollen-specific genes, OsLPS1 to OsLPS11, in rice. Seven of these promoters were also active in a heterologous system using Arabidopsis (Nguyen et al. 2015; Oo et al. 2014). One promoter, OsLPS10, which showed expression at the bi-nucleated microspore stage, was successfully used to complement an Arabidopsis gametophytic mutant that was defective in microspore development prior to PMI (Nguyen et al. 2015).

In addition, a number of rice promoters confirmed by the GUS reporting system were evaluated using meta-expression data, indicating that meta-expression profiles based on a large collection of data can be useful sources of novel promoters (Jeong and Jung 2015). Microspore-specific promoters can be an ideal tool in plant breeding, especially in the field of rice hybrid breeding. Male sterile lines can be generated by producing cytotoxin or altering hormone levels using microspore-specific promoters (Oldenhof et al. 1996; Bae et al. 2010). Thus, the discovery of new early microspore-specific promoters that have been well characterized will provide us more options for manipulation of target traits in plants.

Here, we report the global identification of rice microspore preferential (RMP) genes, which are distinguished from genes expressed in sporophytic anther tissue, using meta-expression analysis of anther and pollen samples collected in the public database. In total, 410 RMP genes were identified, but their functional characterization in relation to developmental processes was quite limited compared to that of the rice pollen mother cell or tapetum-preferred (RPMCT) genes (263 genes). Among RMP genes, only two have been functionally studied through loss-of-function approaches, while 28 RPMCT genes have been functionally characterized, nine of which are related to pollen development or sterility. GO analysis revealed that the pyrimidine nucleotide biosynthetic process is very important for microspore development in rice. GUS activity under the control of RMP gene promoters was used to confirm tissue specificity in rice microspores/pollen, suggesting that multiple promoters can be used to manipulate the agronomic traits associated with microspore development.

Results

Meta-expression analysis for genome-wide identification of RMP genes in rice

To identify the microspore (early pollen)-preferred genes that are conserved among japonica and indica rice cultivars, we downloaded and analyzed six series of Affymetrix rice microarray data prepared from developing anthers and pollen grains from NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) (Table S1) (Barrett et al. 2009; Chandran and Jung 2014). The samples were arranged in order according to the process of anther and pollen development. As a control, all of the Affymetrix anatomical meta-expression profiles in the Rice Oligonucleotide Array Database (ROAD; http://www.ricearray.org/expression/meta_analysis.shtml) except for two anther samples were used to analyze the expression patterns in other tissues/organs (Cao et al. 2012). Afterward, we performed K-means clustering (KMC) analysis using the Euclidean distance matrix (EDM) and grouped 57,382 probes into 49 clusters based on expression pattern. As a result, we identified 789 probes from two clusters (42 and 45) (Fig. S1). Further hierarchical clustering analysis using the EDM refined the gene list to 410 genes that showed uni-nucleated (microspore)-preferred expression patterns in both cultivars (Fig. 1; Table S2). In addition, we identified 263 RPMCT genes, which include well-known tapetum-preferred genes such as tapetum degeneration retardation 1 (TDR1), eternal tapetum 1 (EAT1)/DTD, undeveloped tapetum 1 (UDT1), defective tapetum cell death 1 (DTC1), and persistent tapetal cell 1 (PTC1), and well-known pollen mother cell-preferred genes such as microspore and tapetum regulator 1 (MTR1), meiosis arrested at leptotene1 (MEL1), and ZIP4 (Table S3) (Jung et al. 2005; Komiya et al. 2014; Niu et al. 2013; Shen et al. 2012; Tan et al. 2012; Yi et al. 2016). Unlike RMP genes, RPMCT genes did not show significant expression in developing pollen samples, indicating that these genes have major roles in male sporophytic tissues (Fig. 1). Candidate genes are summarized in Tables S2 and S3.
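The clustering step described above can be illustrated with a minimal sketch. The source does not state which software performed the K-means clustering, so the pure-Python Lloyd's algorithm below is an assumption, as are the probe names, the expression values, the five-column layout, and the deterministic choice of starting centroids. It only shows how Euclidean-distance K-means groups probes by expression pattern and how a cluster peaking at the uni-nucleated stage could then be picked out.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, k, n_iter=100):
    """Lloyd's k-means on a dict {probe_id: [expression values]}."""
    ids = sorted(profiles)
    # Simplification: evenly spaced probes are used as starting centroids
    # so that the toy example is deterministic.
    centroids = [list(profiles[ids[i * len(ids) // k]]) for i in range(k)]
    assignment = {}
    for _ in range(n_iter):
        new_assignment = {
            p: min(range(k), key=lambda c: euclidean(profiles[p], centroids[c]))
            for p in ids
        }
        if new_assignment == assignment:
            break  # converged: no probe changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [profiles[p] for p in ids if assignment[p] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical log-scale intensities; columns follow STAGES.
STAGES = ["Me", "UG", "BG", "MP", "leaf"]
profiles = {
    "probe_A": [1.0, 9.0, 8.5, 2.0, 1.0],  # microspore-like
    "probe_B": [1.2, 8.8, 8.0, 2.5, 0.8],
    "probe_C": [9.0, 1.0, 1.2, 0.9, 1.1],  # meiotic anther-like
    "probe_D": [8.5, 1.5, 1.0, 1.0, 0.9],
    "probe_E": [5.0, 5.1, 4.9, 5.0, 5.2],  # broadly expressed
    "probe_F": [5.2, 4.8, 5.0, 5.1, 4.9],
}

assignment, centroids = kmeans(profiles, k=3)

# Keep clusters whose centroid peaks at the uni-nucleated (UG) column.
ug = STAGES.index("UG")
microspore_clusters = [
    c for c in range(len(centroids))
    if centroids[c].index(max(centroids[c])) == ug
]
candidates = sorted(p for p in profiles if assignment[p] in microspore_clusters)
print(candidates)
```

In a real analysis the number of clusters (49 in the text) and the selection of clusters 42 and 45 would be chosen by inspecting the clustered heatmap rather than by the simple peak rule used here.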

Fig. 1

Heatmap expression analysis of RMP and RPMCT genes. Expression patterns conserved among indica (brown box) and japonica (green box) types over several stages of development. The number of microarray data is 22 for indica rice (five stages for anthers and three for pollen) and 42 for japonica rice (eight stages for anthers and five for pollen). Anatomical meta-expression data from ROAD were used to analyze the expression patterns in other tissues. ACF archesporial cell-forming stage, BG bi-nucleated gametophyte stage, Fl flowering stage, GP germinating pollen, Me meiotic stage, Me1 meiotic leptotene stage, Me2 meiotic zygotene–pachytene stage, Me3 meiotic diplotene-tetrad stage, MP mature pollen stage, PMe pre-meiosis, TG tri-nucleated pollen stage, UG uni-nucleated gametophyte stage. The yellow color in the heatmap indicates high level of expression; dark-blue low expression

Evaluation of functions of RMP genes in the literature

The Overview of Functionally Characterized Genes in Rice Online database (OGRO, http://qtaro.abr.affrc.go.jp/ogro) summarizes the rice genes that have been genetically characterized (Yamamoto et al. 2012). Among our candidate genes preferentially expressed in early pollen, two have been characterized through loss-of-function studies. Of these, cysteine protease 1 (OsCP1) is involved in pollen development (Lee et al. 2004), whereas OsPHO1;2 regulates the translocation of phosphate from the root to the shoot (Secco et al. 2010). The function of the former is associated with its anatomical expression pattern, whereas the latter, which shows differential expression under phosphate starvation, has a condition-dependent function. For the latter, pollen development might be affected under phosphate starvation, a possibility that should be addressed in further studies. This analysis of known genes suggests that our candidate genes might have morphological functions or stress-response functions associated with microspore development.

In contrast to RMP genes, the functions of 28 RPMCT genes have been elucidated, indicating that pollen mother cell and tapetum development has been actively studied in rice because of the agronomic significance of male sterility (Table 1). Of them, 21 genes showed defects in the identity/formation of anthers and floral organs, confirming that tissue-preferred expression patterns are a useful guide for molecular and genetic characterization of the related gene function.

Table 1 Summary of known genes out of RPMCT and RMP genes

Biological processes specific to RMP genes using GO enrichment analysis and comparison with those of RPMCT genes

We conducted a Gene Ontology (GO) enrichment analysis in ROAD (Cao et al. 2012) in order to query 410 RMP genes. Our objective was to examine GO enrichment within the category of ‘biological process’ with the ROAD GO enrichment tool (http://www.ricearray.org/analysis/go_enrichment.shtml). As a result, we identified 222 GO terms assigned to 132 genes (Table S4). The majority of RMP genes (67.8 %) had limited information on gene functions based on GO. Significant terms in that category were selected with hypergeometric p values ≤0.05 and enrichment values of at least twofold (Table S4; Fig. 2). Of these, 12 GO terms were over-represented in RMP genes. Significantly enriched terms were found for biological processes corresponding to the pyrimidine nucleotide biosynthetic process (50.9 GO fold-enrichment value), ubiquitin-dependent protein catabolic process (10.4), chitin catabolic process (9.6), cell-matrix adhesion (9.1), cellular amino acid metabolic process (8.8), tRNA aminoacylation for protein translation (7.4), lipid transport (5.0), nucleosome assembly (3.8), ion transport (3.6), response to freezing (2.6), carbohydrate metabolic process (2.1), and translation (2.1) (Fig. 2a). The most abundant terms were found for ‘response to freezing,’ representing 24 genes. This was followed by ten genes for translation and nine for ubiquitin-dependent protein catabolic process (Fig. 2a).
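The selection rule stated above (hypergeometric p value ≤0.05 and at least twofold enrichment) can be sketched as follows. This is not the ROAD tool's implementation, which the source does not describe. It is a generic one-sided hypergeometric test, and the background size, the number of annotated genes, and the hit count in the example are hypothetical numbers chosen only for illustration.

```python
from math import comb

def hypergeom_pvalue(k, M, n, N):
    """P(X >= k) when N genes are drawn from M genes, n of which carry the GO term."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)

def fold_enrichment(k, M, n, N):
    """Observed fraction of query genes with the term divided by the background fraction."""
    return (k / N) / (n / M)

def is_enriched(k, M, n, N, p_cutoff=0.05, fold_cutoff=2.0):
    """Apply both thresholds named in the text."""
    return hypergeom_pvalue(k, M, n, N) <= p_cutoff and fold_enrichment(k, M, n, N) >= fold_cutoff

# Hypothetical example: 20,000 background genes, 100 annotated with a term,
# 410 query genes (the size of the RMP list), 9 of which carry the term.
M, n, N, k = 20000, 100, 410, 9
p = hypergeom_pvalue(k, M, n, N)
fold = fold_enrichment(k, M, n, N)
print(f"fold = {fold:.2f}, p = {p:.2e}, enriched = {is_enriched(k, M, n, N)}")
```

Exact integer binomials keep the sketch free of external dependencies; for genome-scale term lists a statistics library would normally be used, together with a multiple-testing correction that the text does not mention.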

Fig. 2

GO enrichment analysis of RMP and RPMCT genes. a GO enrichment analysis of RMP genes. b GO enrichment analysis of RPMCT genes. X axis indicates the names of the GO terms, and the Y axis indicates the fold enrichment value. All enriched GO terms were selected under higher than twofold enrichment values and lower than 0.05 hypergeometric p values

To compare the transcriptome expressed in tapetal tissue and pollen mother cell but less expressed in early pollen, we also conducted GO enrichment analysis in biological processes. As a result, we identified 31 GO terms assigned to 137 of 263 genes, finding that the functions of 52.1 % of RPMCT genes can be estimated using GO in the biological process category (Fig. 2b), whereas the others need further investigation. In total, 13 GO terms were over-represented in RPMCT genes. Significantly enriched terms were found for biological processes corresponding to alcohol metabolism (83.5 GO fold-enrichment value), flower development (30.7), cell differentiation (23.4), negative regulation of translation (14.6), gene silencing by RNA (11.7), multicellular organismal development (8.6), ubiquitin-dependent protein catabolic process (6.6), fatty acid biosynthetic process (5.3), cell wall organization (4.4), lipid metabolic process (4.2), lipid transport (4.1), protein ubiquitination (3.9), and regulation of transcription (2.4) (Fig. 2b). The most abundant terms were for ‘regulation of transcription,’ representing 23 genes. This was followed by eight genes for lipid metabolism (Fig. 2b).

Compared to pollen mother cell or tapetum development, microspore development might require pyrimidine nucleotide biosynthesis, nucleosome assembly, cellular amino acid metabolism, protein translation, chitin catabolism, cell-matrix adhesion, ion transport, carbohydrate metabolism, and response to freezing, while pollen mother cell or tapetum development is more likely to be associated with alcohol metabolism, negative regulation of translation, regulation of transcription, cell differentiation, fatty acid biosynthesis, protein ubiquitination, lipid metabolism, cell wall organization, and multicellular organismal development. Lipid transport and ubiquitin-dependent protein catabolism are common to both RMP and RPMCT genes, suggesting cooperative features of the tapetum, pollen mother cell, and male gamete in microspore development.

MapMan analysis of RMP genes compared with RPMCT genes

MapMan allows one to group genes into different functional categories and visualize data through diagrams (Jung and An 2012). To classify RMP (410, green in Fig. 3) and RPMCT (263, red in Fig. 3) genes, we first analyzed the overall overview in the MapMan tool kit. As mentioned in the GO analysis, most of the RMP genes do not have an assigned MapMan term, indicating that most of the candidate genes showing microspore-preferred expression patterns have been less studied than RPMCT genes. Cell wall, metabolism of lipid, secondary metabolite and hormone, miscellaneous function, and RNA MapMan terms are strongly associated with the pollen mother cell or tapetum (Fig. 3; Fig. S2). On the other hand, protein, DNA and nucleotide metabolism MapMan terms are closely related to RMP genes (Fig. 3; Fig. S2). Especially, over-representation of DNA and nucleotide metabolism indicates that microspore development includes active cell division. Over-representation of protein terms in RMP genes indicates its significance in microspore development (Fig. 3). Especially, the ubiquitin and autophagy-dependent degradation overview indicates that three ubiquitin proteins, one E2 ligase, one HECT E3 ligase, seven RING E3 ligases, ten components in SCF E3 ligase complexes, 25 components in BTB E3 ligase complexes, and one component of proteasome are RMP genes (Fig. S3; Table S5), suggesting cooperative function of ubiquitin, E2 ligase, E3 ligases, and proteasome to support microspore development through protein degradation from the tapetum or anther locules. BTB E3 ligase complexes are only identified in RMP genes, suggesting unique roles of BTB E3 ligase complexes in microspore development in rice. In addition, microspore development associated with the protein MapMan term includes the protein translation pathway, as indicated in GO enrichment analysis (Fig. S4). 
RMP genes for the translation process retain two ribonucleases for RNA processing, two t-RNA ligases, one 60S subunit of ribosomal protein, one for protein synthesis initiation, one for protein elongation, 55 associated with protein degradation, five with protein modification, and one protein for mitochondrial targeting (Table S5). All of these might work together for protein synthesis during microspore development.

Fig. 3

MapMan analysis of RMP genes. The overall overview was analyzed with 410 RMP genes and 263 RPMCT genes. Red boxes in these overviews indicate RMP, and green boxes indicate RPMCT genes. Detailed information about the overview is shown in Table S5

Expression patterns of microspore-preferred genes were confirmed through the GUS reporter system

To verify the microspore-preferred expression patterns of our candidate genes, eight candidate genes were selected (Fig. 4b; Table 2). The promoter regions of these eight RMP genes were amplified by PCR with specific primers, generating promoter fragments ranging in size from 725 to 2053 bp (Table 2). The promoter fragments were fused to GFP-GUS reporter genes to create the proRMP::GFP-GUS constructs. As shown in Fig. 4a, eight vector constructs were prepared and then transformed into rice callus using Agrobacterium tumefaciens strain LBA4404 (Nguyen et al. 2015). In total, 316 putative transgenic lines were generated with the eight constructs.

Fig. 4

Heatmap expression analysis of eight RMP genes for the GUS reporter system and schematic diagrams of promoter-GUS fusion constructs. a Vector maps of the eight promoter-GUS fusion constructs. b Heatmap expression data of eight RMP genes selected for the GUS reporter system

Table 2 List of candidate genes to identify promoters driving microspore preferred expression using meta-expression data

Results of the GUS assay indicated that all eight promoters exhibited very similar GUS expression patterns in pollen grains and anthers, but they were distinct in vegetative organs (Fig. 5a, b). Of them, the promoters of RMP2, RMP7, and RMP8 showed pollen-preferred expression patterns but did not display the GUS signal in leaf, stem, or root tissues (Fig. 5b). On the other hand, promoters of RMP1, RMP4, and RMP6 showed pollen-preferred expression patterns and also exhibited a GUS signal in leaf, stem, and root tissues. In comparison, the promoters of RMP3 and RMP4 showed a GUS signal in leaf and stem tissues but not in the roots, indicating the possible existence of an additional regulatory mechanism different from the microspore-specific genes (Figs. 3, 4). Expression of the other five promoters in other organs/tissues demonstrated unexpected results, possibly caused by wounding and submergence stresses coupled with the GUS assay method. This possibility needs to be tested with meta-expression data under abiotic stress conditions or by analyzing the expression patterns of these genes under wounding or submergence and comparing the results with those in the untreated condition.

Fig. 5

Validation of RMP genes using GUS reporter systems and RT-PCR analyses. a Images of flower and pollen from transgenic plants harboring RMP promoter::GUS reporter constructs at sequential stages of pollen development were prepared after GUS staining. b Images of leaf, stem, and root from transgenic plants at the vegetative growth stage were also prepared. Promoters from eight RMP genes in Table 2 were used to control GUS expression. UG uni-nucleated gametophyte stage, BG bi-nucleated gametophyte stage, MP mature pollen stage. GUS expression patterns were analyzed from promoter-GUS transgenic lines. Bar in flower image = 0.5 mm; bar in pollen image = 10 μm. c Real-time RT-PCR analysis of eight genes shown in Fig. 4b. Quantitative real-time PCR analysis was performed for various organs. Transcript levels were normalized to rice UBIQUITIN5 and calculated using the comparative cycle threshold method. Error bars indicate standard deviation (sd). Y axis, expression level relative to rice UBIQUITIN5; X axis, sample names used for analyses. Sh 7-day-old shoots, Ro 7-day-old roots, M/T anthers at meiosis and tetrad, YM anther at young microspore stage, VP anther at vacuolated pollen stage, and MP anther at mature pollen stage

To determine GUS signals in the microspores, we first performed DAPI staining of isolated pollen grains at different developmental stages and assigned the pollen to uni-nucleated, bi-nucleated, tri-nucleated, and mature pollen categories. Significant GUS signals were first detected at the uni-nucleated microspore stage and accumulated gradually through the bi-nucleated and tri-nucleated stages, ultimately reaching the highest level in MPG (Fig. 5a). Because GUS signals accumulate, signals may still be observed at the tri-nucleated and mature pollen stages. The patterns of microspore-preferred expression by these genes, as shown in Fig. 2, were confirmed by RT-PCR analysis (Fig. 5c). Of them, RMP1 and RMP8 showed the highest expression in bi-nucleated microspores, with the next highest expression in young microspores. The others showed the highest expression in young microspores, and expression in seedling roots and shoots was very low, as in the microarray data. Together, these results demonstrate that the meta-expression data for genes preferentially expressed in microspores are highly reliable and are a valuable source of novel promoters for controlling traits associated with pollen maturation.
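The caption of Fig. 5c states that transcript levels were normalized to rice UBIQUITIN5 and calculated with the comparative cycle threshold method. A minimal sketch of that calculation is given below, assuming the common form 2^-(Ct_target - Ct_reference) with perfect amplification efficiency. The Ct values are hypothetical, and the function names are illustrative rather than taken from the source.

```python
from statistics import mean, stdev

def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene: 2^-(delta Ct).

    Assumes both amplicons double every cycle (100 % efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)

def summarize(replicates):
    """Mean and standard deviation of relative expression over (Ct_target, Ct_ref) pairs."""
    values = [relative_expression(t, r) for t, r in replicates]
    return mean(values), stdev(values)

# Hypothetical Ct values for one gene in one sample (three replicates),
# each paired with the UBIQUITIN5 Ct from the same reaction set.
young_microspore = [(22.0, 20.0), (23.0, 20.0), (21.0, 20.0)]
mean_rel, sd_rel = summarize(young_microspore)
print(f"relative expression = {mean_rel:.3f} +/- {sd_rel:.3f}")
```

A target that crosses the threshold two cycles later than the reference is therefore reported at one quarter of the reference level.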

Analysis of the RMP promoter sequences

The promoter sequences of the RMP1 through RMP8 genes were analyzed using the signal scan search of the Plant cis-acting Regulatory DNA Elements (PLACE) program (Higo et al. 1999). The translation start site was counted as +1, and the locations of cis-regulatory elements (CREs) were highlighted for each promoter (Fig. S6). The scanning results revealed numerous CREs in the promoter regions of the RMP genes. The functions of the identified CREs are grouped and shown in Tables S6 and S7. Among them, the six most abundant CREs detected in RMP promoter regions are listed in Table S6: ACGTATERD1 (ACGT), ARR1AT (NGATT), CAATBOX1 (CAAT), GATABOX (GATA), MYBCORE (CNGTTR), and DOFCOREZM (AAAG). The copy number for each of those CREs is shown in Fig. S7. In detail, we found 16 copies of ACGTATERD1 in RMP4, 12 copies in RMP1, 8 copies in RMP2, RMP3, and RMP6, and 4 copies in the RMP7 and RMP8 promoters. For ARR1AT, 26 copies were found in RMP2, whereas the others showed lower copy numbers for this motif as follows: RMP8 (22), RMP4 (21), RMP7 (18), RMP5 (14), RMP1 (12), RMP3 (4). In addition, a high copy number of CAATBOX1 was revealed in RMP2 (34). The other promoters possess fewer: RMP8 (22), RMP1 (18), RMP4 (16), RMP7 (12), RMP5 (11), RMP6 (7), RMP3 (5). We also found 30 copies of GATABOX in RMP1, 20 in RMP4, and 15–17 in RMP5, RMP7, and RMP8. Six copies of MYBCORE were found in the RMP3, RMP5, and RMP8 promoters, while four copies were found in RMP1, RMP2, and RMP4. The RMP6 and RMP7 promoters possessed three and five copies, respectively. Furthermore, high copy numbers of DOFCOREZM were detected in the RMP promoters: RMP4 (36), RMP2 (35), RMP5 (33), RMP8 (31), RMP7 (24), RMP1 (15), RMP3 (10), and RMP6 (11). In addition, we analyzed the average frequency of these CREs in the 2-kb promoter regions of 211 RMP genes and found that ARR1AT, CAATBOX1, and DOFCOREZM occur slightly more often than the estimated average frequency, whereas the others occur slightly less often (Table S2): the ratio is 1.21 (=19.3/16) for ARR1AT, 1.14 (=18.3/16) for CAATBOX1, and 1.27 (=20.3/16) for DOFCOREZM. These results indicate that the three CREs with increased frequency might have significant roles in transcriptional regulation for early pollen development in rice.
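The source does not say how the estimated frequencies (the denominators such as 16) were obtained. One simple model that yields values of that size is sketched below: it assumes a 2-kb promoter, equal base frequencies, independent positions, and scanning of both strands. The function names and the model itself are assumptions made for illustration, not the authors' stated procedure.

```python
# Number of concrete bases matched by each IUPAC nucleotide code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

def match_probability(motif):
    """Chance that a random site matches the motif when all four bases are equally likely."""
    p = 1.0
    for symbol in motif:
        p *= IUPAC_SIZE[symbol] / 4
    return p

def expected_count(motif, length=2000, both_strands=True):
    """Expected number of matches in a random sequence of the given length."""
    positions = length - len(motif) + 1
    strands = 2 if both_strands else 1
    return positions * strands * match_probability(motif)

def fold_over_expected(observed_mean, motif, length=2000):
    """Observed average copy number divided by the rounded expectation."""
    return observed_mean / round(expected_count(motif, length))

for name, motif, observed in [
    ("ARR1AT", "NGATT", 19.3),
    ("CAATBOX1", "CAAT", 18.3),
    ("DOFCOREZM", "AAAG", 20.3),
]:
    print(name, round(expected_count(motif)), round(fold_over_expected(observed, motif), 2))
```

Under this model a four-base motif is expected about 16 times in 2 kb, which agrees with the denominators quoted above. Palindromic motifs such as ACGT match the same site on both strands, so a strand-aware count would need to avoid counting them twice.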

Moreover, 11 CREs related to organ/tissue-preferred expression are listed in Table S7. Of them, anther- or pollen-specific CREs such as GTGANTG10 (GTGA), POLLEN1LELAT52 (AGAAA), SITEIIATCYTC (TGGGCY), and 5659BOXLELAT5659 (GAAWTTGTGA) were identified. The scanning results showed 10–17 copies of GTGANTG10 in the promoter regions of RMP1, RMP2, RMP4, and RMP6-8, and 5 and 8 copies in RMP3 and RMP5, respectively. Similarly, 13–15 copies of POLLEN1LELAT52 were found in RMP2, RMP4, RMP5, and RMP8. The other promoters had two to eight copies. The SITEIIATCYTC element was found in RMP3 (4), RMP6 (5), and RMP8 (3) but not in the others. One copy of the 5659BOXLELAT5659 element was revealed in the RMP6 promoter. In addition, the mesophyll- and meiocyte-preferred CACTFTPPCA1 (YACT) and four root-preferred CREs, P1BS (GNATATNC), OSE1-/OSE2-ROOTNODULE (AAAGAT), RHERPATEXPA7 (KCACGW), and ROOTMOTIFTAPOX1 (ATATT), were also identified. The function and frequency of these CREs are listed in Table S7 and Figure S7. As for the highly abundant CREs in the eight selected RMP genes, we analyzed the average frequency of the ten CREs related to organ/tissue-preferred expression other than EBOXBNNAPA and compared it with the estimated value (Table S2). As a result, we found that four of the ten CREs occur more frequently than estimated: POLLEN1LELAT52 showed a 2.23-fold increase (=8.9/4); 5659BOXLELAT5659, 3462.61; OSE1-/OSE2-ROOTNODULE, 1.7 (=1.7/1); and ROOTMOTIFTAPOX1, 2.3 (=9.2/4). These results indicate that the four CREs related to pollen- or root-preferred expression with a higher fold increase might have more significant roles in transcriptional regulation for early pollen development. In addition, we expect that CREs related to pollen- or root-preferred expression might contribute more than the highly abundant CREs.
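Counting copies of a degenerate motif such as YACT or KCACGW in a promoter can be sketched as below. This is not the PLACE signal scan program itself. It is a small regular-expression scanner that expands IUPAC codes, counts overlapping matches, and reports the two strands separately. The toy sequence and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def count_motif(seq, motif):
    """Return (forward, reverse) counts of a motif, including overlapping hits."""
    # A zero-width lookahead lets finditer report every start position.
    pattern = re.compile("(?=" + "".join(IUPAC_RE[s] for s in motif) + ")")
    seq = seq.upper()
    forward = sum(1 for _ in pattern.finditer(seq))
    reverse = sum(1 for _ in pattern.finditer(reverse_complement(seq)))
    return forward, reverse

# Hypothetical 18-bp fragment, not taken from any RMP promoter.
toy_promoter = "AGAAAGTGACAATAGAAA"
for name, motif in [
    ("POLLEN1LELAT52", "AGAAA"),
    ("GTGANTG10", "GTGA"),
    ("CAATBOX1", "CAAT"),
    ("CACTFTPPCA1", "YACT"),
]:
    print(name, count_motif(toy_promoter, motif))
```

Reporting the strands separately makes it easy to see that a site such as GTGA on the forward strand also satisfies YACT on the reverse strand, which matters when copy numbers of short motifs are compared.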

Discussion

Majority of RMP genes are monocot unique genes

For comparative analysis of RMP genes between rice and Arabidopsis, we searched for orthologues using a database of orthologous groups among rice, Arabidopsis, Brachypodium, maize, Populus, Vitis vinifera, and Sorghum bicolor from the Rice Genome Annotation Project (RGAP; http://rice.plantbiology.msu.edu/annotation_pseudo_ortho.shtml). Subsequently, 25 Arabidopsis orthologues of the 410 rice RMP genes were identified (Table S9). The majority of RMP genes are therefore monocot-unique genes, which underscores the importance of functional genomic studies of our candidate genes for understanding male gamete development at a very early stage. Among the 25 Arabidopsis orthologues, two genes, AT1G31740 and AT5G20710, have been reported to be preferentially expressed in microspores (Hrubá et al. 2005). However, we did not find any information indicating that the other Arabidopsis orthologues are active in Arabidopsis young microspores. In addition, we searched for rice orthologues of Arabidopsis DUO1/2/3 and GEX1/2 and analyzed the expression patterns of the identified orthologues. There is no probe on the Affymetrix array for the DUO1 orthologue (LOC_04g46384); the DUO3 orthologue (LOC_03g46950) was expressed in most of the analyzed tissues/organs; and the GEX1 (LOC_09g27040) and GEX2 (LOC_09g25650) orthologues were strongly expressed in mature pollen (Fig. S8). Because the rice orthologues of DUO3, GEX1, and GEX2 did not exhibit microspore-preferential expression, we did not include them in our candidate gene list. Similar observations were reported in Arabidopsis (Brownfield et al. 2009; Alandete-Saez et al. 2011; Alandete-Saez et al. 2008).

Role of abundant CREs in the promoters of RMP genes

Analysis of the upstream promoter regions (725–2053 bp) of the eight RMP genes using a Signal Scan revealed various CREs. Of them, the ten most abundant were found in all promoters (Table S6; Fig. S6). ACGTATERD1 (5′-ACGT-3′), the ACGT core sequence, has been established as a functionally important cis-element that frequently regulates gene expression in synergy with other cis-elements. The ACGT sequence has been found in the ERD1 promoter, where it is required for the etiolation-induced increase in LUC activity (Simpson et al. 2003). In addition, ACGT elements are responsible for gene regulation in response to exogenous stress in rice (Qiu et al. 2008; Mehrotra et al. 2013). The binding element ARR1AT (5′-NGATT-3′, where N = G/A/C/T) has been found in both Arabidopsis and rice. ARR1 and ARR2 are cytokinin response regulators that function as transcriptional activators (Sakai et al. 2000). All promoter regions of RMP genes contain a high copy number of ARR1AT (Fig. S6). In addition, a promoter consensus sequence named CAATBOX1 (5′-CAAT-3′), which was functionally characterized in the promoter region of the pea legumin gene LegA, is responsible for tissue-specific promoter activity (Shirsat et al. 1989). The abundance of these cis-acting elements in our RMP promoters suggests that they are basic components of microspore-preferred expression. GATABOX (5′-GATA-3′) is known to be required for high-level, light-regulated, and tissue-specific gene expression. GATA transcription factors are a group of DNA-binding proteins distinguished by a zinc finger motif, which have been implicated in light- and nitrate-dependent transcription control (Reyes et al. 2004). The identification of zinc finger transcription factor genes showing the highest expression level in Arabidopsis sperm cells suggests the potential of this regulatory element for pollen-preferred expression (Borges et al. 2008). MYBCORE (5′-CNGTTR-3′) is a binding site for two plant MYB proteins, AtMYB1 and AtMYB2, which were isolated from Arabidopsis. AtMYB2 is involved in the regulation of genes responsive to water stress (Abe et al. 2003). Moreover, MYB-type transcription factors showing high expression in sperm cells suggest the potential involvement of this cis-acting element in pollen development (Borges et al. 2008). In the promoter regions of all RMP genes, three to six copies of MYBCORE were detected (Fig. S6). DOFCOREZM (5′-AAAG-3′) is the target binding site of Dof proteins, which are specific DNA-binding proteins associated with the expression of multiple genes in plants including monocots and dicots (Yanagisawa and Schmidt 1999). In higher plants, many Dof genes have been identified; for example, more than 37 putative genes encoding Dof domain proteins in Arabidopsis and about 30 Dof genes in rice have been identified (Yanagisawa 2004). The role of the Dof domain protein has been investigated, and it has been shown to be involved in response to stress, light, hormones, seed germination, tissue-specific expression, and in the leaves (Yanagisawa 2000, 2004). Moreover, binding sites for Dof transcription factors have been recorded in the upstream sequences of GEX1 and GEX2, two genes showing sperm-specific expression in Arabidopsis (Engel et al. 2005). High copy numbers of the DOFCOREZM element were detected in all eight RMP promoters (Fig. S6). Based on the analysis of abundant CREs in RMP promoters, we expect the roles of these CREs to involve basic development, stress response, and hormone response, as well as male gamete-specific function.

Role of pollen-specific CREs identified in promoters of RMP genes

Interestingly, the scanning results revealed some CREs involved in anther/pollen-specific expression. For example, GTGANTG10, with the sequence 5′-GTGA-3′, was found in all eight promoters at high copy number (5 to 18 copies, Fig. S7). This motif was found in the promoter of the tobacco late pollen gene g10, which shows homology to pectate lyase and the tomato gene Lat56 (Rogers et al. 2001). The presence of a GTGA motif in the promoter of the rice anther-specific plant lipid transfer protein gene (OsLTP6) might increase GUS expression in transgenic rice plants (Liu et al. 2013). Moreover, the POLLEN1LELAT52 element, with the sequence 5′-AGAAA-3′, is a known motif required for anther/pollen-specific expression of the tomato Lat52 gene (Bate and Twell 1998). This motif was detected at high copy number in the promoter regions of all eight genes (Fig. S7). SITEIIATCYTC (5′-TGGGCY-3′) is a binding element for TCP-domain proteins that is involved in anther- and meristem-specific expression (Welchen and Gonzalez 2005). This element was found only in RMP3, RMP6, and RMP8, with 4, 5, and 3 copies, respectively. 5659BOXLELAT5659 (5′-GAAWTTGTGA-3′, W = A/T) is a sequence motif shared between the tomato Lat56 and Lat59 promoters (Twell et al. 1991). This motif, which might be required for male gamete- or tapetum-expressed genes (Hobo et al. 2008), is unique to RMP6, where it occurs as one copy (Fig. S7).

Although many CREs were identified in the promoter regions, only a few CREs have been shown to be involved in the regulation of anther/pollen-specific gene expression. In addition to anther/pollen-preferred CREs, we found other organ-preferred CREs that have the potential to confer anther/pollen-preferred expression. CACTFTPPCA1 (5′-YACT-3′, Y = T/C) is a tetra-nucleotide motif responsible for mesophyll-specific expression of the C4 phosphoenolpyruvate carboxylase gene in C4 plants (Gowik et al. 2004). Recently, this element was also found to be very abundant in the promoter regions of 50 genes preferentially expressed in Arabidopsis male meiocytes (Li et al. 2014). In this study, CACTFTPPCA1 was found in all eight RMP promoters at high copy number compared with other CREs (Table S7; Fig. S7). P1BS (5′-GNATATNC-3′), which is involved in root-preferred expression, was found in RMP4 and RMP7 with 6 and 4 copies, respectively. This element has been reported in both monocot and dicot promoters as required for root expression and phosphate-deprivation responsiveness of the phosphate transporter gene (Schunmann et al. 2004; Sobkowiak et al. 2012). Similarly, OSE1-/OSE2-ROOTNODULE (5′-AAAGAT-3′ or 5′-CTCTT-3′) is one of the consensus sequence motifs of organ-specific elements (OSE) characteristic of promoters activated in the infected cells of root nodules and in the arbuscule-containing cells of mycorrhizal roots (Fehlberg et al. 2005). This CRE was found in all RMP promoters. RHERPATEXPA7 (5′-KCACGW-3′), detected only in RMP4 and RMP8, is a root hair-specific cis element (Kim et al. 2006). ROOTMOTIFTAPOX1 (5′-ATATT-3′) is a motif found in the rolD promoter of Agrobacterium rhizogenes. The rolD-gus genes were found to have a distinctive expression pattern in roots (Eulgem et al. 1999). It has been reported that genes expressed in roots under stress or normal conditions also exhibit anther- or pollen-preferred expression patterns, indicating that the root-specific CREs identified in the promoters of these eight RMP genes have the potential to regulate microspore-preferred expression.

DNA synthesis is important for microspore development in rice compared to that in the tapetum

In the anther, the tapetum is the innermost of the four sporophytic layers of the anther wall and is in direct contact with the developing male gametophyte. The tapetum provides the nutrients required for the development and maturation of the microspore (Wang et al. 2015). Single uni-nucleated microspores are released and then divide to produce two daughter cells, a vegetative cell (VC) and a generative cell (GC), at the bi-nucleated stage. The VC exits the cell cycle at the G1 phase, while the GC undergoes a further division to produce two sperm cells at PMII (tri-nucleated stage). During the cell cycle, cells proliferate through four phases: G1, in which the cell grows and the nucleus has 2C DNA content (where C is the amount of DNA contained within a haploid nucleus); S, in which DNA replicates from 2C to 4C; G2, which is a second growth period; and M, which represents mitosis in somatic or germinal cells. Several studies using immunolabelling of incorporated bromodeoxyuridine in Brassica have reported that DNA is synthesized in microspores from the G1 until the G2 phase of the cell cycle (Binarova et al. 1993; Pretova et al. 1993). The maximal enrichment of the nucleotide biosynthesis process among RMP genes indicates that rice microspores actively produce DNA during mitotic cell division. Genes annotated with this GO term are potential targets for elucidating the mitotic cell division process.

RMP genes are new targets to understand male-gamete development

More than 410 genes preferentially expressed in the uni-nucleated and early bi-nucleated stages have been revealed by analyzing meta-expression data, but only two of these genes have been functionally characterized (Lee et al. 2004; Secco et al. 2010). Moreover, GO analysis confirmed that 285 genes (67.9 %) did not have assigned GO terms. In contrast, 28 of 263 RPMCT genes have been functionally characterized, and 52 % of them have assigned GO terms, indicating that the understanding of RMP genes is quite limited compared to that of RPMCT genes. To date, many genes that are expressed in anther or pollen tissues have been identified in rice. Based on expression patterns, they have been grouped as pollen mother cell- or tapetum-expressed genes, including TDR, Osg6B, RA8, RTS, UDT1, carbon starved anther (CSA), OsLTP6, MTR1, MEL1 and ZIP4 (Yokoi et al. 1997; Jeon et al. 1999; Jung et al. 2005; Luo et al. 2006; Li et al. 2006; Zhang et al. 2010a; Shen et al. 2012; Liu et al. 2013; Niu et al. 2013; Komiya et al. 2014), and late pollen-expressed genes, including OsSCP1, OsSCP2, OsSCP3, RIP1, OSIPA, OSIPK, OsLPS1-9, OsLPS10 and OsLPS11 (Park et al. 2006; Han et al. 2006; Gupta et al. 2007; Swapna et al. 2011; Khurana et al. 2013; Oo et al. 2014; Nguyen et al. 2015). However, only a few rice microspore-specific or microspore-preferred genes, such as OsCP1 and OsPH1;2, have been functionally studied (Lee et al. 2004; Secco et al. 2010). Of these, OsCP1 was expressed in both the tapetum and young microspores in a previous study using a GUS reporter system (Lee et al. 2004), but GUS activity was mainly detected in the microspores and vacuolated pollen grains of the OsCP1 promoter trap heterozygous line. Our data also indicate that the high expression levels in young anthers at microspore stages are mostly contributed by expression in microspores, although we cannot ignore the contribution of tapetal tissue (Fig. S8). In addition, GUS expression in developing anthers of the heterozygous line was observed later than that of typical tapetum-preferred genes such as TDR or UDT1 (Lee et al. 2004). In the case of OsPH1;2, its functions have not been investigated in terms of male gamete development (Secco et al. 2010), so this gene might be a useful target for studying the relationship between phosphate transport and male gamete development.

We compared our list with the mature pollen-preferred genes and sperm cell-preferred genes selected by Russell et al. (2012). Interestingly, 70 of the 410 RMP genes overlapped with the sperm cell-preferred genes identified in that transcriptome analysis (Russell et al. 2012) (Table S10). Expression profiles have been reported to change during pollen development: the profile of uni-nucleated microspores was similar to that of bi-nucleated pollen (r = 0.82) but not to those of MPG and GPG (r = 0.13) (Wei et al. 2010). However, some genes might be needed for proper pollen development and remain expressed in mature pollen, as we identified 70 RMP genes among those expressed in sperm cells of mature pollen, suggesting that the functions of some RMP genes continue through the mature and germinating pollen stages (Russell et al. 2012). This comparison suggests that most RMP genes, apart from this small portion, might have unique roles in early pollen development.
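The comparison of gene lists described above can be illustrated with a minimal sketch. It assumes that each list is available as a collection of locus identifiers; the function name and the short example lists are hypothetical and are not taken from Table S10. Only the counts of 70 and 410 come from the text.

```python
def overlap_summary(rmp_genes, other_genes):
    """Return the shared locus IDs and the fraction of RMP genes they represent."""
    rmp = set(rmp_genes)
    shared = rmp & set(other_genes)
    # Guard against an empty input list to avoid division by zero.
    fraction = len(shared) / len(rmp) if rmp else 0.0
    return sorted(shared), fraction


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, for illustration only.
    rmp_example = ["geneA", "geneB", "geneC", "geneD"]
    sperm_example = ["geneB", "geneD", "geneX"]
    shared, fraction = overlap_summary(rmp_example, sperm_example)
    print(shared, f"{fraction:.1%}")

    # Proportion reported in the text: 70 of 410 RMP genes.
    print(f"{70 / 410:.1%}")
```

With the reported counts, the overlapping genes represent about 17 % of the RMP list.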

The development of the microspore and the tapetum is closely linked, and many rice tapetum-preferred gene promoters have been used to study male sterility and programmed cell death. However, knowledge regarding microspore development remains limited. A clear understanding of the genes expressed at the microspore stage will provide more options for manipulating target genes in microspores or pollen. Our study analyzed eight RMP promoters that have been well characterized in rice. They might be very useful for future studies and for practical applications in rice as well as in other plants.

Materials and methods

Collection of microarray data and identification of RMP genes

We used 64 publicly available rice Affymetrix microarray datasets prepared from anthers and pollen in NCBI GEO to identify RMP genes. For the comparative transcriptome analyses between rice and Arabidopsis, we downloaded Arabidopsis Affymetrix microarray data series GSE5630, GSE5631, GSE5632, GSE5633, GSE6162, GSE6696, GSE12316, GSE17343, and GSE27281. Intensity values for these series were first normalized with the Affy package in the R program and then log2-transformed. Affymetrix anatomical meta-expression data with the averaged normalized data were then used for further investigations, e.g., KMC analysis, heatmap construction, and identification of RMP genes. Based on the microarray data, we selected eight candidate genes that were highly expressed in uni-nucleated microspores, designating them as rice microspore-preferred gene 1 (RMP1, LOC_Os01g34920) through RMP8 (LOC_Os07g46950) (Fig. 4a). For each gene, the whole sequence located in the 5′ upstream region from the start codon (ATG) was predicted to be its promoter. The full-length promoter regions of the RMP1–RMP8 genes were amplified and cloned into the pDONR201 donor vector and then into the destination vector pKGWFS7, as described by Oo et al. (2014), to create RMP promoter::GFP-GUS constructs (Fig. 4b). These constructs were transferred into Agrobacterium tumefaciens strain LBA4404 and used to generate transgenic rice plants. The extension primers used are listed in Table S8.
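The log2 transformation and averaging step can be sketched as follows. This is only an illustration in Python and not the R/Affy workflow used in this study; it assumes that normalized intensities for one probe set are already available per sample and that replicates are averaged after the log2 transformation. The function name, sample labels, and intensity values are hypothetical.

```python
import math
from collections import defaultdict


def log2_tissue_means(intensities, sample_to_tissue):
    """Log2-transform normalized intensities and average them per tissue.

    intensities: dict mapping sample name -> normalized intensity (> 0)
    sample_to_tissue: dict mapping sample name -> tissue or stage label
    """
    grouped = defaultdict(list)
    for sample, value in intensities.items():
        grouped[sample_to_tissue[sample]].append(math.log2(value))
    # Average the log2 values of the replicates belonging to each tissue.
    return {tissue: sum(vals) / len(vals) for tissue, vals in grouped.items()}


if __name__ == "__main__":
    # Hypothetical values for a single probe set.
    intensities = {"rep1": 8.0, "rep2": 32.0, "rep3": 4.0}
    tissues = {"rep1": "microspore", "rep2": "microspore", "rep3": "leaf"}
    print(log2_tissue_means(intensities, tissues))
```

Averaging on the log2 scale is one common choice; averaging raw intensities before transformation would give slightly different values.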

Plant transformation and selection of transgenic plants

Sterilized rice seeds of the Dongjin cultivar were cultured on N6D plate medium for 5 days in a growth chamber at 32 °C under continuous light. Pre-cultured seeds were transformed via Agrobacterium tumefaciens (LBA4404) as described by Nguyen et al. (2015). Pre-cultured seeds were inoculated with an Agrobacterium suspension harboring proRMP::GFP-GUS for 2 min with gentle inversion. The seeds were then blotted on sterilized filter paper and co-cultivated on N6D-AS medium for 3 days in the dark at 28 °C. After 3 days of co-cultivation, the seeds were rinsed thoroughly with autoclaved distilled water containing 200 mg/L cefotaxime and placed on N6D-CG medium containing 250 mg/L cefotaxime and 50 mg/L geneticin (Duchefa, Haarlem, The Netherlands) for 4 weeks. Regenerated shoots were selected on REM-CG medium supplemented with 250 mg/L cefotaxime and 50 mg/L G-418. Plantlets were transferred to soil and grown in a GMO field.

PCR analysis of transgenic plants

To confirm that the transformed plants contained the inserted T-DNA, we performed PCR analysis using primers specific to the inserted gene (Table S8). PCR reactions were conducted under conditions of 94 °C/15 s, 55 °C/30 s, and 72 °C/5 min for 35 cycles. PCR products were separated on a 1 % agarose gel with a 1 kb ladder marker (Elpis Bio, Daejeon, Korea).

GUS and DAPI staining

GUS staining of vegetative and reproductive organs was performed as described by Han et al. (2006) with minor modifications. For DAPI staining, anthers were prefixed in fixative solution (ethanol:acetic acid, 3:1) overnight at room temperature and subsequently rehydrated through an ethanol series of 75, 55 and 35 %. They were then incubated at 60 °C for 1 h in DAPI staining solution supplemented with 20 % methanol and 1.0 µg/mL of DAPI. Pollen grains were briefly centrifuged, transferred to microscope slides, and covered with a cover glass. To determine the stage of development (UC, BC, TC and MP stages), pollen grains were viewed by light and UV epi-illumination using a Nikon ECLIPSE 80i microscope (Nikon, Melville, USA). Pollen images were captured using a ProgRes MFcool camera (Jenoptik, Jena, Germany) at 40× magnification. After GUS staining, images of seedlings and other samples such as leaf, stem, and root were captured using a ProgRes C3 camera (Jenoptik, Jena, Germany) at 0.65× magnification.

MapMan analysis

The MapMan program allows the grouping of genes into different functional categories and visualization of data through various diagrams (Jung and An 2012). To obtain functional classifications, we uploaded RGAP locus IDs for 410 RMP genes and 263 RPMCT genes to the MapMan tool kit. We then investigated the (overall) overview of metabolism, RNA-protein synthesis, and ubiquitin-dependent protein degradation pathway installed in that kit (Fig. 3; Figs. S2–S4). All data are detailed in Table S5.

Promoter sequence analysis

Promoter sequences were analyzed using the PLACE database program (Higo et al. 1999) to detect putative cis-regulatory elements involved in pollen-specific expression. The multiple sequence alignment program (ClustalW2) was also used to compare the promoter sequences.
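The kind of motif scan performed by a database search such as PLACE can be approximated with a short script. The following is a minimal sketch and not the PLACE implementation: it expands IUPAC degenerate codes into regular expressions and counts occurrences on both strands, including overlapping ones. The function names and the toy promoter sequence are hypothetical; the motif sequences are those given in the text.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping hits be counted."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def count_motif(promoter, motif):
    """Count motif occurrences on the plus and minus strands separately."""
    seq = promoter.upper()
    pattern = motif_to_regex(motif)
    plus = len(pattern.findall(seq))
    minus = len(pattern.findall(reverse_complement(seq)))
    return {"plus": plus, "minus": minus}


if __name__ == "__main__":
    motifs = {
        "MYBCORE": "CNGTTR",
        "DOFCOREZM": "AAAG",
        "GTGANTG10": "GTGA",
        "POLLEN1LELAT52": "AGAAA",
    }
    toy_promoter = "TTAAAGTGACAGTTAGAGAAAGCT"  # hypothetical sequence
    for name, motif in motifs.items():
        print(name, count_motif(toy_promoter, motif))
```

Counts are reported per strand because a palindromic motif would otherwise be counted twice at the same site; how strands are combined is a design choice left to the user.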

RNA isolation and RT-PCR analysis

Total RNA was extracted with Tri Reagent (MRC Inc., Cincinnati, OH, USA). For synthesis of cDNA, 1 mg of total RNA was reacted with M-MLV reverse transcriptase (Promega, Madison, USA), 2.5 mM dNTP, and 10 ng of oligo dT. To evaluate the expression patterns of the eight RMP genes that showed GUS activity, we prepared samples from the shoots and roots of 7-day-old seedlings and anthers at four different developmental stages. All primers for RT-PCR are listed in Table S8. The ubiquitin 5 gene (Ubi5) was used as an internal control.