Introduction

Plastids are plant organelles producing a diverse array of compounds that perform important functions in the plastid and cell, and are also important for agriculture and human nutrition. Plastidic isoprenoid synthesis represents a major source of such compounds and produces an important group of lipid-soluble antioxidants in photosynthetic tissues, i.e., the carotenoids. Most of the carotenoids are C40 tetraterpenoids derived from phytoene and comprise a large family of more than 700 structures (Britton et al. 2004). They accumulate in nearly all types of plastids, including the chloroplast and chromoplast, and are thus found in most plant organs and tissues. In seeds, roots and dark-grown tissues, they are present in amyloplasts (in starchy seeds) or elaioplasts (oil seeds), leucoplasts and etioplasts, respectively. Non-oxygenated carotenoids are referred to as carotenes, whereas their oxygenated derivatives are designated as xanthophylls. The most commonly occurring carotenes are β-carotene in chloroplasts and lycopene in chromoplasts of some fruits and flowers. The most abundant xanthophylls (lutein, violaxanthin and neoxanthin) are key components of the light harvesting complexes (Woitsch and Römer 2003; Dall’Osto et al. 2006) and are present in great abundance in photosynthetic plant tissues. In plants, carotenoids play an important role in photosystem assembly, light harvesting and photoprotection (Kulheim et al. 2002; Lokstein et al. 2002; Holt et al. 2005; Dall’Osto et al. 2007), photomorphogenesis (Park et al. 2002; Howitt and Pogson 2006), non-photochemical quenching (Pogson et al. 1998; Franco et al. 2007), protection against lipid peroxidation (Havaux and Niyogi 1999; Davidson et al. 2002; McNulty et al. 2007), and viability and aging of the seeds (Calucci et al. 2004).
In recent years, considerable evidence has accumulated for carotenoid-derived signaling compounds that regulate diverse aspects of plant development, for example apical dominance and branching (Beveridge et al. 2003; McSteen and Leyser 2005), root and shoot development (Van Norman et al. 2007) and rhizosphere signaling (López-Ráez et al. 2008). Carotenoid cleavage products (enzymatic and photooxidative derivatives) are precursors for the formation of the plant hormone abscisic acid (Nambara and Marion-Poll 2005) and also provide important components of food, fragrance and cosmetics (Bouvier et al. 2003a, b; Simkin et al. 2004a, b).

During the past decade, primarily as a result of molecular genetic and biochemical genomics-based approaches in the model organisms Arabidopsis and Synechocystis sp. PCC6803, a near-complete set of genes required for the synthesis of these compounds was identified (Dellapenna and Pogson 2006). A number of breeding and transgenic approaches have thus been used to modify the types and levels of these important compounds in agricultural crops and have provided important insight into their regulation, activity and integration (Lu and Li 2008). The availability of complete genome sequences for four angiosperm species, Oryza sativa (IRGSP 2005), Sorghum bicolor (Paterson et al. 2009), Populus trichocarpa (Tuskan et al. 2006) and Arabidopsis thaliana (AGI 2000), provides an opportunity to conduct genomic analyses in these model species, as it has been demonstrated that all four species have undergone at least one round of genome-wide duplication (Bowers et al. 2003; Raes et al. 2003; Tuskan et al. 2006; Paterson et al. 2009). Comparative analysis of carotenoid biosynthesis genes will provide valuable insight into the complex origin, evolution and conservation of an important metabolic pathway, as many biosynthetic genes are already known to have homologs resulting from gene duplication and sequence divergence. The results of this analysis would provide a framework for detecting conserved sequences having common functions, and expansion of genes resulting in non-, neo- or sub-functionalization. Rice endosperm is known to be completely devoid of carotenoids and a number of transgenic approaches have been followed to improve its carotenoid content (Ye et al. 2000; Paine et al. 2005). The expression profile of carotenoid biosynthesis genes in different photosynthetic and non-photosynthetic tissues could provide important insights into their observed presence and absence, respectively.
In this study, we report the identification of carotenoid biosynthesis genes in rice and Populus, characterization of these genes in rice and their comparison with Arabidopsis and other known carotenoid biosynthesis genes. We have analyzed the gene structure and organization and duplication events that might have contributed to their expansion, while phylogenetic analysis was performed to study their evolutionary relationships. We also analyzed the expression of rice genes in different tissues/developmental stages and under different abiotic stress conditions, using quantitative PCR, microarray and signature-based expression profiling.

Materials and methods

Identification of carotenoid biosynthesis genes in rice and Populus

To identify the complete set of carotenoid biosynthesis genes in rice (O. sativa), all the annotated proteins in the rice genome at TIGR (release 6) were searched using the HMMER program (version 2.3.2), which is based on hidden Markov models (HMM) (Durbin et al. 1998). The program used HMM profiles, generated from alignments of known carotenoid biosynthetic proteins from A. thaliana and other plant species, as queries. In all, ten HMM profiles were generated to identify proteins similar to phytoene synthase (PSY), phytoene desaturase (PDS), ζ-carotene desaturase (ZDS), carotenoid isomerase (CRTISO), lycopene β-cyclase (βLCY), lycopene є-cyclase (єLCY), β-carotene hydroxylase (βOH), є-carotene hydroxylase (єOH), zeaxanthin epoxidase (ZE) and violaxanthin de-epoxidase (VDE) enzymes in rice. From the results of the HMMER search, all proteins with a score >500 were retained; proteins corresponding to different gene models present at the same locus were removed and the remaining ones were studied by alignment and conserved domain analysis of protein sequences.
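The filtering step described above can be sketched as follows. This is only an illustration under stated assumptions: the locus IDs and scores are hypothetical placeholders, the gene-model suffix is assumed to follow the TIGR convention of a trailing ".N", and keeping the highest-scoring model per locus is our assumption, since the text states only that additional gene models at the same locus were removed.

```python
SCORE_CUTOFF = 500.0  # threshold stated in the text (score > 500)


def locus_of(model_id):
    """Strip the gene-model suffix, e.g. 'LOC_Os00g00001.2' -> 'LOC_Os00g00001'."""
    return model_id.rsplit(".", 1)[0]


def select_candidates(hits):
    """Keep one hit per locus among hits scoring above the cutoff.

    hits: iterable of (model_id, score) pairs.
    Returns {locus: (model_id, score)}; the best-scoring model is kept
    for each locus (an assumption, not stated in the text).
    """
    best = {}
    for model_id, score in hits:
        if score <= SCORE_CUTOFF:
            continue  # discard hits at or below the cutoff
        locus = locus_of(model_id)
        if locus not in best or score > best[locus][1]:
            best[locus] = (model_id, score)
    return best


# Hypothetical hits: two models of one locus, plus two low-scoring loci.
example_hits = [
    ("LOC_Os00g00001.1", 812.4),
    ("LOC_Os00g00001.2", 790.0),
    ("LOC_Os00g00002.1", 455.3),
    ("LOC_Os00g00003.1", 500.0),
]
candidates = select_candidates(example_hits)
```

Retained candidates would then go on to alignment and conserved domain analysis, as described in the text.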

The protein sequences of rice carotenoid biosynthesis genes were used as query sequences to search the Populus genome database (http://genome.jgi-psf.org/poptr1/Poptr1.home.html) using the BLASTP algorithm (Altschul et al. 1997). The newly identified genes were used iteratively to search the same sequence database. All the significant hits in the BLAST results were taken and analyzed by alignment with known genes and for the presence of conserved domains. TBLASTN searches were performed to identify genes in the genome that may not have been annotated.

Sequence analysis

Multiple sequence alignments to identify the conserved regions were done using ClustalX (version 1.83; Thompson et al. 1997) and, for phylogenetic analysis, ClustalX (version 1.83) and MEGA (version 4.0, Tamura et al. 2007) programs were used to construct neighbor-joining (NJ) and maximum parsimony (MP) trees, respectively, by generating 1,000 random bootstrap replicates. The phylogenetic trees were displayed using TreeView (version 1.6.6, http://www.taxonomy.zoology.gla.ac.uk/rod/treeview.html). DNA and protein sequence analyses were performed using the Gene Runner program (version 3.04, http://www.genenames.com). The domain search program SMART (http://smart.embl-heidelberg.de/) was used to deduce the domain structure of putative protein sequences. Putative protein sequences were analyzed using the ChloroP 1.1 program (http://www.cbs.dtu.dk/services/ChloroP/) to identify the presence and length of a putative chloroplast transit peptide. To obtain information on the intron/exon structures, cDNA sequences of carotenoid biosynthesis genes were aligned with their corresponding genomic sequences. Automatic annotation is known to be prone to errors (Rouze et al. 1999); hence, the gene structures were also studied by refined analysis of intron–exon positions and by comparing gene models with the cDNA and/or EST sequences wherever the information was available.

Localization on rice chromosomes and segmental duplication

Each of the carotenoid biosynthesis genes was positioned on rice chromosome pseudomolecules available at TIGR (release 5) (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml) and IRGSP (build 05) (http://rgp.dna.affrc.go.jp/E/IRGSP/Build5/build5.html) by BLASTN search. The segmental duplication data for rice genes available at TIGR (http://www.tigr.org/tdb/e2k1/osa1/segmental_dup/index.shtml) were used to determine the presence of carotenoid biosynthesis genes on duplicated chromosomal segments, with a maximal distance of 100 kb permitted between collinear gene pairs.

Plant material and growth conditions

Rice (O. sativa L. ssp. indica var. Pusa basmati 1) seeds were disinfected with 0.1% HgCl2 solution for 1 h and thoroughly washed with sterilized RO (reverse-osmosis) water before soaking overnight in RO water, in the dark. Seedlings were grown on cotton saturated with RO water at 28 ± 1°C, either in 14-h light and 10-h dark photoperiodic cycle or in complete darkness. Developing and mature seeds were harvested from rice plants grown in the greenhouse.

RNA isolation

Total RNA from shoot and root tissues was extracted using the RNeasy Plant Mini Kit (Qiagen, Germany) according to the manufacturer’s instructions, followed by treatment with RNase-free DNaseI (Qiagen) to remove any genomic DNA contamination. The integrity of RNA samples was monitored by electrophoresis on 1.2% agarose gel. For each RNA sample, absorption at 260 nm was measured and the concentration was calculated as A260 × 40 (μg/ml) × dilution factor. To isolate RNA from mature and immature seeds, the method described for wheat and other carbohydrate-rich seeds was followed (Singh et al. 2003), with subsequent DNase I treatment to remove genomic DNA contamination.
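As a worked illustration of the concentration formula given above (A260 × 40 μg/ml × dilution factor), a minimal sketch follows. The function name and the example reading are hypothetical and serve only to show the arithmetic.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # conversion factor for RNA, as used in the text


def rna_concentration_ug_per_ml(a260, dilution_factor):
    """Return the RNA concentration (ug/ml) of the undiluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


# Hypothetical reading: A260 of 0.25 measured on a 50-fold dilution.
example_concentration = rna_concentration_ug_per_ml(0.25, 50)
```

With these hypothetical values, the undiluted sample works out to 0.25 × 40 × 50 = 500 μg/ml.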

Full-length cDNA and EST evidence search

For full-length cDNA-based expression profiling, the gene expression evidence search page (http://www.tigr.org/tdb/e2k1/osa1/locus_expression_evidence.shtml) available at TIGR rice genome annotation was used. TIGR locus IDs corresponding to the rice carotenoid biosynthesis genes were searched to find the availability of corresponding full-length cDNA sequences. For genes, where no corresponding cDNA sequence was available in the TIGR database, BLAST search was performed in NCBI database using the coding sequence as query. For EST-based expression analysis, BLAST search was performed using the cDNA sequences of rice carotenoid biosynthesis genes against the EST database (http://www.ncbi.nlm.nih.gov/dbEST/index.html) available at NCBI. MEGABLAST was used for homology search against EST databases of rice (O. sativa), wheat (Triticum aestivum), maize (Zea mays), S. bicolor, sugarcane (Saccharum officinarum) and barley (Hordeum vulgare). The searches were limited to one species at a time to detect all the existing ESTs that showed significant similarity. For each carotenoid biosynthesis gene, details of only top hits of the BLASTN search results showing score ≥500 were extracted into a separate file. An optimized value of word size of 11 and expected value of 10, without the low complexity filter, was used for analysis to record the optimum of matches between rice genes and ESTs from other monocots. For Populus carotenoid biosynthesis genes, BLAST search was performed using the partial/full-length coding sequence as query, in the EST database of Populus at NCBI (http://www.ncbi.nlm.nih.gov/dbEST/index.html) using MEGABLAST. For full-length coding sequences, results having score ≥500 were taken, whereas for partial coding sequences search results having score ≥200 were considered.

Quantitative real-time PCR analysis

Transcript levels for all rice carotenoid biosynthesis genes in different RNA samples were quantified by real-time PCR analysis using the ABI Prism 7000 Sequence Detection System and Software (PE Applied Biosystems, USA) as per manufacturer’s instructions. Primers were designed from 3′UTRs or other unique regions of the genes using Primer Express 2.0 (Applied Biosystems, USA; Supplemental Table S1 in Electronic Supplementary Material). To ensure that the primers amplify a unique and specific fragment of the cDNA, each primer pair was checked by homology searches against the rice genomic sequence at the TIGR Web site. First strand cDNA was synthesized by reverse transcription of 3 μg of total RNA using the High Capacity cDNA Archive kit (Applied Biosystems, USA) as per manufacturer’s instructions. These cDNA samples were mixed with 200 nM of each primer and SYBR Green PCR Master Mix (Applied Biosystems, USA) for real-time PCR analysis. PCR reactions were performed in 96-well optical reaction plates (Applied Biosystems, USA) using the following parameters: 2 min at 50°C, 10 min at 95°C, and 40 cycles of 15 s at 95°C and 1 min at 60°C. The identity of the amplicons and the specificity of the reaction were verified by agarose gel electrophoresis and melting curve analysis, respectively. The relative mRNA levels for each carotenoid biosynthesis gene in different RNA samples were normalized with respect to the internal standard actin, to correct for variance in the quality of RNA and the amount of input cDNA. Real-time PCR analysis was based on three biological replicates of each sample and two technical replicates of each biological replicate. The values in the graphs are the mean of three biological replicates and the error bars show the standard deviation from mean values.
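The normalization to actin and the replicate structure described above can be illustrated with a short sketch. The text does not state the quantification formula, so the 2^(−ΔCt) expression (which assumes 100% amplification efficiency), the use of the sample standard deviation, and all Ct values shown are assumptions made for illustration.

```python
from statistics import mean, stdev


def relative_level(ct_target, ct_actin):
    """Relative mRNA level for one biological replicate.

    ct_target, ct_actin: Ct values of the technical replicates.
    Assumes the 2^(-dCt) model, which is not stated in the text.
    """
    delta_ct = mean(ct_target) - mean(ct_actin)
    return 2.0 ** (-delta_ct)


def summarize(bio_reps):
    """Mean and sample SD of relative levels across biological replicates.

    bio_reps: list of (ct_target_values, ct_actin_values) pairs.
    """
    levels = [relative_level(target, actin) for target, actin in bio_reps]
    return mean(levels), stdev(levels)


# Hypothetical Ct values: three biological replicates, two technical each.
example_reps = [
    ([24.0, 24.0], [20.0, 20.0]),
    ([25.0, 25.0], [20.0, 20.0]),
    ([23.0, 23.0], [20.0, 20.0]),
]
example_mean, example_sd = summarize(example_reps)
```

Technical replicates are averaged before normalization, so the error bars reflect biological rather than technical variation.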

Rice microarray analysis

For microarray-based expression analysis, the data sets submitted earlier by our group and available in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901 were used to study the expression profile of rice carotenoid biosynthesis genes in different stages of vegetative and reproductive development, and also under abiotic stress conditions. The data were imported into ArrayAssist™ (Stratagene, La Jolla, CA) microarray analysis software, normalized using the GC-RMA algorithm and Log2 transformed. The average of three biological replicates was used to obtain each expression value. Unique Affymetrix probe-set IDs, as mentioned for each gene in TIGR, were used to extract the expression data for carotenoid biosynthesis genes. Hierarchical clustering with the Euclidean distance metric and the complete linkage rule was used to perform cluster analysis on rows of expression values.
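The clustering rule named above (Euclidean distance with complete linkage, applied to rows of expression values) can be sketched in a few lines. This is a plain illustration of the technique and not the ArrayAssist implementation; the gene names and log2 values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles):
    """Agglomerative clustering of rows with complete linkage.

    profiles: {gene: [log2 expression values]}.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where the distance between clusters is the largest pairwise distance.
    """
    def linkage(pair):
        a, b = pair
        return max(euclidean(profiles[x], profiles[y]) for x in a for y in b)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters whose farthest members are closest.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return merges


# Hypothetical log2 signal values for three genes in two tissues.
example_profiles = {
    "geneA": [1.0, 1.0],
    "geneB": [1.0, 2.0],
    "geneC": [8.0, 8.0],
}
example_merges = complete_linkage(example_profiles)
```

In this toy case the two weakly expressed genes merge first, and the highly expressed gene joins last, which mirrors the split into low-to-moderate and moderate-to-high expression groups that such a dendrogram can reveal.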

Rice MPSS analysis

For rice massively parallel signature sequencing (MPSS) data analysis, expression evidence for mRNA from MPSS tags was obtained from the rice MPSS project, mapped to TIGR gene models (http://mpss.udel.edu/rice/) (Nakano et al. 2006). MPSS expression data from 22 mRNA libraries, representing 18 different tissues/organs, were used in the analysis, and data for 17-base significant signatures (classes 1, 2, 5 and 7 identifying the sense strand), which uniquely identify an individual gene and show a perfect match (100% identity over 100% length of the tag), were retrieved. The normalized abundance (transcripts per million, tpm) of these signatures for a given gene in any library represents a quantitative estimate of expression of that gene.
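The signature selection and tpm normalization described above can be illustrated as follows. The record layout (keys "cls", "hits" and "count"), the counts, and the summing of tpm values over the retained signatures of a gene are assumptions for illustration; abundances in the MPSS database are already normalized, so this sketch only shows the underlying arithmetic.

```python
SENSE_CLASSES = {1, 2, 5, 7}  # signature classes on the sense strand (from the text)


def tpm(raw_count, library_total):
    """Normalize a raw signature count to transcripts per million."""
    return raw_count * 1_000_000 / library_total


def gene_abundance(signatures, library_total):
    """Sum the tpm of signatures that are sense-strand and unique in the genome.

    signatures: list of dicts with hypothetical keys
      'cls'   - signature class,
      'hits'  - number of genomic matches (1 means unique),
      'count' - raw count in the library.
    """
    kept = [s for s in signatures
            if s["cls"] in SENSE_CLASSES and s["hits"] == 1]
    return sum(tpm(s["count"], library_total) for s in kept)


# Hypothetical signatures for one gene in a library of 2,000,000 tags.
example_signatures = [
    {"cls": 1, "hits": 1, "count": 50},  # retained
    {"cls": 3, "hits": 1, "count": 20},  # wrong class, dropped
    {"cls": 2, "hits": 2, "count": 30},  # not unique, dropped
]
example_tpm = gene_abundance(example_signatures, 2_000_000)
```

Only the first signature passes both filters here, giving 50 × 1,000,000 / 2,000,000 = 25 tpm.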

Results

Carotenoid biosynthesis genes in rice and Populus

HMMER searches of the rice genome using HMM profiles of carotenoid biosynthesis enzymes as query, and analysis of the putative genes with SMART, revealed the presence of 16 potential non-redundant carotenoid biosynthesis genes in rice. Each carotenoid biosynthesis gene was given a name based on the enzymatic reaction, as given in the Arabidopsis carotenoid biosynthesis pathway (Dellapenna and Pogson 2006), also taking into consideration their position on rice chromosomes 1–12 from top to bottom. The carotenoid biosynthesis gene name, The Institute for Genomic Research (TIGR) locus ID, gene length, open reading frame (ORF) length, protein length, chromosomal location, neighboring marker and other related information for all 16 carotenoid biosynthesis genes of rice are listed in Table 1. Similarly, genes retrieved from rice were used as queries for a BLASTP search of the Populus database, resulting in the identification of 34 genes corresponding to carotenoid biosynthesis pathway enzymes. The coding sequences of 24 genes of the Populus carotenoid biosynthesis pathway were partial (Supplemental Table S2).

Table 1 Carotenoid biosynthesis genes in rice

Structure of carotenoid biosynthesis genes

To study the gene structure of rice carotenoid biosynthesis genes, their exon–intron organization was analyzed, revealing that only 1 of the total 16 carotenoid biosynthesis genes was intronless; among those having introns, the number of introns varied from 4 to 15 (Supplemental Fig. S1). For carotenoid biosynthesis genes of rice, Populus and Arabidopsis, the number of exons, protein length, genomic strand and chromosomal localization were studied (Supplemental Table S2). In Populus, the number of introns was found to vary from 1 to 15, and out of 34 genes, 8 were intronless.

Domain analysis

All rice, Populus and Arabidopsis proteins were analyzed for domain structures employing SMART database and compared with each other. The domains identified in different carotenoid biosynthesis enzymes in rice are depicted in Fig. 1. The possible roles and a comparison of rice and Populus carotenoid biosynthesis enzymes with that of Arabidopsis are given in Table 2. For most rice proteins, the presence of a chloroplast transit peptide was predicted by ChloroP 1.1 program.

Fig. 1

Structure of proteins encoded by rice carotenoid biosynthesis genes. The protein structure is based on the presence of catalytic and other additional domains as identified by SMART. Different domains are highlighted in different colors

Table 2 Comparative analysis of domain organization in carotenoid biosynthesis enzymes of Arabidopsis, rice and Populus

Chromosomal distribution

To determine the chromosomal location of all the 16 carotenoid biosynthesis genes in rice, we determined the position (in base pairs) of each gene on the rice chromosome pseudomolecules available at TIGR (release 5) and IRGSP (build 05). The 16 carotenoid biosynthesis genes were found to be distributed on 10 of the 12 rice chromosomes. A diagrammatic representation of the chromosomal distribution of carotenoid biosynthesis genes on the ten rice chromosomes is depicted in Fig. 2 (the exact position on rice chromosome pseudomolecules, in base pairs, is given in Supplemental Table S2). Carotenoid biosynthesis genes of Populus were present on 12 of the 19 chromosomes. Thirteen genes were located on scaffolds that have not yet been assigned to any chromosome (Supplemental Table S2). To understand the mechanism underlying the evolution of carotenoid biosynthesis genes in rice, both tandem and segmental duplication events were studied. In this case, duplication was confined to chromosomal block duplications, because none of the genes were found to be arranged in tandem. Only three carotenoid biosynthesis genes, OsβOH1 (LOC_Os03g03370), OsβOH2 (LOC_Os04g48880) and OsβOH3 (LOC_Os10g38940), were found to be present on the duplicated segmental regions of rice chromosomes mapped by TIGR (Fig. 2; Supplemental Table S3), at a maximal distance of 100 kb permitted between collinear gene pairs.

Fig. 2

Genomic distribution of carotenoid biosynthesis genes on rice chromosomes. Black ovals on the chromosomes (vertical bars) indicate the position of centromeres. The arrows next to gene names show the direction of transcription. The chromosome numbers are indicated at the top of each bar. The carotenoid biosynthesis genes present on duplicated chromosomal segments are connected by lines. The position of carotenoid biosynthesis gene on TIGR rice chromosome pseudomolecule (release 5) is given in Supplemental Table S2

Phylogenetic analysis

To investigate the evolutionary relationship of rice genes, separate unrooted NJ and MP trees were generated for each enzyme, by alignment of their protein sequence with the corresponding homologs in Arabidopsis and other plant species (Fig. 3; Supplemental Fig. S2; Supplemental Table S4). Carotenoid biosynthesis enzymes of Populus were not considered for phylogenetic study as quite a few of the protein sequences were not of full length. Phylogenetic analysis of PSY in rice, Arabidopsis and other monocot and dicot plant species revealed that all PSYs cluster into two separate monocot- and dicot-specific clades, where most of the members show monophyletic pattern of origin (Fig. 3a). As shown in Fig. 3b, the product of PDS, a single copy gene in rice, was grouped with maize and daffodil PDS in monocot-specific clade, with separate clustering of Arabidopsis and other dicot PDS in a dicot-specific clade. Similarly, ZDS protein sequence, encoded by a single gene in rice, also formed two distinct monocot- and dicot-specific clades; OsZDS grouped with maize gene in monocot-specific clade and Arabidopsis and other dicot ZDS were clustered in dicot-specific clade (Fig. 3c). While studying the next enzyme of the pathway, CRTISO, it was observed that maize and Oncidium CRTISO clustered with rice homolog in monocot-specific group and all other dicot-specific CRTISO formed another group (Fig. 3d). For cyclases of carotenoid biosynthesis pathway, lycopene β- and є-cyclase, two different and specific clades for β- and є-cyclases were formed, which further subgrouped into monocot- and dicot-specific clades. OsβLCY grouped separately from Arabidopsis and Citrus βLCYs, similar to OsєLCY, which also formed a separate clade with TaєLCY (Fig. 3e). In Arabidopsis, two genes coding for βOH are known; however, in rice, three genes coding for βOH enzyme were found to be present, which grouped together with maize homologs in a separate monocot-specific branch (Fig. 3f). 
Two Arabidopsis βOHs were found in the dicot-specific clade with βOHs from Capsicum annuum and other dicots. Different homologs for єOH were also found in three separate groups: OsєOH1 and OsєOH2 grouped with Glycine max and Z. mays, respectively; however, OsєOH3 was present with Arabidopsis, Daucus and Medicago homologs (Fig. 3g). On aligning rice ZE with other protein sequences, two separate dicot- and monocot-specific clades were formed, where OsZE was clustered with ZmZE (Fig. 3h). Similarly, alignment of VDE also formed distinct dicot- and monocot-specific clades, with grouping of OsVDE with ZmVDE (Fig. 3i). For all the enzymes, the MP phylogenetic trees generated by the MEGA program also showed distinct clustering of monocot- and dicot-specific genes (Supplemental Fig. S2), as discussed above.

Fig. 3

Phylogenetic relationship of rice genes with Arabidopsis and other carotenoid biosynthesis genes. The unrooted tree was generated using ClustalX program by neighbor-joining method. Bootstrap values from 1,000 replicates are indicated at each node. Monocot and dicot-specific clades are indicated in different colors

In this study, we have reported the occurrence of three and four βOHs in rice and Populus, respectively. A comparison of the genomic structures in Fig. 4 showed that both Arabidopsis genes, AtβOH1 and AtβOH2, have seven exons and their exon sizes are very similar. The structure and exon sizes of AtβOH2 are comparable with those of OsβOH3, where exons 2 (64 bp), 3 (72 bp) and 4 (138 bp) of the Arabidopsis gene appear to have fused to form the second exon (276 bp) of the rice gene. Accordingly, the lengths of the 5′ and 3′UTRs of AtβOH2 and OsβOH3 are also quite comparable. In the segmentally duplicated rice β-carotene hydroxylases, OsβOH1 and OsβOH3, the intron–exon structure and exon lengths are also very similar.

Fig. 4

Gene structure of Arabidopsis and rice βOHs. Exon–intron structures were calculated by comparing transcript sequences with genomic regions of the corresponding genes. The following full-length cDNAs were used: AtβOH1 (AT4G25700.1), AtβOH2 (AT5G52570.1), OsβOH1 (AK287823), OsβOH2 (AK060559), and OsβOH3 (CT837931). Numbers within the boxes indicate the number of corresponding exon, numbers above exons represent the exon size (in base pairs) and numbers below the shaded boxes represent the length of 5′ and 3′UTRs (in base pairs)

Full-length cDNA- and EST-based expression analysis

One or more full-length cDNAs were found to be available for 13 of the 16 rice carotenoid biosynthesis genes in the TIGR database, and for the remaining three genes the full-length/partial cDNA sequences were present in the NCBI database (Table 1). In the NCBI database, we identified ESTs for all the rice genes, but the frequency of ESTs for individual genes varied greatly, ranging from 3 ESTs for OsCRTISO to 25 ESTs for OsPSY1 (Supplemental Table S5). An analysis of EST sequences in the GenBank database revealed that a large number of ESTs from other monocots also have high sequence similarity with rice carotenoid biosynthesis genes (Supplemental Table S5). In Populus, most of the predicted genes have corresponding EST sequences in the database (Supplemental Table S6).

Quantitative real-time PCR-based expression analysis of rice carotenoid biosynthesis genes

To further study the organ-specific and light-regulated expression of rice carotenoid biosynthesis genes, real-time PCR was performed with total RNA isolated from dark-grown roots, light-grown roots, dark-grown shoots, light-grown shoots, and developing and mature seeds (Fig. 5). OsPSY1 and OsPSY2 have higher expression in light-grown tissues as compared to the dark-grown tissues and show almost seven to eightfold and four to fivefold higher expression, respectively, in light-grown shoot as compared to mature seed. In contrast, OsPSY3 has an almost similar expression level in light- and dark-grown root and shoot tissues. OsPDS, OsZDS and OsCRTISO have the highest expression in light-grown shoot, followed by developing and mature seed. OsβLCY and OsєLCY also have higher expression in light-grown shoot, followed by seed and light-grown root. All three βOHs show a high level of expression in light-grown shoot and root, and in developing and mature seeds, whereas the expression is low in dark-grown tissues (Fig. 5). The three β-hydroxylase genes, OsβOH1, OsβOH2 and OsβOH3, have maximum expression in developing seed, light-grown shoot and light-grown shoot, respectively. The expression pattern of the three єOH-encoding genes of the pathway, OsєOH1, OsєOH2 and OsєOH3, is quite similar, with slightly higher expression in light-grown shoot, followed by developing seed, mature seed, light-grown root and dark-grown shoot. OsZE has the highest expression in mature and developing seed, followed by light-grown shoot, light-grown root and dark-grown shoot (Fig. 5). The expression of OsVDE is highest in light-grown shoot, followed by developing and mature seed, light-grown root and dark-grown shoot.

Fig. 5

Real-time PCR expression profile of rice carotenoid biosynthesis genes. For individual genes, the relative mRNA levels were normalized with respect to housekeeping gene Actin, in different tissues (DR dark-grown roots, LR light-grown roots, DS dark-grown shoots, LS light-grown shoots, DVS developing seed, MS mature seed)

Microarray-based expression analysis of rice carotenoid biosynthesis genes

To study the transcript accumulation of rice carotenoid biosynthesis genes in different stages of vegetative and reproductive development, the data set available in the public domain under the series accession number GSE6893 was utilized. On the basis of expression in different tissues/organs and developmental stages, a cluster dendrogram was made and all rice carotenoid biosynthesis genes were grouped into two main clades: the first group of genes had a very low to moderate level of expression, and the second group had a moderate to high level of expression in all vegetative and reproductive stages studied (Fig. 6a). In group 1 genes, all except OsVDE had almost negligible expression in late seed developmental stages (S3–S5) and a variable expression pattern in other vegetative and reproductive stages; however, most of the genes had comparatively higher expression in photosynthetic tissues (seedling, mature leaf and Y leaf). In the second group, some genes were highly expressed in all the tissues/stages studied. OsPSY1 and OsPSY3 were highly expressed in photosynthetic tissues (seedling, mature leaf and Y leaf) and during panicle development (P1-III, P1-6), and showed a decrease in transcript abundance as seed maturation progressed (S3–S5). OsβOH1 was also highly expressed in most of the stages and showed a decline in transcript level during seed maturation (S5); however, OsPDS and OsZDS were highly expressed in all the tissues and developmental stages analyzed, including late stages of seed development. The expression level of OsєOH1 and OsCRTISO was reduced during late stages of panicle (P5–P6) and seed development (S1–S5), as compared to mature and Y leaf. OsβOH3 also showed a gradual reduction in expression during seed development (S2–S5), while the transcript level was highest during panicle (P2–P6) and the early stage of seed development (S1) and not in photosynthetic tissues (seedling, mature leaf and Y leaf).
OsPSY2, OsєLCY and OsєOH3 showed highest expression in photosynthetic tissues (seedling and mature leaf), whereas the expression was highly reduced during late seed developmental stages (S3–S5).

Fig. 6

Expression profiles of rice carotenoid biosynthesis genes based on rice microarray and mRNA MPSS data. Microarray-based expression profile of rice carotenoid biosynthesis genes a during different vegetative and reproductive developmental stages, and b under abiotic stress conditions. The average log signal values of rice carotenoid biosynthesis genes in various tissues/organs/developmental stages (7-day-old light-grown seedling, roots from 7-day-old light-grown seedlings, mature leaf, Y leaf, different stages of panicle: up to 0.5 mm, SAM; 0.5–2.0 mm, P1-I; 2.1–5.0 mm, P1-II; 5.1–10.0 mm, P1-III; 0–3 cm, P1; 3–5 cm, P2; 5–10 cm, P3; 10–15 cm, P4; 15–22 cm, P5; 22–30 cm, P6, and seed: 0–2 dap, S1; 3–4 dap, S2; 5–10 dap, S3; 11–20 dap, S4; 21–29 dap, S5), and abiotic stress treatments (control: 7-day-old light-grown seedling; salt: 200 mM NaCl solution for 3 h; desiccation: dried for 3 h between folds of tissue paper at 28 ± 1°C; cold: 4 ± 1°C for 3 h) are presented by cluster display. The color scale (representing log signal values) is displayed at the bottom. c Rice mRNA MPSS data-based expression profile of rice carotenoid biosynthesis genes. The abundance of significant signatures is shown as transcripts per million (tpm). The color scale (representing tpm) is shown at the bottom.
The description of the mRNA libraries is: NCA 35-day-old callus, NGD 10 days germinating seedling grown in dark, NGS 3 days germinating seed, NIP 90 days immature panicle, NML 60 days mature leaf (representing an average of four replicates; A, B, C and D), NME 60 days meristematic tissue, NOS ovary and mature stigma, NPO mature pollen, NMR 60 days mature root (representing an average of two replicates; A and B), NST 60 days stem, NYL 14 days young leaves, NYR 14 days young roots, NSL 14 days young leaves stressed in 250 mM NaCl for 24 h, NSR 14 days young roots stressed in 250 mM NaCl for 24 h, NDL 14 days young leaves stressed in drought for 5 days, NDR 14 days young roots stressed in drought for 5 days, NCL 14 days young leaves stressed at 4°C for 24 h, NCR 14 days young roots stressed at 4°C for 24 h. A cluster dendrogram is shown on the left side of all three expression maps

To study the effect of abiotic stress on the transcript profile of carotenoid biosynthesis genes in rice seedlings, expression analysis was performed under three conditions, viz. dehydration, cold and salt, using the data set available in the public domain under the series accession number GSE6901. Figure 6b shows that abiotic stress has almost no effect on the expression of OsPSY1, OsPSY2, OsPDS, OsZDS, OsCRTISO, OsβLCY, OsєOH1 and OsZE. Expression of some genes, i.e., OsVDE, OsβOH2, OsєLCY, OsєOH3 and OsєOH2, is slightly reduced with respect to the control; however, the transcript level of three carotenoid biosynthesis genes, OsPSY3, OsβOH1 and OsβOH3, was found to be highly increased as compared to the control. Expression of OsβOH1 and OsβOH3 was specifically increased in rice seedlings exposed to dehydration and cold stress. A comparison of expression profiles for duplicated carotenoid biosynthesis genes revealed that, for OsPSY and OsβOH, expression varies significantly among the duplicated gene copies (Fig. 7). In the case of the duplicated β-carotene hydroxylases, the transcript levels of OsβOH2 and OsβOH3 are quite low, and the two genes show a similar expression pattern; however, both the expression pattern and the transcript level of OsβOH1 are strikingly different.
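The qualitative calls above ("almost no effect", "slightly reduced", "highly increased") can be made explicit with a log2 fold change relative to the control. The following is only a minimal sketch: the two-fold threshold, the function names and the signal values are illustrative assumptions, not values taken from GSE6901 or the procedure used in this study.

```python
import math


def log2_fold_change(treated, control):
    """Return log2(treated / control) for positive signal values."""
    return math.log2(treated / control)


def classify(treated, control, threshold=1.0):
    """Label a gene as 'up', 'down' or 'unchanged'.

    threshold is an assumed cut-off of one log2 unit (a two-fold change).
    """
    lfc = log2_fold_change(treated, control)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Placeholder signal values for illustration only (not taken from GSE6901).
signals = {
    "geneA": {"control": 100.0, "cold": 450.0},
    "geneB": {"control": 200.0, "cold": 210.0},
    "geneC": {"control": 300.0, "cold": 120.0},
}

calls = {gene: classify(v["cold"], v["control"]) for gene, v in signals.items()}
print(calls)
```

A fixed fold-change cut-off is the simplest possible choice; with replicated arrays, a statistical test would normally accompany it.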

Fig. 7

Expression profile of duplicated carotenoid biosynthesis genes in rice genome. X-axis represents the developmental stages, Y-axis represents the raw expression values obtained using microarrays

The strong induction of βOH-encoding genes by abiotic stresses prompted us to check their promoter sequences (1 kb upstream of the transcript start site) by searching against the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/). The promoter sequence of OsβOH1 has cis-acting elements involved in abscisic acid and salicylic acid responsiveness (ABRE; 5 hits, TCA; 1 hit). The promoter sequence of OsβOH3 also shows the presence of putative stress-response-related cis-acting elements involved in abscisic acid responsiveness (ABRE; 5 hits, motif IIb; 1 hit) and low temperature responsiveness (LTR; 1 hit).
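The idea behind such a promoter scan can be illustrated with a plain string search on both strands. This is a rough sketch, not PlantCARE's method: the motif `ACGTG` is used here as an assumed, simplified ABRE-like core, the toy sequence is invented, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_motif(seq, motif):
    """Return 0-based start positions of motif in seq (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def count_both_strands(seq, motif):
    """Count motif hits on the forward and the reverse strand.

    Note: a palindromic motif would be counted twice per site.
    """
    forward = find_motif(seq, motif)
    reverse = find_motif(reverse_complement(seq), motif)
    return len(forward) + len(reverse)


# Invented toy promoter fragment; a real scan would use the 1 kb upstream region.
toy_promoter = "TTACGTGGCAAACACGTTT"
ABRE_LIKE_CORE = "ACGTG"  # assumption: simplified core, not a PlantCARE entry

print(find_motif(toy_promoter, ABRE_LIKE_CORE))
print(count_both_strands(toy_promoter, ABRE_LIKE_CORE))
```

Exact matching of a short core produces many chance hits in 1 kb of sequence, which is why such elements are best described as putative.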

MPSS-based expression analysis of rice carotenoid biosynthesis genes

Expression analysis of rice carotenoid biosynthesis genes using MPSS data for 17-base signatures from 22 mRNA libraries showed that all the genes had an MPSS signature in at least one of the libraries, indicating the presence of their transcripts. On the basis of expression in different libraries, a cluster dendrogram was made in which all rice carotenoid biosynthesis genes were grouped into two main clades. The first group has three genes, OsPSY3, OsβOH1 and OsZE, having high transcript abundance in leaves under salt or cold stress conditions, and the second group includes the remaining 13 genes, which have a low to moderate level of expression in all vegetative and reproductive stages studied. For the genes of the second group, abiotic stress was found to have an almost negligible effect on expression (Fig. 6c).
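Transcripts per million (tpm) is a simple library-size normalization: the count of a signature divided by the total signature count of its library, scaled to one million. The sketch below shows that arithmetic and the "signature in at least one library" check; the library contents, signature names and helper names are invented for illustration and do not reproduce the MPSS database pipeline.

```python
def to_tpm(raw_counts):
    """Convert raw signature counts of one library to transcripts per million."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("library contains no signatures")
    return {sig: count * 1_000_000 / total for sig, count in raw_counts.items()}


def detected_in_any(signature, libraries_tpm):
    """Return True if the signature has tpm > 0 in at least one library."""
    return any(lib.get(signature, 0) > 0 for lib in libraries_tpm.values())


# Invented counts: two toy libraries with one gene-specific signature each
# plus a pooled 'other' entry standing in for all remaining signatures.
raw_libraries = {
    "LIB1": {"sigA": 50, "sigB": 0, "other": 999_950},
    "LIB2": {"sigA": 0, "sigB": 150, "other": 499_850},
}

libraries_tpm = {name: to_tpm(counts) for name, counts in raw_libraries.items()}
print(libraries_tpm["LIB1"]["sigA"])  # 50 counts in a library of 1,000,000
print(libraries_tpm["LIB2"]["sigB"])  # 150 counts in a library of 500,000
print(detected_in_any("sigA", libraries_tpm))
```

Because tpm is relative to library size, the same raw count corresponds to a higher tpm in a smaller library, as the second toy library shows.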

Discussion

During an extensive search of the rice genome and domain analysis of the putative candidate genes, carotenoid biosynthesis genes in rice were found to have 16 potential non-redundant members. A similar bioinformatics-based approach revealed the presence of 34 genes coding for carotenoid biosynthesis enzymes in Populus. However, in Arabidopsis, 11 genes coding for carotenoid biosynthesis pathway enzymes have been characterized (DellaPenna and Pogson 2006). In Populus, such a high number of genes could be due to incomplete finishing of chromosomal BACs/PACs or alleles from unassembled haplotypes, which are potential artifacts from shotgun assembly of this highly heterozygous genome. However, whole-genome (salicoid) duplication event accounting for the presence of paralogous sequences and recent evolution of this genus (Tuskan et al. 2006) might have also contributed to the evolution of carotenoid biosynthesis genes. Post-duplication, the gene copies could have diverged functionally and acquired specific functions, as evidenced by significant deviation in the frequency of ESTs (Supplemental Table S6; Segerman et al. 2007).

The relationship among different members of a gene family can be analyzed by gene structure, i.e., the number and distribution of introns (Li et al. 2006). A cross comparison among rice, Populus and Arabidopsis genes revealed more or less similar number of exons in homologous genes, although the length of introns was quite variable (Supplemental Table S2). Interestingly, all the seven gene copies coding for βLCY in Populus were found to be intronless, similar to their Arabidopsis and rice counterparts. These results indicate that in the common ancestor of each gene there would be an ancestral exon–intron structure (or intronless), and after the divergence of genes (e.g., by speciation), some genes changed and others retained the original pattern. The enzymes involved in carotenoid biosynthesis were found to possess specific domains depending on the type of reaction they catalyze in the pathway (Fig. 1). All the homologous genes in rice, Populus and Arabidopsis, were found to have a highly conserved catalytic domain (Table 2); some additional domains (from the same superfamily) were also present in rice and/or Populus homologs. Overlapping domains, involved in catalyzing similar kind of reactions, and domains unrelated to carotenoid biosynthesis, which can have possible roles in diverse metabolic and regulatory activities, were also present. While studying the chromosomal location of rice carotenoid biosynthesis genes, their distribution pattern revealed no evident clustering, and chromosomal block duplications were found to be responsible for the evolution of carotenoid biosynthesis genes in rice.

While studying the evolutionary relationship of rice carotenoid biosynthesis genes with Arabidopsis and other plant species, most of the rice enzymes grouped with other monocot homologs with the support of good bootstrap scores (Fig. 3; Supplemental Fig. S2). Duplication of PSY, catalyzing the first step of the pathway, was expected to have taken place before the members of the grass family diverged, since three PSY copies were conserved in the Poaceae family (Li et al. 2008a). Welsch et al. (2008) have shown that OsPSY1 is the ancestral gene, as it has the highest structural resemblance with AtPSY, and that OsPSY2 and OsPSY3 form the sister pair resulting from the second duplication event. This phylogenetic relationship is clearly evident in Fig. 3a, as three distinct subclades with good bootstrap support are formed within the monocot-specific clade. The PSYs of rice, maize and Sorghum are not clustered with tomato PSYs, which indicates that the duplication of PSY in the Solanaceae and Poaceae families has taken place independently. OsPDS, OsZDS, OsCRTISO, OsβLCY and OsєLCY were also found to be grouped in separate monocot-specific clades, suggesting a close evolutionary relationship among monocot homologs (Fig. 3b–e). The three βOHs of rice were also present in a separate monocot-specific clade with further subgrouping into smaller clades, which indicates that they are proteins encoded by homologous genes (Fig. 3f). It is not clear whether the duplication of βOHs is unique to rice or whether they have been duplicated in the whole Poaceae family (like the PSY duplication) or in a large group of monocots. MP phylogenetic trees, generated by the MEGA program, also illustrated a similar pattern of grouping where rice genes were close to monocot homologs with good bootstrap values (Supplemental Fig. S2).

Lineage-specific expansion of gene families has played an important role in the evolution of bacterial, archaeal and eukaryotic genomes (Jordan et al. 2001; Lespinet et al. 2002). It has been observed that in rice, some carotenoid biosynthesis genes are present as multiple-copy genes, while in Arabidopsis the copy number is lower. It appears that the higher number of PSY copies provides an ability to fine-tune and control the flux in the carotenoid pathway, where each homolog has a different tissue-specific expression, plays a different role in plant growth and development, and is differentially regulated. In tomato, two PSYs are expressed in different types of plastids (Fraser et al. 1999); rice and maize phytoene synthases have different tissue-specific expression, play different roles and are differentially regulated by light or abiotic stress (Welsch et al. 2008; Li et al. 2008b). Another enzyme of the pathway, encoded by multiple gene copies in rice and Populus, is βOH, which is a non-heme diiron monooxygenase and catalyzes β-hydroxylation of carotene. In Arabidopsis, two βOHs that are closely related at the protein level are present. These genes have different expression levels; they are not entirely equivalent and are likely to have specialized functions (Tian et al. 2004). Similarly, two closely related but functionally distinct βOHs are maintained in tomato and pepper (DellaPenna and Pogson 2006). During the course of evolution, the process of gene duplication and sequence divergence might have resulted in homologs having slightly different structures and new properties. These small changes in the sequences may lead to the generation of new domains and altered protein functions, thus playing an important role in evolution, since additional domains are known to modulate the function of the conserved catalytic domain (Bashton and Chothia 2007).

Another hydroxylase of the pathway, єOH, which belongs to the CYP97 family of P450 enzymes and catalyzes є-hydroxylation of carotene, is encoded by a single-copy gene in Arabidopsis (Tian et al. 2004). In rice, three related genes encoding the CYP97 family of P450 enzymes (Clans A, B and C) were found to be present in the genome (Fig. 3g). In an earlier study, the substrate specificity and activity of two of the rice P450 CYP97 enzymes (Clans A and C) were studied by functional complementation in E. coli, which revealed that the Clan A enzyme, CYP97A4, is a βOH with only minor є-hydroxylation activity, whereas the Clan C enzyme, CYP97C2, is an єOH (Quinlan et al. 2007). To date, there is no functionally demonstrated carotene hydroxylase representative for members of CYP97 Clan B, which is expected to be involved in the synthesis of some common plant metabolite (Quinlan et al. 2007). We have considered the CYP97 Clan B enzyme for our analysis, as further studies regarding functional validation are needed to confirm the absence of carotene hydroxylation activity. Although these two types of carotene hydroxylases use different mechanisms and have evolved independently, β-ring hydroxylation can be performed either by non-heme diiron β-ring hydroxylases or by CYP97A4 (Tian et al. 2004), and βOH can also show some activity toward є-rings (Sun et al. 1996). It appears to be a case of convergent evolution, where enzymes having different protein structures are observed to have plasticity in their function.

To analyze the expression profile of these genes, the frequency of cDNAs or ESTs present in different databases can serve as an important resource (Adams et al. 1995). For all carotenoid biosynthesis genes of rice and Populus, one or more full-length cDNAs/ESTs were found to be present in the TIGR/NCBI database (Table 1; Supplemental Table S5; Supplemental Table S6), which confirm their existence and expression. ESTs, showing significant homology with rice carotenoid biosynthesis genes, were also found in other monocots, indicating existence of their homologs in other species (Supplemental Table S5).

Given the role of carotenoids in photoprotection and photomorphogenesis (DellaPenna and Pogson 2006), their biosynthesis genes are expected to be highly expressed in light-grown tissues. Since the first gene of the carotenoid biosynthesis pathway, PSY, is known to be light-regulated (von Lintig et al. 1997; Welsch et al. 2000), expression of the other genes was also studied in light/dark-grown tissues, revealing differential expression of these genes. Higher expression of OsPSY1 and OsPSY2 in light-grown tissues and a similar transcript level of OsPSY3 under light/dark conditions (Fig. 5) are in accordance with the earlier published literature, where OsPSY3, unlike its paralogs, was not found to be light-regulated (Welsch et al. 2008). Expression of OsPDS, OsZDS and OsCRTISO was also found to be highest in light-grown shoot, suggesting control of expression by light. OsPDS may play an important regulatory role in rice carotenoid biosynthesis, as regulation of the PDS level has been shown to be an important component in the regulation of the ABA level during maize seed development (Hable et al. 1998). Transcript levels of OsβLCY and OsєLCY were also found to be higher in light-grown shoot, suggesting their possible regulation by light. In maize, variation at the єLCY locus affected the flux down the α- or β-carotene branch of the pathway, thereby creating natural variation for carotenoid composition and vitamin A precursors (Harjes et al. 2008). The three β- and εOHs of rice were expressed in almost all tissues, with a low transcript level in dark-grown tissues. Similar to the observed expression pattern, two βOHs in Arabidopsis were expressed in all tissues, and in leaves their expression was induced by excess light (Tian and DellaPenna 2001; Rossel et al. 2002). OsZE showed the highest transcript level in seed tissue, similar to Arabidopsis, where expression of ZE was shown to be enhanced during the maturation phase of seed development, when ABA accumulation was maximal (Audran et al. 2001).
Increased expression of OsZE in seed, as compared to other tissues, can be related to ABA accumulation as it is the last step of carotenoid biosynthesis pathway and provides direct supply of precursor for ABA synthesis. OsVDE had the highest transcript level in light-grown shoot followed by seed tissue. In higher plants, VDE forms part of the xanthophyll cycle (Niyogi 1999), thus explaining higher expression of OsVDE in light-grown shoot. In Citrus fruits, transcript level for many carotenoid biosynthesis genes were found to increase during fruit ripening and maturation, which led to a change from β, є-carotenoids to β, β-carotenoids resulting in a shift in peel color (Kato et al. 2004). Similarly, enhanced expression of carotenoid biosynthesis genes in seed tissue can account for increased ABA levels in seed, as ABA is required for seed maturation and dormancy and carotenoid biosynthesis pathway provides precursors for ABA biosynthesis (Fang and Chu 2008).

Earlier studies have shown that carotenoids or carotenoid intermediates are abundant in photosynthetic tissues, but absent in rice endosperm (Burkhardt et al. 1997). Gallagher et al. (2004) demonstrated that both OsPSY1 and OsPSY2 are transcribed and encode functional enzymes in the endosperm; however, endosperm carotenoid levels correlate only with OsPSY1 levels. Thus, transcript accumulation and possible roles of other genes in rice carotenogenesis were analyzed, revealing that almost all genes were expressed in green photosynthetic tissues and at different stages of panicle/seed development, although there was variation in transcript abundance (Fig. 6a). However, localization of the encoded enzymes in different cell compartments can also affect their function in vivo, as earlier studies have shown the existence of carotenoid biosynthesis in different cell compartments: chloroplast in leaf and amyloplast in seed, both having a different plastid and enzyme structure (Gallagher et al. 2004). Even in the chloroplast, envelope and thylakoid membrane fractions have different carotenoid profiles (Jeffery et al. 1974; Siefermann-Harms et al. 1978). The observed accumulation of carotenoids in any tissue is the net result of their biosynthesis, turnover and stable storage. Earlier reports have shown that carotenoids are synthesized in nearly all types of plastids, but the highest accumulation is observed in chloroplasts and chromoplasts (Howitt and Pogson 2006). In chromoplasts, carotenoid-lipoprotein sequestering substructures were found to be present for effective sequestration and retention of large amounts of carotenoids (Vishnevetsky et al. 1999). In carrot and in cauliflower Or mutants, high levels of carotenoid accumulation were found to be associated with large amounts of carotenoid-sequestering structures (Li et al. 2001; Lopez et al. 2008).
These observations suggest that carotenoid-sequestering structures may prevent the carotenoid end products from overloading plastid membranes, the site of carotenoid biosynthesis, and avoid negative feedback on the biosynthesis pathway by the end products. Since lipids are not known to be present in the starchy endosperm of rice, their absence could result in a lack of carotenoid-sequestering structures in the amyloplasts and, consequently, in a carotenoid-free rice endosperm.

The transcript profile of duplicated genes can indicate whether they are homologs created by gene duplication and perform different functions (neofunctionalization), or together perform part of the original function (subfunctionalization; Shan et al. 2007). The three PSYs in rice are already known to differ from each other with respect to tissue-specific expression and regulation (Welsch et al. 2008), which is also evident in the present expression analysis (Fig. 7a). Similarly, in case of beta-carotene hydroxylases, OsβOH2 and OsβOH3 have a similar and low level of expression, whereas expression of OsβOH1 is quite high and shows a lot of variability (Fig. 7b).

In a natural environment, plants frequently encounter adverse growth conditions and thereby develop various adaptive mechanisms to deal with different kinds of abiotic stresses, of which the one mediated via the phytohormone abscisic acid is widely studied (Tuteja 2007). Under abiotic stress conditions, production of ABA is increased, which in turn promotes the expression of many genes involved in stress signaling and finally enables the plant to tolerate and survive stress by maintaining homeostasis. Carotenoid biosynthesis is known to provide precursors (9-cis-epoxycarotenoids) for ABA biosynthesis and may be a potentially limiting factor affecting the flux of these compounds to this pathway (Nambara and Marion-Poll 2005). This is consistent with an earlier report by Welsch et al. (2008), who discussed increased expression of OsPSY3 in roots after salt treatment and drought. In the present study, however, transcript abundance of OsPSY3 in seedlings increased significantly under dehydration and cold stress, whereas salt stress caused only a slight increase in its expression. Increased expression of carotenoid biosynthesis genes during stress conditions provides the plant with an ability to combat deleterious stress effects; however, the activity of the biosynthesis enzymes might represent a bottleneck for increased production. Probably to overcome this limitation, their expression is up-regulated by abiotic stress conditions. Increased transcript abundance of OsPSY3, OsβOH1 and OsβOH3 during dehydration and cold stress could also be involved in providing an increased supply of precursors for stress-induced ABA accumulation, and this ABA may be translocated to other plant tissues in which precursors for ABA biosynthesis are limiting. During osmotic stress conditions, ABA levels were found to increase in maize roots, and a part of this ABA was translocated to other plant tissues (Rivier et al. 1983).
In tomato leaves, the expression of ABA biosynthesis gene (NCED) is known to increase under abiotic stress conditions (Thompson et al. 2000a); however, the expression of upstream carotenoid biosynthesis enzymes is not affected since the precursors (epoxycarotenoids) are abundantly present (Thompson et al. 2000b). Unlike leaves, drought stress-induced accumulation of ABA in tomato roots was associated with increased transcript levels of NCED, as well as ZE and βOH (Thompson et al. 2000b, 2007), which shows that ABA precursors are limited in roots. The presence of ABRE (ABA-response elements) in the promoter region of OsβOH1 and OsβOH3 indicates their possible activation by ABA. Initially, expression of these genes is increased in response to abiotic stress, which provides more carotenoid precursors for ABA biosynthesis. The increased amount of ABA would possibly increase the expression of these genes further, resulting in ABA-enhanced ABA biosynthesis. The presence of SA-response elements was also observed in the promoter sequence of OsβOH1. In Arabidopsis, salicylic acid is also known to play a role in plant response to adverse environmental conditions, such as it potentiates the abiotic stress response during salt and osmotic stress conditions by increasing reactive oxygen species generation in the photosynthetic tissue (Borsani et al. 2001).

To obtain comprehensive coverage of gene expression, measurement using a combination of transcription profiling technologies is recommended (Oudes et al. 2005). For further quantitative assessment of transcript abundance, a different transcription profiling technique, MPSS, was utilized; it is a digital method for analyzing gene expression that generates hundreds of thousands of molecules per reaction, thereby facilitating detection of significant differential expression even for genes having very low expression levels (Nobuta et al. 2007). Significant signatures (which uniquely identify any individual gene) were found to be available for all the genes in at least 1 of the 22 libraries (Fig. 6c). MPSS data from abiotic stress treatments showed that the transcript levels of OsPSY3 and OsβOH1 increased in cold-stressed young leaves (NCL), as also observed in the microarray-based expression analysis (Fig. 6b); however, contrary to the microarray analysis, the expression of these genes was unaffected in drought-stressed leaves (NDL). Similarly, OsβOH3 did not show any change in transcript abundance in MPSS data from cold-stressed (NCL) or drought-stressed (NDL) leaves. In the microarray-based expression analysis, the transcript level of OsZE was found to be unaffected by stress conditions, whereas MPSS expression data showed enhanced expression of this gene in leaves under salt and cold stress conditions. Some earlier studies have also shown that microarray and MPSS technologies did not correlate well on a quantitative basis for transcript abundance measurements, suggesting that the MPSS platform is more variable than microarrays for RNA abundance measurement (Chen et al. 2007). Moreover, data from two microarray technologies (Affymetrix and Agilent) were found to be more consistent with each other than with MPSS (Chen et al. 2007).
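The agreement between the two platforms can be summarized by comparing the direction of change called for each gene. The sketch below encodes only the qualitative cold-stress calls stated in the text for four genes (microarray on seedlings, MPSS on young leaves, NCL); the labels "up" and "unchanged" and the function name are illustrative assumptions, and no quantitative values are implied.

```python
# Qualitative cold-stress calls as described in the text.
microarray_cold = {
    "OsPSY3": "up",
    "OsβOH1": "up",
    "OsβOH3": "up",
    "OsZE": "unchanged",
}
mpss_cold = {
    "OsPSY3": "up",
    "OsβOH1": "up",
    "OsβOH3": "unchanged",
    "OsZE": "up",
}


def compare_calls(calls_a, calls_b):
    """Return (fraction of shared genes with the same call, discordant genes)."""
    shared = [gene for gene in calls_a if gene in calls_b]
    if not shared:
        raise ValueError("no genes in common")
    discordant = [gene for gene in shared if calls_a[gene] != calls_b[gene]]
    agreement = (len(shared) - len(discordant)) / len(shared)
    return agreement, discordant


agreement, discordant = compare_calls(microarray_cold, mpss_cold)
print(agreement)   # 0.5
print(discordant)  # ['OsβOH3', 'OsZE']
```

Since the two data sets come from different tissues and treatment durations, a discordant call here does not by itself show that either platform is wrong.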

To evaluate the utility of carotenoid biosynthesis genes for functional analysis, we gathered additional information regarding the phenotypes of rice Tos17 retrotransposon insertion mutants (Miyao et al. 2003) using the BLAST program with the aid of the Tos17 mutant panel database (http://tos.nias.offrc.go.jp/). A total of 13 insertion mutants, corresponding to five genes of the carotenoid biosynthesis pathway, showed variant phenotypes ranging from super dwarf to viviparous (Supplemental Table S7), suggesting an important role of carotenoids in regulating plant growth, development and different metabolic processes through the synthesis of antioxidants and signaling molecules.

In conclusion, using an in silico approach, 16 and 34 genes involved in carotenoid biosynthesis were found to be present in rice and Populus genome, respectively. Sequence comparison with Arabidopsis and other known carotenoid biosynthesis genes showed that the encoded proteins had a highly conserved structure. Genomic organization and phylogenetic analysis revealed the potential mechanisms responsible for the evolution of genes within and among species. Full-length cDNA- and EST-based expression analysis showed that all the rice and Populus carotenoid biosynthesis genes were expressed. Most of the rice genes appear to be light-regulated, a majority of genes are differentially expressed in different tissues/organs, vegetative and reproductive development stages, and expression of some genes is enhanced under abiotic stress conditions. During rice seed development, almost all the genes are expressed in the mature seed stage which indicates that some other regulatory mechanism (at post-transcriptional/translational level or during localization in different subcellular compartments) affects enzyme synthesis/activity resulting in a carotenoid-free endosperm.