Introduction

Sucrose and its components, glucose and fructose, play major roles in the transport, storage, and metabolism of carbon in plants (Farrar, 1996; Xu et al., 1996; Avigad and Dey, 1997; Rook et al., 1998; Williams et al., 1992). They are also important in osmotic adjustment, stress tolerance, reproduction, and signaling (Levitt 1980; Smeekens 2000; Gibson 2000; Iraqi and Tremblay 2001). Sucrose phosphate synthase (SPS; EC 2.4.1.14) and sucrose phosphate phosphatase (SPP; EC 3.1.3.24) are responsible for irreversible sucrose synthesis from UDP-glucose and fructose-6-phosphate, and invertases (EC 3.2.1.26) are responsible for irreversible sucrose hydrolysis. Sucrose synthase (SuSy; EC 2.4.1.13) catalyzes the reversible conversion of sucrose and UDP or ADP to UDP- or ADP-glucose and fructose (Baroja-Fernandez et al., 2003). Small gene families encode SPS, SPP, and SuSy in Arabidopsis thaliana, but two larger families encode the acid invertases of the cell wall and vacuole (Haouazine-Takvorian et al. 1997; Sherson et al. 2003) and the neutral/alkaline invertases of the cytosol (Vargas et al. 2003).

The existence of two gene families encoding invertases reflects the postulated origin of green algae and higher plants through an endosymbiotic event, in which a cyanobacterial endosymbiont invaded a non-photosynthetic, respiratory eukaryote (Margulis and Sagan 2003). The plant neutral/alkaline invertases are closely related to the cyanobacterial invertases (Vargas et al. 2003), while the plant acid invertases have a clear affinity with the invertases of respiratory eukaryotes such as yeasts and of aerobic bacteria such as Bacillus spp. (Sturm and Chrispeels 1990). The neutral/alkaline invertases are believed to be cytosolic in plants, algae, and cyanobacteria, which are the only organisms capable of synthesizing sucrose (Chen and Black 1992). The acid invertases (or fructosidases) of bacteria are periplasmic, where they hydrolyze extracellular sucrose and other fructose-containing oligo- and polysaccharides, such as fructans (Reddy and Maley 1996; Ehrmann et al. 2003; Warchol et al. 2002). The acid invertases in fungi are largely extracellular, but Suc2 of yeast is transcribed from two different promoters that give rise to an extracellular form and a cytoplasmic form, with the latter lacking the N-terminal signal peptide that directs the former to the periplasm (Perlman et al. 1982). Plants contain not only extracellular acid invertases but also vacuolar forms (Unger et al. 1994; Goetz and Roitsch 1999). The vacuolar forms are thought to have evolved from the cell-wall forms, presumably through mutations that changed their subcellular targeting within the endoplasmic reticulum system. Nascent proteins on ribosomes are targeted to the endoplasmic reticulum by a hydrophobic N-terminal signal peptide of about 20 amino acids (Nielsen et al. 1997). The later evolution of specific vacuolar-targeting signals may have enhanced the efficiency of discrimination between subcellular locations (Tague et al. 1990).
Many plants have the capacity to synthesize fructans from sucrose (Vijn and Smeekens 1999). Enzymes of fructan synthesis, such as sucrose:sucrose fructosyl transferase, are related to vacuolar acid invertases, while enzymes of fructan breakdown, such as fructan 1-exohydrolase, are related to cell-wall acid invertases (Van den Ende et al. 2001).

The sequencing of the genomes of thale cress (Arabidopsis thaliana L.) and rice (Oryza sativa L.) provides an opportunity to compare the acid and neutral/alkaline gene families across the dicot–monocot divide. Neither of these species accumulates fructans. Six cell-wall invertase genes (AtcwINV1–6) and two vacuolar invertase genes (AtvaINV1–2) have been reported for Arabidopsis and all are transcribed (Sherson et al., 2003). Eleven neutral/alkaline invertase genes (At-A/N-InvA–K) have been reported for Arabidopsis, but two genes (At-A/N-InvB,J) are cDNA clones of two of the other genes and are therefore redundant in this context (Vargas et al. 2003). Here we report the identification of rice genes potentially encoding eight neutral/alkaline invertases, nine cell-wall invertases, and two vacuolar invertases. We suggest a nomenclature for the rice invertase genes based on their inferred evolutionary relationships and draw conclusions about the mechanisms and rates of evolutionary changes. Gene-specific primers for reverse transcription–polymerase chain reaction (RT-PCR) identify transcripts for 18 of the 19 rice invertase genes and provide preliminary data on gene expression across the major organs of the rice plant and across the period of panicle development. The complete cataloguing of the invertase gene families described here leads to some unexpected predictions about the subcellular locations of the invertase enzymes.

Materials and Methods

Plants

Rice (Oryza sativa L. cv. IR64) seeds were obtained from the Genetic Resources Center at IRRI. They were grown in flat nursery trays for 21 days and then transplanted into pots of soil (15 kg) containing ammonium sulfate (3 g N), sodium dihydrogen phosphate (1.5 g P), and KCl (3 g K). Plants were grown until maturity under glasshouse conditions without supplemental lighting or climate control except for manual control of sky lights and fans. Pots were watered twice daily to maintain flooding of the soil. Tissue samples were harvested directly into liquid nitrogen. Calli were induced from sterilized mature seeds of IR64 and maintained in Murashige and Skoog (1962) medium supplemented with maltose (30 g/L), 2,4-dichlorophenoxyacetic acid (2 mg/L), and agarose (8 g/L) in the dark at 27°C for 3–4 weeks.

BLAST Searches and DNA Annotation

Several known acid and neutral/alkaline invertase sequences (Table 1) were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov) and used in BLASTn and tBLASTn searches (http://www.ncbi.nlm.nih.gov/blast/) to identify the full set of invertase genes in the genomes of the japonica and indica subspecies of rice. Matches were retained when they achieved similarity scores of >50.0 and probability scores of <10⁻⁴. The japonica genome sequence was located at NCBI (Oryza sativa/nr and Oryza sativa/htgs). The indica genome sequence was located at the Chinese Rice Genome Database (http://rise.genomics.org.cn/rice2/index.jsp). Wherever possible, we checked the published annotations of the japonica genomic clones against full-length cDNA clones (Kikuchi et al. 2003) and expressed sequence tags (http://www.tigr.org/tdb/lgi/ and NCBI/Oryza sativa/est-others). We also checked the predicted amino acid sequences against the conserved motifs of the acid invertases (Fig. 1) and neutral/alkaline invertases (Fig. 2). For genomic hits that had not previously been annotated, we supplemented the above methods with the use of Genscan (http://genes.mit.edu/GENSCAN.html) (Burge and Karlin 1997) and FGENESH (http://www.softberry.ru/berry.phtml). The location of the invertase genes on the rice genetic map was determined from the physical map of the rice genome (http://www.tigr.org/tdb/e2k1/osa1/BACmapping/description.shtml). Invertase protein sequences were analyzed using tools available at http://us.expasy.org/tools/, including InterProScan, MitoProt II, PredictProtein, PSORT, SignalP, and TargetP.
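The acceptance thresholds described above can be illustrated with a minimal sketch. It assumes that the "similarity score" corresponds to the BLAST bit score and the "probability score" to the E-value, which the text does not state explicitly; the `Hit` record, the function name, and the example clone identifiers and values are hypothetical and are not taken from the searches reported here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST match (hypothetical record layout, for illustration only)."""
    subject_id: str   # identifier of the matched genomic clone
    bit_score: float  # assumed to be the "similarity score" of the text
    e_value: float    # assumed to be the "probability score" of the text


def passes_thresholds(hit: Hit, min_score: float = 50.0, max_e: float = 1e-4) -> bool:
    """Keep a hit only if score > 50.0 and E-value < 1e-4 (both strict)."""
    return hit.bit_score > min_score and hit.e_value < max_e


# Made-up example hits, not real search results.
hits = [
    Hit("clone_A", 312.0, 2e-85),  # passes both thresholds
    Hit("clone_B", 48.5, 3e-6),    # fails: score too low
    Hit("clone_C", 75.1, 0.02),    # fails: E-value too high
]

kept = [h.subject_id for h in hits if passes_thresholds(h)]
print(kept)
```

Applying both cut-offs together is a design choice of this sketch: a hit must be both sufficiently similar and sufficiently unlikely to arise by chance before it is passed on to manual annotation.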

Table 1 Nomenclature and map location of 19 rice invertase genes predicted by BLAST
Figure 1

Alignment of 13 well-conserved regions from known acid invertases of selected green plants. The conserved regions of the 11 rice acid invertases detected by BLAST search are also shown. The five boxed amino acids are consistently different between cell-wall and vacuolar invertases. Arrows show the four residues (Asp140, Asp265, Glu321, and Cys322) that correspond to the enzyme active site residues proposed by Alberto et al. (2004). Five of the 13 conserved motifs (3, 4, 5, 10, and 11) are located in, or overlap with, the main β structures of the five-bladed propeller structure of this GH32 class of enzyme. The numbers above the alignment represent the amino acid sequence of OsVIN2. See text for explanation of the three underlined amino acids.

Figure 2

Alignment of 12 well-conserved regions from known neutral/alkaline invertases of selected green plants. The conserved regions of the eight rice neutral/alkaline invertases detected by BLAST search are also shown. The 10 boxed amino acids are consistently different between α group and β group invertases. Arrows show the amino acids most likely to correspond to active residues for neutral/alkaline invertases (Asp262 and Asp315), assuming equivalence with the catalytic residues of unsaturated glucuronyl hydrolase (UGL) from Bacillus sp. (Itoh et al. 2004). Numbers above the alignment represent the amino acid sequence of OsNIN2.

Construction of Unrooted Phylogenetic Trees

The deduced amino acid sequences were sent to CLUSTALW (http://clustalw.genome.jp) from which results were exported to the TreeView program (Page 1996) for construction of a phylogenetic tree. The PAUP program (Swofford 2002) was used for bootstrap analysis.

RNA Extraction and RT-PCR

Total RNA was extracted from rice tissues by the TRIzol protocol, according to the instructions of the manufacturer (Invitrogen, Carlsbad, CA). RNA was quantified by UV spectrophotometry at 260 and 280 nm (A260/A280 ∼ 2.0; an A260 of 1.0 corresponds to 40 μg RNA/ml) and its integrity was confirmed by 1% non-denaturing agarose gel electrophoresis and ethidium bromide staining. For gene-specific amplification of invertase mRNA by reverse transcription–polymerase chain reaction (RT-PCR), a primer pair was designed based on the predicted gene structure. The sequence of the forward primer was usually derived from the second- or third-last exon and the reverse primer was derived from the 3′-UTR region. The primers are listed in Table 2. DNA was removed from total RNA extracts by treatment with RNase-free DNase I (Promega, Madison, WI). RT-PCR was performed in a 25-μl reaction volume containing 12.5 μl Superscript II buffer and 0.5 μl Superscript II reverse transcriptase-Taq polymerase enzyme mix (Invitrogen) with 1.0 μg total RNA and gene-specific primer pair (10 ng each; ∼20-mer). The cycle conditions were as follows: reverse transcription at 50°C for 30 min; pre-amplification denaturation at 92°C for 2 min; 35 cycles of denaturation at 92°C for 30 s, primer annealing at 56°C for 30 s, and primer extension at 68°C for 1 min; and a final extension of RT-PCR products at 68°C for 10 min. RT-PCR products were separated by electrophoresis on 1.2% agarose gels, stained with ethidium bromide, and photographed under UV light using Alpha Imager 2200 (Alpha Innotech Corporation, San Leandro, CA). The amplification product of each RT-PCR reaction was cloned into a pGEM vector (Promega) and sequenced.
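The arithmetic behind the RNA quantification and the cycling program can be sketched as follows. The conversion factor (40 μg RNA/ml per A260 unit) and the program steps come from the text; the function names, the absorbance reading of 0.5, and the 50-fold dilution are hypothetical values chosen only to illustrate the calculation, and ramp times between steps are ignored.

```python
UG_PER_ML_PER_A260 = 40.0  # from the text: an A260 of 1.0 = 40 ug RNA/ml


def rna_conc_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of the undiluted RNA stock in ug/ml."""
    return a260 * UG_PER_ML_PER_A260 * dilution_factor


def volume_ul_for_mass(mass_ug: float, conc_ug_per_ml: float) -> float:
    """Volume of stock (in ul) that contains the requested mass of RNA."""
    return mass_ug / conc_ug_per_ml * 1000.0


def program_minutes(rt: float = 30.0, initial_denat: float = 2.0,
                    cycles: int = 35, step_seconds=(30, 30, 60),
                    final_ext: float = 10.0) -> float:
    """Total hold time of the RT-PCR program in minutes (ramping ignored)."""
    return rt + initial_denat + cycles * sum(step_seconds) / 60.0 + final_ext


# Hypothetical reading: A260 = 0.5 measured on a 50-fold dilution.
conc = rna_conc_ug_per_ml(0.5, dilution_factor=50)  # 1000 ug/ml
vol = volume_ul_for_mass(1.0, conc)                 # ul needed for 1.0 ug RNA
total = program_minutes()                           # 30 + 2 + 70 + 10 min
print(conc, vol, total)
```

With these illustrative numbers, the 1.0 μg of total RNA used per reaction would be contained in about 1 μl of stock, and the program described above holds for roughly 112 min.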

Table 2 Gene specific primers designed for RT-PCR of predicted rice invertase genes: Primer sequences are written left to right from 5′ to 3′
Table 3 MitoProt probability that α and β groups of neutral/alkaline invertases of rice (Os) and Arabidopsis (At) are targeted to the mitochondrion

Results

Detection of 19 Rice Invertase Genes

Use of BLASTn and tBLASTn established that the rice genome contains 19 invertase genes (Table 1). Query sequences were derived from previously identified invertase genes encoding cell-wall, vacuolar, or neutral/alkaline isoforms from rice, wheat, barley, maize, tomato, or Arabidopsis. As only 4 of the 19 rice invertase genes have been named previously, we propose here a consistent nomenclature based on the use by Hirose et al. (2002) of “OsCIN1” for a specific rice cell-wall invertase gene. Thus, we designate nine cell-wall invertase genes OsCIN1–9, two vacuolar invertase genes OsVIN1–2, and eight neutral/alkaline invertase genes OsNIN1–8.

The 19 invertase genes were found in the genome sequences of both japonica and indica subspecies of rice (Table 1). They were then located on the genetic map of rice by cross-reference to the emerging physical map of the japonica reference variety Nipponbare. The invertase genes are spread over six chromosomes, with seven genes on chromosome 4 alone. Six of the nine OsCIN genes were found in pairs on three BAC clones, suggesting tandem duplications.

Annotation of Rice Invertase Genes

At the time of our BLAST searches, the rice genome was not fully annotated. We used Genscan, FGENESH, and ExPASy tools to predict exon–intron junctions and protein sequences for the invertase genes. We checked our annotations and the annotations published by other groups against (a) published ESTs and full-length cDNA clones, (b) gene-specific partial cDNAs cloned by us (accession numbers AY575548–AY575565), and (c) amino acid motifs conserved in acid (cell-wall and vacuolar) invertases (Fig. 1) or in neutral/alkaline invertases (Fig. 2). The most difficult exon to predict was the mini-exon that contributes three amino acids (DPN) to the second conserved motif of acid invertases from both monocots and dicots and that is skipped in cold-stressed potato (Bournay et al. 1996). Nine of the eleven acid invertases (Fig. 1) and all eight of the neutral/alkaline invertases (Fig. 2) were predicted to contain all of the conserved motifs. OsCIN4 lacked the first 7 of the 13 conserved motifs of acid invertases and OsCIN9 lacked the tenth motif (dashes in Fig. 1). The deletion in OsCIN9 was confirmed by PCR.

Five amino acid residues in the conserved motifs were consistently different between the cell-wall invertases and the vacuolar invertases whose sequences have been published (Fig. 1; boxed). Using numbering based on OsVIN2, the residues are S173A, I214M, P323V, G431S, and G637A. Although the significance of these differences is not understood, Goetz and Roitsch (1999) showed that the P323V substitution modifies both the pH optimum and the substrate specificities of the cell-wall and vacuolar invertases. When proline was present in the WEC(P/V)D box (i.e., in cell-wall invertases), the pH optimum was more acidic and the cleavage rate of raffinose was higher compared with the vacuolar invertases, which contained valine at this position. This difference (P323V) and two others (S173A and G431S) are strictly maintained between the cell-wall and the vacuolar invertases of rice, but the two other differences are not strictly maintained: irregular substitutions (Fig. 1; underlined amino acids) were detected in OsCIN6 (F instead of I at 214) and OsCIN7 (V instead of I at 214 and A instead of G at 637). These cell-wall invertases may be inactive or possess altered catalytic properties.
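The use of these diagnostic residues can be illustrated with a short sketch. The five positions and residues are those listed above (OsVIN2 numbering, cell-wall residue first, vacuolar residue second); the decision rule, which assigns the class from the three strictly maintained positions and then reports any irregular positions, is an assumption made for illustration and is not a published classifier. The example residue sets for OsCIN6 and OsCIN7 are reconstructed from the substitutions described in the text.

```python
# position -> (cell-wall residue, vacuolar residue), OsVIN2 numbering
DIAGNOSTIC = {
    173: ("S", "A"),
    214: ("I", "M"),
    323: ("P", "V"),
    431: ("G", "S"),
    637: ("G", "A"),
}
STRICT = (173, 323, 431)  # positions strictly maintained in rice


def classify(residues):
    """Return (class label, positions that deviate from that class)."""
    if all(residues[p] == DIAGNOSTIC[p][0] for p in STRICT):
        label, idx = "cell-wall", 0
    elif all(residues[p] == DIAGNOSTIC[p][1] for p in STRICT):
        label, idx = "vacuolar", 1
    else:
        return "unclassified", sorted(DIAGNOSTIC)
    irregular = [p for p in sorted(DIAGNOSTIC)
                 if residues[p] != DIAGNOSTIC[p][idx]]
    return label, irregular


# Residues reconstructed from the substitutions described in the text.
oscin6 = {173: "S", 214: "F", 323: "P", 431: "G", 637: "G"}
oscin7 = {173: "S", 214: "V", 323: "P", 431: "G", 637: "A"}
typical_vacuolar = {173: "A", 214: "M", 323: "V", 431: "S", 637: "A"}

print(classify(oscin6))
print(classify(oscin7))
print(classify(typical_vacuolar))
```

Under these assumptions, OsCIN6 and OsCIN7 are still assigned to the cell-wall class, with positions 214 (and, for OsCIN7, 637) flagged as irregular.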

The neutral/alkaline invertases fall into two groups (α and β) that differ consistently at 10 amino acid residues within the conserved motifs (Fig. 2; boxed). It is unclear whether these two groups correspond to neutral invertases and alkaline invertases, respectively, because no plant alkaline invertase has yet been sequenced, and no representative of the β group of genes has yet been transcribed and translated for determination of its pH optimum. Nevertheless, the eight rice neutral/alkaline invertases are divided equally between them. The α group contains OsNIN1–4 and the β group contains OsNIN5–8. Using numbering based on OsNIN2, the variant residues are V313C, S317C, H327Y, H329Y, LQ428-429VS, F437W, P500R, T511V, and S561A.

Evolution of Rice Invertases: Protein Sequences

Although the conserved motifs in Figs. 1 and 2 allowed clear classification of the rice acid invertases into two groups (cell-wall and vacuolar) and of the rice neutral/alkaline invertases into two groups (α and β), they varied too little to permit detailed analysis within each group. We therefore conducted Clustal analysis on rice invertases using either the complete predicted coding sequence of the invertase proteins or the inner part of the coding sequence (starting with the first conserved motif and ending with the last conserved motif). The predictions of the two methods were essentially identical. Figure 3 presents the results obtained by use of the inner part of the coding sequence. Clustal analysis separated the 11 acid invertases of rice into three groups: the vacuolar group (OsVIN1–2) and two cell-wall groups (α with OsCIN1–4 and β with OsCIN5–9) (Fig. 3A). The deletions present in OsCIN4 and OsCIN9 (Fig. 1) prevented the inclusion of these proteins. However, when Clustal analysis was repeated for all the acid invertases using only the regions of the sequence present in OsCIN4 or OsCIN9, OsCIN4 clustered with OsCIN3, while OsCIN9 clustered with OsCIN8. These relationships are highlighted in Fig. 3A without showing OsCIN4 and OsCIN9 as part of the evolutionary tree for acid invertases. It is interesting to note that OsCIN8,9 are located on the same BAC clone on chr 9 (Table 1) and are presumably tandem duplicates. OsCIN2,3 and OsCIN6,7 are also tandem duplicates, sharing BAC clones on chr 4.

Figure 3

A Unrooted phylogenetic tree of 11 predicted acid invertases of rice. The α group (OsCIN1–4) and the β group (OsCIN5–9) of cell-wall invertases and the vacuolar invertases (OsVIN1–2) are indicated. Two of the invertases (OsCIN4 and OsCIN9) are only associated with the cladogram because of deletions (see text). B Unrooted phylogenetic tree of eight predicted neutral/alkaline invertases of rice. The α group (OsNIN1–4) and the β group (OsNIN5–8) are indicated. Both analyses used the sequence of the central region of each protein, starting at the first conserved motif and ending at the last conserved motif (see Figs. 1 and 2 and text). The robustness of the branching is measured as a percentage of 1000 repetitions of bootstrap analysis.

Clustal analysis using the inner conserved region of the eight rice neutral/alkaline invertases (Fig. 3B) separated these proteins into the same α and β groups previously identified based on 10 specific residues within the 12 conserved motifs (Fig. 2). No evidence for tandem duplication was found for any of the OsNIN1–8 genes.

Evolution of Rice Invertases: Exon–Intron Structures

The exon–intron structure of the rice invertase genes provides further insight into the evolutionary history of the acid invertases and the neutral/alkaline invertases (Fig. 4).

Figure 4

Exon–intron structures of 19 predicted rice invertase genes. A Cell-wall invertases (α and β groups) and vacuolar invertases. B Neutral/alkaline invertases (α and β groups). The 13 conserved regions for cell-wall and vacuolar invertases and the 12 conserved regions for neutral/alkaline invertases are indicated as black boxes or lines. The lengths of exons and introns are indicated. In B, the dashed double-headed arrows show the introns within conserved regions.

Five of the eleven acid invertase genes of rice contain seven exons: OsCIN1–3, OsCIN5, and OsVIN1 (Fig. 4). The locations of the exon–intron junctions in these five rice genes are fully conserved and may correspond to the structure of the ancestral acid invertase gene. The other six acid invertase genes would then be derived from the ancestral form by intron loss. The α group of cell-wall invertase genes consists of OsCIN1–3 and the truncated gene OsCIN4. The exon–intron structure of OsCIN4 indicates that its truncation did not occur by duplication of the OsCIN3 gene followed by recombinational loss from the copy of the first two and a half exons (and two introns). The absence of introns 3–5 from OsCIN4 suggests that this gene arose from a duplicate of OsCIN3 by a mechanism termed “concerted intron loss” and known to occur in yeast through homologous recombination between the gene and a partial cDNA copy of the gene (Ares et al. 1999; Lynch and Kewalramani 2003).

The evolution of gene structure in the β group of CIN genes is also reasonably clear. As with the α group, the founding gene of the β group contained seven exons and duplicated to give OsCIN5 (also with seven exons) and another copy that lost intron 4 to generate a six-exon gene; the latter duplicated in tandem to yield OsCIN6 and OsCIN7. OsCIN6 duplicated with the loss of intron 3 to yield OsCIN8 (five exons). Subsequently, OsCIN8 duplicated in tandem to produce OsCIN9 but the duplicate suffered a substantial internal deletion that reduced it to three exons through loss of the tenth conserved motif and introns 4 and 5. This last alteration may also have involved homologous recombination with a partial cDNA copy of the gene. The loss of the conserved tenth motif suggests that OsCIN9 could not encode an active cell-wall invertase.

The two rice vacuolar invertase genes are closely related at the amino acid level (Fig. 3A) but OsVIN1 contains seven exons, whereas OsVIN2 contains only three exons (Fig. 4A). We suggest that the two genes originated from a common forerunner by duplication and then OsVIN2 lost introns 1, 4, 5, and 6. The loss of three neighboring introns at the 3′-end of the gene is unlikely to have been due to sequential loss of individual introns. It is more likely that OsVIN2 experienced RNA-mediated concerted intron loss, as already invoked for OsCIN4 and suggested for OsCIN9.
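The exon counts quoted in this section follow from simple bookkeeping: a gene with n exons has n − 1 introns, and each lost intron fuses two neighboring exons into one. The sketch below checks the counts stated in the text under that assumption; the function name is hypothetical, and the model deliberately ignores deletions of coding sequence (such as those in OsCIN4 and OsCIN9), which also alter exon number.

```python
def exons_after_intron_loss(ancestral_exons, lost_introns):
    """Exon count after precise loss of the given introns (1-based numbering)."""
    n_introns = ancestral_exons - 1
    lost = set(lost_introns)
    if any(i < 1 or i > n_introns for i in lost):
        raise ValueError("intron numbers must lie between 1 and %d" % n_introns)
    # Each lost intron merges two adjacent exons, reducing the count by one.
    return ancestral_exons - len(lost)


ANCESTRAL_EXONS = 7  # inferred ancestral acid invertase gene (six introns)

# OsVIN2: loss of introns 1, 4, 5, and 6
osvin2 = exons_after_intron_loss(ANCESTRAL_EXONS, [1, 4, 5, 6])
# Forerunner of OsCIN6 and OsCIN7: loss of intron 4
oscin6_7 = exons_after_intron_loss(ANCESTRAL_EXONS, [4])
# OsCIN8: additional loss of intron 3
oscin8 = exons_after_intron_loss(ANCESTRAL_EXONS, [3, 4])

print(osvin2, oscin6_7, oscin8)
```

The results (three, six, and five exons) agree with the structures described for OsVIN2, OsCIN6 and OsCIN7, and OsCIN8.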

The eight neutral/alkaline invertases of rice cluster into the same α and β groups irrespective of whether the analysis is based on protein sequence (Figs. 2 and 3B) or exon–intron structure (Fig. 4B). The genes of the α group (OsNIN1–OsNIN4) contain six exons, while the genes of the β group (OsNIN5–OsNIN8) contain four exons. The founding gene of the α group underwent a duplication to produce the forerunner of OsNIN1–OsNIN2 and the forerunner of OsNIN3–OsNIN4 (Fig. 3B). These forerunners then duplicated once to produce the current α group. The founding gene of the β group underwent an initial duplication to produce OsNIN5 and the forerunner of OsNIN6–OsNIN8, which underwent a duplication to produce OsNIN6 and the forerunner of OsNIN7–OsNIN8. The final duplication produced OsNIN7 and OsNIN8 (Fig. 3B).

Expression of Rice Invertase Genes

The expression of rice invertase genes was examined by reverse transcription PCR, using pairs of gene-specific primers (Table 2). The primers were incubated with RNA extracted from leaf (mature, young, or etiolated), callus, panicle, root, leaf sheath, and stem. All genes except OsCIN9 produced amplicons of the expected size with RNA from at least two tissues (Fig. 5A). No amplicon was detected for OsCIN9 primers, although these primers produced a PCR amplicon of the expected size with genomic DNA (not shown). We suspect that OsCIN9 is transcriptionally inactive and should be classed as a pseudogene. Four other cell-wall invertase genes (OsCIN1–3 and OsCIN8) were transcribed in each tissue examined, while OsCIN4–7 showed marked tissue specificity in their expression. The two vacuolar invertase genes differed in expression pattern. OsVIN1 was highly expressed in every tissue tested, while OsVIN2 was highly expressed in mature leaf, panicle, and root and poorly expressed in young and etiolated leaves. The two groups of neutral/alkaline invertase genes differed in expression pattern: genes of the α group (OsNIN1–4) and one gene of the β group (OsNIN8) were highly expressed in all tissues examined, while the remaining genes of the β group (OsNIN5–7) showed some degree of tissue specificity.

Figure 5

Expression analysis of 19 predicted rice invertase genes using gene-specific RT-PCR primers. A RT-PCR conducted on RNA extracted from major organs of the rice plant. B RT-PCR conducted on RNA extracted from panicles at the indicated times. Ethidium bromide–stained RT-PCR products were separated on a 1.2% (w/v) agarose gel. Primers of cytosolic glyceraldehyde-3-phosphate dehydrogenase were used as a control for effective amplification and absence of DNA. ML, mature leaf; YL, young leaf; E shoot, etiolated shoot; DBH, day before heading; DAH, day after heading.

We examined invertase gene expression in the panicle at six stages of development (Fig. 5B). The stages were 10 days before heading (corresponding to anther meiosis in many spikelets), 3 days before heading, 1 day before heading, heading, 3 days after heading (50% flowering), and 5 days after heading (start of grain filling). OsCIN1 and OsCIN2 were highly expressed throughout this period. OsCIN3 was highly expressed from 3 days before heading to 3 days after heading. OsCIN4 was highly expressed at meiosis and expressed weakly at 1 day before heading. OsCIN5 was most highly expressed after heading. OsCIN8 was expressed most actively at meiosis and 3 days after heading. OsCIN6, OsCIN7, and OsCIN9 were not expressed to a detectable extent in panicles, in agreement with Fig. 5A. The two vacuolar invertase genes differed in pattern of gene expression: OsVIN1 was highly expressed from 3 days before heading to 3 days after heading, while OsVIN2 was highly expressed during heading and flowering stages. Remarkably, all eight neutral/alkaline invertase genes were expressed in panicles throughout the 16 days of development covered by the analysis. The acid invertase genes appear to show much greater diversity in pattern of gene expression than the neutral/alkaline invertase genes.

Discussion

Comparative Evolution of Rice and Arabidopsis Invertases

Our data establish that rice contains 19 invertase genes: 9 cell-wall, 2 vacuolar, and 8 neutral/alkaline. The corresponding numbers for Arabidopsis are a total of 17 invertase genes, comprising 6 cell-wall, 2 vacuolar, and 9 neutral/alkaline. Vargas et al. (2003) claimed that there are 11 neutral/alkaline invertases in Arabidopsis but they counted two genes twice, as both cDNA clones and genomic clones (see legend to Fig. 6).

Figure 6

Phylogenetic trees for acid invertases (A) and neutral/alkaline invertases (B) of rice and Arabidopsis. Both analyses used the sequence of the central region of each protein, starting at the first conserved motif and ending at the last conserved motif (see Figs. 1 and 2 and text). The acid invertase genes of Arabidopsis were reported by Sherson et al. (2003) and Haouazine-Takvorian et al. (1997). The neutral/alkaline invertase genes of Arabidopsis were reported by Vargas et al. (2003); At-A/N-InvB and At-A/N-InvJ are omitted here because they are the cDNA clones of At-A/N-InvA and At-A/N-InvI, respectively. Yeast cell-wall invertase SUC2 (NP_012104) and the alkaline invertase of the cyanobacterium Anabaena (AJ491788) were used as outgroups. The number of exons in each gene is indicated. See text for explanation of asterisk against exon numbers for AtvaINV1,2. The boxes represent the invertases likely to have been present in the last common ancestor (LCA) of rice and Arabidopsis, and the estimated exon number is indicated. Circles with + and − signs indicate branches where introns were inserted or removed. The robustness of the branching is measured as a percentage of 1000 repetitions of bootstrap analysis.

ClustalW analysis was conducted on the protein sequences of acid invertases (Fig. 6A) and neutral/alkaline invertases (Fig. 6B) of rice and Arabidopsis. The vacuolar invertases of both Arabidopsis and rice cluster together, clearly separated from the cell-wall invertases, indicating that the origin of the vacuolar invertases from cell-wall invertases predated the last common ancestor (LCA) of rice and Arabidopsis. The separation of the two rice vacuolar invertases from the two Arabidopsis vacuolar invertases suggests that the LCA contained a single vacuolar invertase (of seven exons) that underwent a duplication event in each of the lineages to rice and Arabidopsis. However, exon-intron analysis does not support such a simple conclusion. Although the two vacuolar invertases of Arabidopsis have seven exons, like OsVIN1, the exon–intron junctions are not fully conserved between rice and Arabidopsis: the fourth intron of OsVIN1 is absent in Arabidopsis and a different intron is found in what is exon 3 of OsVIN1 (not shown). These two different types of seven-exon gene are indicated in Fig. 6A as 7 in rice and 7* in Arabidopsis. Thus, the LCA may have had two vacuolar invertase genes of seven exons, but one would have been lost in the lineage to rice and the other would have been lost in the lineage to Arabidopsis. These two precursors in LCA are represented by the boxes V1 and V2. A second alternative that we cannot dismiss at present is that the LCA may have possessed a single vacuolar invertase gene of six (or eight) exons that gained (or lost) an intron by two different pathways to produce two different seven-exon genes (represented by 7 and 7*) in the lineages to rice and Arabidopsis.

Analysis of the evolution of the cell-wall invertases reveals a more complex history in which gene loss might again have to be invoked in addition to gene duplication. The LCA appears to have had four cell-wall invertase genes (boxes labeled C1–4 in Fig. 6A), all of seven exons. C1 was the forerunner of OsCIN1–3 (seven exons), AtcwINV2 (five exons), and AtcwINV4 (six exons). C2 was the forerunner of OsCIN5–8 (five to seven exons) but was lost in the lineage to Arabidopsis. C3 was the forerunner of AtcwINV1,3,5 (five or seven exons) but was lost in the lineage to rice. C4 was the forerunner of AtcwINV6 (seven exons) but was also lost in the lineage to rice. This pattern is consistent with the recently discovered phenomenon in plants of cycles of whole- or partial-genome duplication followed by large-scale gene loss (Gaut and Doebley 1997; Bowers et al. 2003). Both the rice and the Arabidopsis genomes show signs of the occurrence of such cycles (Chapman et al. 2004).

Analysis of the neutral/alkaline invertases of rice and Arabidopsis reveals a similar situation (Fig. 6B). The separation of neutral/alkaline invertases into the α and β groups clearly predates the LCA. We discuss the subsequent evolution of the β group first because it illustrates clearly the existence of branches of the cladogram that contain rice genes but no Arabidopsis genes, and vice versa. This situation leads us to suggest that the LCA contained a β group of four genes (boxes labeled β1–4), all with four exons. β1 and β2 were the forerunners of OsNIN5 and OsNIN6, respectively, but are now absent in Arabidopsis; β3 was the forerunner of At-A/N-InvD,F,G,I but was lost in the lineage to rice; and β4 was the forerunner of OsNIN7,8 and At-A/N-InvK. (An alternative explanation for the branches leading to only rice or Arabidopsis is that two recent frame-shift mutations might have made a major difference to the protein sequence that lies between them and hence shifted that gene to an unexpected location within the cladogram. We checked for such major local changes in protein sequence and found no evidence for them.) In the β group there were no losses of introns, but one intron was gained (arrow) to give the forerunner of At-A/N-InvG,I.

For the α group, we suggest that the LCA contained three α genes of six exons (boxes labeled α1–α3). α1 led to a single Arabidopsis gene (At-A/N-InvC), α2 led to rice (OsNIN1,2) and Arabidopsis (At-A/N-InvA,C,H) genes, and α3 led to rice (OsNIN3,4) and Arabidopsis (At-A/N-InvE) genes. No intron gain or loss occurred.

In summary, we suggest that since the LCA the vacuolar lineages to rice and Arabidopsis have undergone a total of two gene losses and two gene duplications. In the cell-wall lineages there have been 3 losses and 10 duplications, including those giving rise to OsCIN4 and OsCIN9 (truncated genes not depicted in Fig. 6A). In the neutral/alkaline α lineages there were a total of one loss and three duplications, while in the neutral/alkaline β lineages there were a total of three losses and four duplications. In total, the 11 acid invertase genes leading to rice and Arabidopsis experienced 5 gene losses and 12 gene duplications, while the 8 neutral/alkaline invertase genes experienced 4 losses and 7 duplications. These rates of evolution by gene loss and duplication are similar between the two invertase families. By contrast, when evolution was measured by intron loss events (taking concerted intron loss as one event), the acid invertases experienced 10 intron loss events, while the neutral/alkaline invertase genes experienced 1 intron gain and no losses.
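The family totals in this summary can be checked by adding up the per-lineage counts given in the same paragraph. The sketch below does only that arithmetic; the dictionary keys and the helper name are hypothetical labels, and the numbers are those quoted in the text.

```python
# lineage -> (gene losses, gene duplications) since the LCA, as quoted above
EVENTS = {
    "vacuolar": (2, 2),
    "cell-wall": (3, 10),
    "neutral/alkaline alpha": (1, 3),
    "neutral/alkaline beta": (3, 4),
}


def family_total(lineages):
    """Sum losses and duplications over the named lineages."""
    losses = sum(EVENTS[name][0] for name in lineages)
    duplications = sum(EVENTS[name][1] for name in lineages)
    return losses, duplications


acid_total = family_total(["vacuolar", "cell-wall"])
na_total = family_total(["neutral/alkaline alpha", "neutral/alkaline beta"])

print("acid invertases:", acid_total)
print("neutral/alkaline invertases:", na_total)
```

The sums reproduce the totals stated above: 5 losses and 12 duplications for the acid invertases, and 4 losses and 7 duplications for the neutral/alkaline invertases.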

This impressive difference may be related to another difference between the two invertase families: most of the introns of neutral/alkaline invertases are located within conserved blocks (Fig. 4B), while most of the introns of acid invertases are located between conserved blocks (Fig. 4A). However, it is unlikely that intron location as such determines the rate of intron loss, because if the latter occurs by homologous recombination of cDNA segments, it should have no effect on the coding sequence and should not allow conservation of protein sequences to select against intron loss. Reasons for the higher conservation of protein sequence in neutral/alkaline invertases will emerge from studies on the three-dimensional structure of the two families, and the lower intron loss rate may well be explicable in terms of mRNA structure providing sites for cDNA synthesis (Feiber et al. 2002).

Prediction of Subcellular Locations of Cell-Wall Invertases

The terms “cell-wall,” “vacuolar,” and “cytosolic” are used to describe the locations of plant invertases. We have employed selected ExPASy tools (PSORT, SignalP, TargetP, MitoProt) to predict the subcellular locations of the rice and Arabidopsis invertases that were described above. With four exceptions, all 17 cell-wall invertases were predicted to have the hydrophobic N-terminal signal peptide required for co-translational insertion into the endoplasmic reticulum and secretion from the cell (Bendtsen et al. 2004). The four exceptions were the two rice pseudogenes (OsCIN4 and OsCIN9) and two additional invertases (AtcwINV6 and OsCIN6), which were predicted by annotating algorithms to be about 40 amino acids shorter at the N-terminus than a normal cell-wall invertase and, thus, to lack the signal peptide. As both AtcwINV6 (Sherson et al. 2003) and OsCIN6 (Fig. 5) are transcribed, we checked the annotation of these genes in case the DNA encoding a signal peptide had been inadvertently omitted from the predicted gene sequence. For AtcwINV6 both the genomic clone (AL163812) and an EST (AV828919) indicated the presence of an in-frame termination codon located five triplets upstream from the predicted initiating ATG. The EST sequence showed that this termination codon was not removed from the mRNA by exon-intron splicing. For OsCIN6, the gene (AL606646) sequence predicted a stop codon located 20 triplets upstream from the predicted initiating ATG. We have not yet found an EST to confirm that the termination codon survives RNA splicing. It is not clear whether the transcripts from AtcwINV6 and OsCIN6 are translated.
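The annotation check described above, looking for an in-frame termination codon upstream of the predicted initiating ATG, can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name and the toy sequence are hypothetical, and the real AL163812 and AL606646 sequences are not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_stop(seq, atg_index):
    """Walk upstream from the ATG one triplet at a time, staying in frame.

    Returns the number of triplets between the ATG and the nearest in-frame
    stop codon (counting the stop codon itself), or None if no in-frame stop
    is found before the 5' end of the sequence.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    triplets = 0
    pos = atg_index - 3
    while pos >= 0:
        triplets += 1
        if seq[pos:pos + 3] in STOP_CODONS:
            return triplets
        pos -= 3
    return None


# Hypothetical toy sequence: a TAG placed five triplets upstream of the ATG.
toy = "GC" + "TAG" + "GCT" + "GCA" + "GCC" + "GCG" + "ATG" + "GCTTCC"
atg = toy.index("ATG")
print(upstream_inframe_stop(toy, atg))
```

An in-frame stop close to the ATG means the open reading frame cannot be extended upstream, so a missing signal peptide cannot be explained by a mis-assigned start codon alone.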

Subcellular Targeting Mechanism of Vacuolar Invertases

The most commonly studied mechanism for targeting soluble proteins to the plant vacuole also makes use of a hydrophobic N-terminal signal peptide to direct proteins initially into the endoplasmic reticulum (Vitale and Chrispeels 1992). Additional signals within the proteins then target them to the vacuole via a pre-vacuolar compartment (Bassham and Raikhel 1997). PSORT (Nakai and Kanehisa 1992) and TargetP (Emanuelsson et al. 2000) predicted that none of the vacuolar invertases AtvaINV1,2 and OsVIN1,2 possesses an N-terminal signal peptide. Instead, PSORT predicted that all four proteins contain a single sub-terminal transmembrane segment (located about 35–45 residues downstream from the N-terminus) and would adopt the NinCout configuration of type II single-pass membrane proteins, which are synthesized on free ribosomes and insert into cellular membranes post-translationally. The program predicted that the rice and Arabidopsis vacuolar invertases would insert into the plasma membrane. However, studies on yeast vacuolar targeting suggest three ways in which such type II single-pass membrane proteins could be targeted to vacuoles. The first route is to enter the plasma membrane post-translationally as predicted by PSORT and then to pass to the vacuole by a form of endocytosis (Chiang et al. 1996). This is the route taken by many tonoplast membrane proteins. The second route is to insert post-translationally directly into the tonoplast, as demonstrated for the yeast aminopeptidase I (Kim and Klionsky 2000); this route is also employed during autophagy of yeast cells (Tucker et al. 2003). The third route, which is followed by several tonoplast membrane proteins, by-passes the pre-vacuolar compartment and the plasma membrane and depends instead on a special targeting domain (Piper et al. 1997). This domain has been best studied in yeast alkaline phosphatase (ALP), a type II tonoplast membrane protein.
The targeting domain of ALP is composed of a hydrophilic N-terminal segment and a hydrophobic sub-terminal transmembrane segment. When fused to the yeast cell-wall invertase Suc2, the domain redirects this enzyme away from the ER and the cell wall and into the vacuole of yeast (Klionsky and Emr 1990). Figure 7 compares this domain from the ALP of several fungi with the N-terminus of several plant vacuolar invertases. Although the ALP domains and the plant vacuolar N-termini are not precisely conserved, the two sets of sequences have major features in common: LL or PLP near the N-terminus, followed by a strongly basic region and then a hydrophobic transmembrane segment (Vowels and Payne 1998; Darsow et al. 1998). All these features have been regarded as significant in yeast. At the present time we can do no more than suggest that vacuolar invertases are transported to the tonoplast by the ALP mechanism, where they adopt a NinCout configuration (with the short N-terminal segment in the cytosol and the long C-terminal region in the vacuole). It is likely that a vacuolar protease will release the C-terminal region into the lumen of the vacuole. Immediately downstream from the transmembrane region, vacuolar invertases contain a conserved motif that is absent from cell-wall invertases (Balk and de Boer 1999). This motif was first described by Sturm (1999), who suggested that it functions to target vacuolar invertases to the vacuole. If the ALP-type N-terminal domain targets vacuolar invertases to the tonoplast and a vacuolar proteinase then releases the C-terminal region in the lumen by cleaving the motif, the net effect is that both domains play a role in targeting.
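The three shared features (LL or PLP near the N-terminus, then a basic region, then a hydrophobic segment) can be expressed as a simple ordered scan. The sketch below is only an illustration of that ordering: the residue sets, window sizes, and thresholds are assumptions chosen for the example, the toy sequence is hypothetical, and the function is not a substitute for PSORT or for the alignment in Fig. 7.

```python
import re

# Assumed residue classes for this sketch.
HYDROPHOBIC = set("AVLIMFWC")
BASIC = set("KR")


def alp_like_features(nterm, basic_window=10, min_basic=3,
                      tm_window=15, min_hydrophobic=11):
    """Look for LL/PLP, then a basic window, then a hydrophobic window, in order.

    Returns a dict of start positions (0-based), or None if any feature is
    missing. All window sizes and thresholds are illustrative assumptions.
    """
    nterm = nterm.upper()
    motif = re.search(r"LL|PLP", nterm)
    if motif is None:
        return None

    # First window after the motif that is rich in K/R.
    basic_start = None
    for i in range(motif.end(), len(nterm) - basic_window + 1):
        if sum(aa in BASIC for aa in nterm[i:i + basic_window]) >= min_basic:
            basic_start = i
            break
    if basic_start is None:
        return None

    # First window after the basic region that is mostly hydrophobic.
    for j in range(basic_start + basic_window, len(nterm) - tm_window + 1):
        if sum(aa in HYDROPHOBIC for aa in nterm[j:j + tm_window]) >= min_hydrophobic:
            return {"motif": motif.group(), "motif_pos": motif.start(),
                    "basic_start": basic_start, "tm_start": j}
    return None


# Hypothetical toy N-terminus, not a real invertase or ALP sequence.
toy_nterm = "MDTPLPSSYEPRRKRSSGRT" + "LLAVLAGLLVVALVAALVG" + "NGSQD"
print(alp_like_features(toy_nterm))
```

A scan of this kind only reports whether the features occur in the expected order; whether they are functional targeting signals in plants remains an experimental question.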

Figure 7

Alignment of N-termini of plant vacuolar invertases and N-termini of fungal vacuolar alkaline phosphatases. The common features between the two sets of sequences are indicated in the block diagram. The accession numbers and species names are indicated.

Prediction of Subcellular Location of Neutral/Alkaline Invertases

Although the plant neutral/alkaline invertases are believed to be cytosolic, PSORT predicted that, with two exceptions, all α group members for rice and Arabidopsis are targeted to the mitochondrion. The exceptions were OsNIN3 and At-A/N-InvE, which were predicted to be targeted to the nucleus and endoplasmic reticulum, respectively. We checked these predictions with MitoProt II (Claros and Vincens 1996) and TargetP (Emanuelsson et al. 2000). MitoProt II predicted that the probability of mitochondrial targeting was greater than 95% for all α group members, except for OsNIN3 (88%) and At-A/N-InvE (84%). In their analysis of the mitochondrial proteome, Millar et al. (2001) adopted a probability of >85% as indicating mitochondrial targeting. TargetP predicted chloroplast targeting for OsNIN3 and At-A/N-InvE and mitochondrial targeting for all other α group members. Thus, there is good agreement among all three prediction programs concerning which members of the α group neutral/alkaline invertases are likely to be targeted to the mitochondrion and which are likely to be targeted elsewhere. Direct analysis of mitochondrial and chloroplast subfractions will be needed to confirm the presence of the α group invertases in these organelles. It is encouraging that OsNIN3 and At-A/N-InvE cluster together in Fig. 6B. The tree is based on the conserved central region of the protein sequence and excludes the N-termini that are the basis for prediction of targeting to the mitochondrion or chloroplast. The tree suggests that the exceptional predictions made for these two proteins have an evolutionary basis and are not merely the result of random changes in the N-terminus of the proteins. It should be noted that these are the only two α group proteins that are not predicted by PredictProtein (Rost and Liu 2003) to have the amphipathic helical N-terminus associated with mitochondrial targeting (Brink et al. 1994).
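The comparison of the three programs for the two exceptional proteins can be laid out as a small tally. The sketch below uses only the values reported above (PSORT and TargetP calls, MitoProt II probabilities, and the >85% cutoff of Millar et al. 2001); the data structure and function name are hypothetical, and the other α group members are omitted because only a lower bound (>95%) is given for them.

```python
# Cutoff for calling mitochondrial targeting, as adopted by Millar et al. (2001).
MITO_THRESHOLD = 0.85

# Predictions for the two exceptional alpha group proteins, as reported in the text.
predictions = {
    "OsNIN3": {"PSORT": "nucleus", "TargetP": "chloroplast", "MitoProt II": 0.88},
    "At-A/N-InvE": {"PSORT": "endoplasmic reticulum", "TargetP": "chloroplast",
                    "MitoProt II": 0.84},
}


def mitochondrial_votes(calls, threshold=MITO_THRESHOLD):
    """Return, per program, whether it supports mitochondrial targeting."""
    return {
        "PSORT": calls["PSORT"] == "mitochondrion",
        "TargetP": calls["TargetP"] == "mitochondrion",
        "MitoProt II": calls["MitoProt II"] > threshold,
    }


summary = {name: mitochondrial_votes(calls) for name, calls in predictions.items()}
for name, votes in summary.items():
    print(f"{name}: {sum(votes.values())} of {len(votes)} programs "
          f"support mitochondrial targeting")
```

Applied strictly, the 85% cutoff places OsNIN3 (88%) just above and At-A/N-InvE (84%) just below the MitoProt II threshold, whereas PSORT and TargetP assign both proteins to other compartments.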

The presence of invertase activity in mitochondria and chloroplasts has not been reported to our knowledge. However, sucrose can diffuse from the cytosol into the mitochondria through pores in the outer and inner membranes (Zoratti and Szabo 1995; Curtis and Wolpert 2002). Sucrose is also imported rapidly into plastids as judged by the ability of yeast invertase, when targeted to potato amyloplasts, to hydrolyze 80% of cellular sucrose in vivo (Gerrits et al. 2001). It has recently been shown in the moss Physcomitrella patens that a novel class of hexokinase is targeted to chloroplasts (Olsson et al. 2003). This enzyme has homologues in other plants including rice. Thus, at least with plastids, it becomes possible to envision that an organellar neutral/alkaline invertase might hydrolyze sucrose to glucose and fructose, which would then be phosphorylated by the organellar hexokinase.

PSORT predicts peroxisomal targeting for all members of the β group except At-A/N-InvK. The latter is predicted to be located in the endoplasmic reticulum. It is not clear how much significance should be placed on these predictions. The prediction of peroxisomal targeting is based on the presence of a type I peroxisomal targeting sequence (SRL or similar tripeptide) in the protein sequence. Research indicated that for effective peroxisomal targeting the tripeptide should be located at or very close to the C-terminus of the protein. TargetP requires that the tripeptide is located at the C-terminus and places some additional requirements on the last 12 amino acids preceding it. TargetP predicts that none of the neutral/alkaline invertases is targeted to the peroxisome (or to the chloroplast or mitochondrion). PSORT allows the tripeptide to be anywhere in the protein. This seems to be an unjustified degree of latitude that leads PSORT to predict that any protein with an SRL-like sequence is targeted to the peroxisome, irrespective of whether the sequence and its location are conserved. In the case of OsNIN5-8, the location of the tripeptide (SRL, ARL, SHL) is not conserved and is therefore dubious as a predictor of targeting. In the case of At-A/N-InvC,D,F,G,I, the tripeptide (SRL, SHL, ARL, SRL) is mostly 61–71 amino acids from the C-terminus. MitoProt II predicts that the β group members are targeted to the mitochondrion with a probability of less than 24%. We conclude that the evidence is in favor of a cytosolic location for these enzymes.
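The difference between the permissive rule (tripeptide anywhere in the protein) and the strict rule (tripeptide at the C-terminus) can be made concrete with a short scan. This is a minimal sketch under stated assumptions: the function names and toy sequences are hypothetical, only the three tripeptides named in the text (SRL, ARL, SHL) are searched, and the additional requirements on the preceding residues are not modeled.

```python
import re

# Lookahead so that overlapping candidates are all reported.
PTS1_LIKE = re.compile(r"(?=(SRL|ARL|SHL))")


def pts1_candidates(protein):
    """List (tripeptide, residues between its end and the C-terminus)."""
    protein = protein.upper()
    hits = []
    for match in PTS1_LIKE.finditer(protein):
        start = match.start(1)
        hits.append((match.group(1), len(protein) - (start + 3)))
    return hits


def has_cterminal_pts1(protein, max_distance=0):
    """Strict rule: a candidate must end within max_distance of the C-terminus."""
    return any(dist <= max_distance for _, dist in pts1_candidates(protein))


# Hypothetical toy sequences, not real invertases.
internal = "MAAGSRL" + "G" * 65   # SRL lies 65 residues from the C-terminus
terminal = "MGGTTSRL"             # SRL is the last three residues

for name, seq in [("internal", internal), ("terminal", terminal)]:
    print(name, pts1_candidates(seq), has_cterminal_pts1(seq))
```

Under the permissive rule both toy sequences would be flagged, while under the strict rule only the second one is, which mirrors the disagreement between the two kinds of prediction described above.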

Expression and Function of Multiple Invertase Genes

If the diversity of subcellular targeting predicted above for neutral/alkaline invertases is supported by other approaches, it provides a partial explanation for the multiplicity of these genes in plants. The multiplicity of cell-wall invertases has for some years been linked to diversity in tissue-specific and temporal expression, as first shown for tomato (Godt and Roitsch 1997). Our RT-PCR study indicates that diversity in organ-specific and temporal expression is also a feature of cell-wall invertases in rice. However, our unpublished results from RNA in situ hybridization and quantitative RT-PCR after stress and hormone treatments establish a still richer pattern of specificity of expression. We are applying similar techniques also to the neutral/alkaline invertase genes to determine whether their relatively uniform expression observed here across organs and across time in the panicle is misleading.

A more subtle level of diversity is found in the functional differences between paralogous proteins. We have already referred to the work of Goetz and Roitsch (1999) showing that a single amino acid substitution in an acid invertase alters its pH optimum and its cleavage rate against raffinose. Among the eight or nine neutral/alkaline invertases of rice or Arabidopsis, it remains to be established which are neutral and which are alkaline.

Another possible explanation for the existence of multiple forms of invertases in tissues is the evolution of fructan fructosyltransferases from vacuolar invertases (Vijn and Smeekens 1999) and of fructan exohydrolases from cell-wall invertases (Van den Ende et al. 2000). We have conducted a functional study on the two vacuolar invertases of rice by expressing OsVIN1 and OsVIN2 in the fungus Pichia pastoris and isolating the secreted protein for enzymatic analysis (Ji et al., manuscript in preparation). These two recombinant proteins show high invertase activity, but they also show significant sucrose:sucrose 1-fructosyltransferase activity, although rice is not known to accumulate fructans. We are also studying the possibility that certain OsCIN genes may be more correctly described as fructan exohydrolases than cell-wall invertases. Recently, it was demonstrated that some of the so-called cell-wall invertases of Beta vulgaris and A. thaliana encode functional fructan exohydrolases. These enzymes have lost their capacity to degrade sucrose (Van den Ende et al. 2003; De Coninck et al. 2004). However, as the two Arabidopsis cell-wall invertases (AtcwINV3,6) that are now regarded as fructan exohydrolases are not closely related to any OsCIN protein (Fig. 6A), it is difficult to predict which OsCIN protein, if any, might also prove to be a fructan exohydrolase.

Structural Predictions for Acid and Neutral/Alkaline Invertases

Analysis of the acid invertases of rice and Arabidopsis using InterProScan (Zdobnov and Apweiler 2001) indicated that these proteins are related to family 32 of the glycosyl hydrolase enzymes (Henrissat B, Coutinho P, Deleury E; http://afmb.cnrs-mrs.fr/CAZY/index.html). Alberto et al. (2004) determined the first structure of this enzyme family: an invertase of the bacterium Thermotoga maritima. The first plant enzyme structures within this family will soon become available (W. Van den Ende, pers. comm.). All these enzymes have two Asp, one Glu, and one Cys in their active site, compatible with the catalytic mechanism proposed earlier by Reddy and Maley (1996). The enzymes consist of a five-bladed β-propeller module connected to a β-sandwich module. Analysis of the acid invertases of rice and Arabidopsis using PredictProtein indicated that these enzymes and the invertase of T. maritima have a comparable distribution of β structures in the region of the protein stretching across the four conserved active site residues, as previously predicted by Pons et al. (1998). In the plant and bacterial proteins, there are at least three long β structures between the first and the second Asp residues, two between the second Asp and the Glu–Cys residues, and three or four between Glu–Cys and the α-helix. Figure 1 shows that several of the conserved motifs of acid invertases correspond to or overlap with the β structures.

Analysis of the neutral/alkaline invertases of rice and Arabidopsis using InterProScan indicated that these proteins are members of the α6/α6 toroidal protein family. Itoh et al. (2004) determined the X-ray crystallographic structure of the unsaturated glucuronyl hydrolase (UGL) from Bacillus sp. to a resolution of 1.8 Å. UGL is a glycosaminoglycan hydrolase that releases unsaturated D-glucuronic acid from oligosaccharides produced by polysaccharide lyases. UGL consists of 377 amino acid residues organized into a structure that includes an α6/α6-barrel and belongs to family 88 of the glycosyl hydrolase superfamily (Henrissat B, Coutinho P, Deleury E; http://afmb.cnrs-mrs.fr/CAZY/index.html). One side of the UGL barrel structure consists of long loops containing three short β-sheets and contributes to the formation of a deep pocket for catalysis and substrate binding. The most likely candidate catalytic residues for glycosyl hydrolysis are Asp88 and Asp149. This was supported by site-directed mutagenesis of Asp88 and Asp149 (Itoh et al. 2004). According to PredictProtein, all neutral/alkaline invertases of rice and Arabidopsis adopt a similar secondary structure: 12 α-helices organized into six hairpins, with alternating long and short loops connecting the ends of the helices. We predict that all the conserved motifs shown in Fig. 2 are in these loops. The equivalents of Asp88 and Asp149 are tentatively indicated in Fig. 2.

Comparative Evolution of Acid and Neutral/Alkaline Invertases

The evolution of the OsNIN genes differs from that of the OsCIN and OsVIN genes in five fundamental respects. First, at the protein sequence level, the 12 conserved motifs of OsNIN proteins are longer and comprise a greater percentage of the protein than the 13 conserved motifs of OsCIN/OsVIN. The emerging three-dimensional structures of the GH32 and GH88 families (five-bladed β-propeller and α6/α6 toroid, respectively) may provide important clues to explain this difference. It is possible that the tendency of OsNIN proteins to form oligomers (Lee and Sturm 1996) adds to the degree of conservation of the polypeptides.

Second, although gene duplication is common in both families, tandem duplication either did not occur in the OsNIN lineage or occurred so distantly in time that traces of it have been lost by chromosomal rearrangements.

Third, the OsNIN genes show few examples of intron loss, either singly or by a concerted mechanism. This may ultimately be traceable to differences in transcript structure that favor the formation within the nucleus of the double-stranded cDNA required for intron loss by homologous recombination.

Fourth, unlike the seven-exon cell-wall invertase genes and seven-exon vacuolar invertase genes, which have almost all introns in common, the six-exon α group and the four-exon β group of the neutral/alkaline invertase genes have no intron in common. Each neutral/alkaline invertase gene has introns in conserved motifs 4 and 9, but in different locations within those motifs. The implication of this result is that the two groups of neutral/alkaline invertase genes separated much more distantly in the past than the cell-wall and vacuolar acid invertase genes. If we are correct to predict that the α group is located in the mitochondrion and chloroplast and the β group is located in the cytosol, this division is very ancient. However, the recent report by Chen et al. (2004) that a signal peptide can direct α-amylase αAmy3 into the secretion pathway and into the chloroplast means that any prediction of subcellular targeting based on sequence analysis alone must be considered provisional until backed up by other evidence.

Fifth, most of the introns of the acid invertase genes are located between conserved motifs, whereas most of the introns of the neutral/alkaline invertase genes are located within conserved motifs. Given the likely antiquity of most of these introns, it is intriguing to think that they may hold some clue as to the original mechanisms of intron insertion. Were the introns introduced into both families after the endosymbiotic origin of plants? Or were the introns of the acid invertases inserted much earlier, in the respiratory eukaryote that would eventually act as host for the symbiont? Or are the introns a legacy of prokaryotic ancestors, even though extant relatives such as cyanobacteria, eubacteria, and yeast lack introns in their invertase genes? This question has been discussed for plant glyceraldehyde-3-phosphate dehydrogenases (Liaud et al. 1990). Comparison of the nuclear and chloroplast genomes of Arabidopsis with those of cyanobacteria suggests that about 18% of Arabidopsis nuclear genes derive from the endosymbiont (Martin et al. 2002). The above questions may be pursued further through a comparison between rice and Arabidopsis of the exon–intron structures of nuclear genes of cyanobacterial and non-cyanobacterial origins.