Introduction

Cultivated rice (Oryza sativa L.) is the world’s most important food crop, and more than 90% of the world’s rice production and consumption occurs in Asia. One-third of the world’s population depends on rice as a primary food source (Khush 1997). However, rice productivity is continuously threatened by unfavorable environmental factors, such as salinity, drought and rising temperature, and by many kinds of pathogens.

Wild crop relatives are an important source of genetic variation, including resistance to various stresses (Vaughan 1994; Xiao et al. 1998). In the case of rice, wild relatives have diversified across a wide range of environments over 40 million years and have become major sources of genes for valuable economic characteristics that are not readily available in the cultivated germplasm (Khush 1997). The Oryza genus comprises 23 species and nine recognized genome types (AA, BB, BBCC, CC, CCDD, EE, FF, GG and HHJJ). These genomes are distantly related to O. sativa and could provide beneficial genes for cultivated rice (Ge et al. 1998; Vaughan 1994). O. rufipogon (AA genome), in spite of its overall inferior appearance, is known to carry valuable traits such as salinity tolerance and cytoplasmic male sterility, as well as useful quantitative trait loci (QTLs) for agronomically important traits. O. officinalis (CC genome) has also been reported to show resistance to insect pests, such as the yellow stem borer, planthopper and leafhopper. These results suggest that wild germplasm may provide new solutions for improving rice productivity, which may also be applicable to other crops (Brar et al. 1991; Brar and Khush 1997; Vaughan 1994; Xiao et al. 1996, 1998).

Expressed sequence tags (ESTs), which are generated by single-pass sequencing of randomly chosen clones from a cDNA library, are effective molecular tools for identifying expressed genes and estimating transcript abundance. Global and comprehensive analyses of many species have been made possible by large-scale EST projects (Adams et al. 1991). Large-scale EST databases provide a great deal of information on the complexity of gene expression patterns, the functions of transcripts and the development of single nucleotide polymorphisms (SNPs) (Breyne and Zabeau 2001; Michalek 2002; Wu et al. 2002). In the plant kingdom, large-scale EST databases have been accumulated for model plants and crops, for example, Arabidopsis thaliana, O. sativa, Zea mays, Brassica napus, B. campestris and Medicago truncatula. In addition, ESTs from cDNA libraries representing diverse tissues, developmental stages and stress treatments have been compared and reported (Carson et al. 2002; Chen et al. 2002; Covitz et al. 1998; Höfte et al. 1993; Keith et al. 1993; Lim et al. 1996; Ok et al. 2000; Newman et al. 1994; Park et al. 1993; Qutob et al. 2000; Sasaki et al. 1994; Uchimiya et al. 1992; Ujino-Ihara et al. 2000; Wu et al. 2002). For A. thaliana and rice, full genome and draft sequences were reported recently (Goff et al. 2002; The Arabidopsis Genome Initiative 2000; Yu et al. 2002).

We describe here, for the first time, the expression patterns of novel genes expressed under normal growth conditions in a wild species of the Oryza genus. O. minuta, a wild relative of rice, contains the BBCC genome and has been used as a donor of resistance to blast and bacterial blight. Because of the potential importance of wild rice species in rice breeding, rice breeders have recently been paying them a great deal of attention (Vaughan 1994). We report on the partial sequences, database comparisons and functional categorization of 5,211 randomly collected vegetative-stage leaf cDNA clones of O. minuta (BBCC) based on the classification of the Munich Information Center for Protein Sequences (MIPS) for Arabidopsis thaliana. These data will be useful for those searching for novel genes expressed at the vegetative leaf stage and will contribute to the connection between EST databases designed to elucidate the gene expression profiles of plants, especially those of the Poaceae family.

Materials and methods

Plant material and construction of the cDNA library

Four-week-old leaf samples of glasshouse-grown Oryza minuta (accession no. 101144) were used for this study. Total RNA was isolated from leaves using TRIZOL reagent (Gibco/BRL, Gaithersburg, Md.) according to the manufacturer’s instructions. The amount and quality of total RNA were checked by spectrophotometry (OD260/280) and by formaldehyde-1% agarose gel electrophoresis. Poly(A)+ RNA was extracted from total RNA using the PolyATtract mRNA isolation system (Promega, Madison, Wis.) according to the manufacturer’s protocol. A HybriZAP-2.1 XR library construction kit and a HybriZAP-2.1 XR cDNA synthesis kit (Stratagene, La Jolla, Calif.) were used to construct the cDNA library at Eugentech (Korea). The library was packaged into Gigapack III Gold packaging extract; lambda ZAP yielded 6×10⁶ primary plaques, which were then amplified to a titer of 3×10¹⁰ pfu/ml. cDNA-inserted pAD-GAL4-2.1 phagemid vectors were excised by mass in vivo excision using an ExAssist helper phage system (Stratagene). The titer of the resulting library was 1.67×10cfu/ml, and phagemids were used to infect Escherichia coli strain XLOLR according to the manufacturer’s instructions.

Nucleotide sequencing and sequence data analysis

A total of 5,760 randomly collected clones were sequenced at Green Gene BioTech (Korea) from the 5′ ends using the 5′ AD primer (5′-AGGGATGTTTAATACCACTAC-3′) of the pAD-GAL4-2.1 phagemid vectors. Ambiguous sequences at the 5′ and 3′ ends were removed, and vector sequences were trimmed automatically using a custom Python script. This script linked sequence backup with base calling by Phred (trimming option on, cut-off set to 0.05; Green Gene BioTech). Sequences shorter than 200 bp or with more than 5% ambiguity were excluded. Searches with blastx and putative identifications were carried out automatically by a Python script and the web blast program. Contigs were constructed from the edited sequences using CAP (Contig Assembly Program). All EST data are publicly available through the National Center for Biotechnology Information (NCBI, USA; GenBank dbEST accession nos. CB209721–CB214919).
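The exclusion rule stated above (reads shorter than 200 bp or with more than 5% ambiguous bases) can be illustrated with a minimal sketch. This is not the custom script used in the study; the function names are hypothetical, and treating every character other than A, C, G or T as ambiguous is an assumption.

```python
def ambiguity_fraction(seq):
    """Fraction of bases that are not an unambiguous A, C, G or T (assumed definition)."""
    if not seq:
        return 1.0  # an empty read is treated as fully ambiguous
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq)


def passes_filter(seq, min_length=200, max_ambiguity=0.05):
    """Keep reads of at least min_length bp with at most max_ambiguity ambiguous bases."""
    return len(seq) >= min_length and ambiguity_fraction(seq) <= max_ambiguity


def filter_reads(reads):
    """Return only the reads (name -> sequence) that pass both criteria."""
    return {name: seq for name, seq in reads.items() if passes_filter(seq)}


if __name__ == "__main__":
    # Toy reads, not real O. minuta data.
    toy_reads = {
        "ok": "ACGT" * 60,                 # 240 bp, no ambiguity
        "short": "ACGT" * 10,              # 40 bp, too short
        "noisy": "ACGT" * 50 + "N" * 20,   # 220 bp, about 9% ambiguous
    }
    print(sorted(filter_reads(toy_reads)))
```

In this toy input only the read named "ok" survives; the other two fail the length and ambiguity criteria, respectively.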

Comparative analysis of the EST sequences

The MIPS functional categories applied to Arabidopsis genes were used for O. minuta. Translated O. minuta ESTs were categorized into 20 functional groups and one group of unclear classification by sequence comparison with all Arabidopsis proteins using a P-value cut-off threshold of 10⁻⁵. All Arabidopsis protein sequences were downloaded from the MIPS ftp site and converted into a blast-searchable database using the formatdb program (NCBI). Functional redundancies of Arabidopsis proteins were allowed in the classification.
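The categorization logic described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the input dictionaries, the protein identifiers and the label strings are hypothetical, and each EST is assumed to be represented by its single best Arabidopsis hit. Because functional redundancy is allowed, a protein annotated with several categories contributes one count to each of them.

```python
from collections import Counter


def classify_ests(best_hits, protein_categories, cutoff=1e-5):
    """Count ESTs per functional category.

    best_hits: {est_id: (protein_id, evalue)} for the best Arabidopsis hit (assumed input).
    protein_categories: {protein_id: [category, ...]} (assumed input).
    """
    counts = Counter()
    for est_id, (protein_id, evalue) in best_hits.items():
        if evalue >= cutoff:
            counts["no match"] += 1          # hit too weak to assign a function
            continue
        categories = protein_categories.get(protein_id) or ["unclear classification"]
        for category in categories:          # redundancy allowed: count every category
            counts[category] += 1
    return counts


if __name__ == "__main__":
    # Toy data with made-up identifiers.
    hits = {
        "EST1": ("At1", 1e-30),
        "EST2": ("At2", 1e-8),
        "EST3": ("At3", 0.5),
        "EST4": ("At4", 1e-10),
    }
    categories = {"At1": ["metabolism", "energy"], "At2": ["energy"]}
    for name, n in sorted(classify_ests(hits, categories).items()):
        print(name, n)
```

Note that with redundancy allowed, the category counts can sum to more than the number of ESTs, so percentages per category need not add up to 100%.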

Results and discussion

Characterization of the cDNA library and EST sequences

A cDNA library was constructed using mRNA isolated from Oryza minuta leaves. The primary library contained 6×10⁶ recombinant phages, and after plaque amplification, serial titration showed that the amplified library contained approximately 3×10¹⁰ pfu/ml of SM buffer, which was considered adequate to represent gene expression. After the removal of vector sequences and ambiguous short sequences (<200 bp) from the 5′-end sequences of 5,760 O. minuta cDNA clones, 5,211 clones yielded meaningful sequences (Table 1). These 5,211 O. minuta cDNAs produced a total of 3,401 unique sequences, consisting of 2,787 singletons and 614 assembled sequences. Redundancy (ESTs assembled in clusters/total ESTs) of an mRNA indicates the abundance of its corresponding cDNA in non-normalized libraries. In other words, the frequency of randomly picked cDNAs reflects the relative expression levels of the genes in a library. In this project, among the 5,211 total cDNAs, 2,424 ESTs were assembled into 614 clusters, indicating a redundancy of 46.5%.
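The counts reported above are mutually consistent, as the following short check shows. It uses only the figures given in the text; the variable names are illustrative.

```python
# Figures taken from the text above.
total_ests = 5211      # clones with meaningful sequence
singletons = 2787      # ESTs not assembled into any cluster
clusters = 614         # assembled sequences (contigs)

# ESTs that fell into clusters, and the number of unique sequences.
clustered_ests = total_ests - singletons
unique_sequences = singletons + clusters

# Redundancy = ESTs assembled in clusters / total ESTs.
redundancy = clustered_ests / total_ests

print(clustered_ests)          # 2424
print(unique_sequences)        # 3401
print(f"{redundancy:.1%}")     # 46.5%
```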

Table 1 Summary of the Oryza minuta EST library

Database comparisons of the cDNAs against the GenBank non-redundant databases using BLAST revealed that 4,957 of the 5,211 cDNAs (95.1%) showed a high degree of sequence similarity to genes from other organisms (Table 1). The remaining 4.9% (254 of 5,211 clones) of the sequenced cDNAs did not meet the criterion required for a match (E-value cut-off of 10⁻⁵). Other methodologies, such as RNA blot analysis, linkage mapping or transformation into Arabidopsis, will be required to identify the functions of these clones and provide more information.

Expression profiles of leaf transcripts at the vegetative stage

Although relatively few clones (5,211) have been sequenced, diverse groups of genes have been identified in O. minuta. Approximately 73% of the ESTs were assigned functions by alignment with Arabidopsis proteins, with E-values lower than 10⁻⁵ (Fig. 1). As expected, most of the identified transcripts appeared to be genes related to metabolism and subcellular localization (Fig. 1). The gene for the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcS), the key enzyme of carbon assimilation, was the most frequently found gene (263 hits, Table 2). Several other genes related to energy and protein biosynthesis were also found in abundance (Fig. 1). These results are quite different from those of other plant EST projects in which different tissue materials were used (Crookshanks et al. 2001; Keith et al. 1993; Lim et al. 1996; Park et al. 1993; Sasaki et al. 1994; Uchimiya et al. 1992; Xiao et al. 1998), showing that the cells in vegetative leaves are metabolically active.

Fig. 1 Functional classifications and comparative analysis of the ESTs of Oryza minuta, O. sativa and blast-treated O. sativa. The ESTs were classified on the basis of their biological functions by alignment to Arabidopsis protein sequences using an E-value cutoff of 10⁻⁵. For comparative analysis, 8,720 O. sativa immature leaf ESTs and 15,599 blast-treated O. sativa ESTs were retrieved from dbEST and grouped into functional categories. The Y-axis indicates percentage (%) of ESTs matched with Arabidopsis protein DB; the X-axis: 1 metabolism, 2 energy, 3 cell-cycle and DNA processing, 4 transcription, 5 protein synthesis, 6 protein fate, 7 cellular transport and transport mechanisms, 8 cellular communication/signal transduction, 9 cell rescue, defense and virulence, 10 regulation of interaction with cellular environment, 11 cell fate, 12 systemic regulation of interaction with environment, 13 development, 14 transposable element, viral and plasmid proteins, 15 control of cellular organization, 16 subcellular localization, 17 protein activity regulation, 18 protein with binding function or cofactor requirement, 19 storage protein, 20 transport facilitation

Table 2 Most prevalent mRNAs as determined by EST redundancy

As described in other projects (Covitz et al. 1998; Lee et al. 1998; Lim et al. 1996; Ok et al. 2000; Park et al. 1993; Sasaki et al. 1994), defense-related genes exhibited high expression (4.7%) in our experiment (Fig. 1). The fact that defense-related genes can be found in an EST project with normal tissue suggests that these genes may help plants maintain a normal physiological condition in response to several stresses. It should be pointed out that the metallothionein gene, whose product binds toxic heavy metals and facilitates their removal from the cell, was the second most frequent gene (103 hits, Table 2). The reason why this defense-related gene was observed at this level in O. minuta requires further examination.

In our study, 254 clones (about 5% of our EST data) did not match any sequence previously reported in the GenBank database; hence, these ESTs were regarded as wild rice-specific novel transcripts (Table 3). While O. minuta may contain novel stress resistance genes that do not exist in O. sativa, this does not mean that all of the novel genes exist solely in the former; some may simply be expressed more highly in this wild rice species than in cultivated rice or in other unrelated species. Therefore, in the near future, the blast-infected cDNA library of O. minuta should be analyzed for novel defense-related genes. The list of defense-related transcripts based on functional categorization is presented in Table 4.

Table 3 ESTs of O. minuta-specific novel transcripts
Table 4 The putative identities of ESTs in functional categories of cell rescue, defense and virulence

Comparative analysis of leaf transcriptomes in wild (O. minuta) and cultivated rice (O. sativa)

The purpose of the EST analysis of O. minuta leaf was to find clues to the plant’s resistance mechanisms and to characterize novel genes involved in stress resistance, based on evidence that wild rice has better resistance to abiotic and biotic stresses than cultivated rice (Brar and Khush 1997). Consequently, we compared our O. minuta leaf ESTs with the O. sativa leaf ESTs registered in the GenBank EST database. To identify the differences between the expression profiles of O. minuta and O. sativa, we collected the immature leaf ESTs of O. sativa (8,720 ESTs) and categorized these into 20 functional groups by sequence comparison with all Arabidopsis proteins using a cut-off P-value threshold of 10⁻⁵. ESTs of Magnaporthe grisea (rice blast)-infected O. sativa (15,599 ESTs) were also gathered for analysis. On comparing the expression profiles of these three transcriptomes (O. minuta, O. sativa and blast-treated O. sativa), we found that in the metabolism and energy categories the gene expression levels of O. minuta and blast-infected O. sativa ESTs were considerably higher than those of normal O. sativa ESTs (Fig. 1). Within metabolism, the amino acid metabolism and the nitrogen and sulfur metabolism sub-categories of O. minuta and blast-infected O. sativa exhibited elevated gene expression levels in comparison with those of normal O. sativa. Similarly, compared to O. sativa ESTs, O. minuta and blast-infected O. sativa ESTs showed higher gene expression levels in the sub-categories of photosynthesis, pentose-phosphate pathway, and electron transport and membrane-associated energy conservation (Fig. 2A, B). In the case of protein synthesis, however, O. minuta showed a lower level (about one-half) of gene expression than O. sativa (Fig. 1), although O. minuta ESTs did show elevated levels in the sub-categories of translation, initiation, translational control, aminoacyl-tRNA-synthetases and other protein-synthesis activities, like those in blast-infected O. sativa (Fig. 2C). Based on these results, we suggest that the differences in gene expression between cultivated rice and its wild relatives depend on the regulation of gene expression rather than on novel sequences per se. Ongoing mutant, overexpression and promoter analyses will hopefully confirm this supposition.

Fig. 2A–C Sub-functional classifications of ESTs matched to the categories of metabolism, energy and protein synthesis. The Y-axis indicates the percentage (%) of ESTs with Arabidopsis protein database matches. A Metabolism category: 1 amino acid metabolism, 2 nitrogen and sulfur metabolism, 3 nucleotide metabolism, 4 phosphate metabolism, 5 C-compound and carbohydrate metabolism, 6 lipid, fatty acid and isoprenoid metabolism, 7 metabolism of vitamins, cofactors and prosthetic groups, 8 secondary metabolism. B Energy category: 1 glycolysis and gluconeogenesis, 2 Entner-Doudoroff pathway, 3 pentose-phosphate pathway, 4 tricarboxylic acid pathway, 5 electron transport and membrane-associated energy conservation, 6 respiration, 7 fermentation, 8 metabolism of energy reserves, 9 glyoxylate cycle, 10 oxidation of fatty acids, 11 photosynthesis, 12 other energy generation activities. C Protein synthesis category: 1 ribosome biogenesis, 2 translation, 3 initiation, 4 elongation, 5 translational control, 6 aminoacyl-tRNA-synthetase, 7 other protein-synthesis activities

It should be noted that 30 retrotransposons (30/5,211) were found in O. minuta ESTs, but only five (5/8,720) were found in O. sativa ESTs. Retrotransposons are known to be the most abundant and widespread transposable elements in eukaryotes and are commonly found as multiple copies in plant genomes (Kumar and Bennetzen 1999). Although the majority of these transposable elements in plants are believed to be inactive and not transcribed, a few elements, such as barley BARE-1 (Suoniemi et al. 1998), tobacco Tnt1 (Grandbastien et al. 1989) and rice Tos17 (Hirochika et al. 1996), have been reported to be activated under conditions of biotic and abiotic stress. Therefore, more refined research on these retrotransposons in O. minuta should provide information on the relationships between active transposable elements and stress resistance.
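Because the two EST collections differ in size, the retrotransposon counts above are easier to compare as frequencies. The following sketch uses only the counts given in the text; it is a descriptive calculation, not a statistical test, and the function name is illustrative.

```python
def frequency(hits, total):
    """Fraction of ESTs in a collection that match a given class of sequence."""
    return hits / total


# Counts taken from the text above.
freq_minuta = frequency(30, 5211)   # O. minuta leaf ESTs
freq_sativa = frequency(5, 8720)    # O. sativa immature leaf ESTs

fold_difference = freq_minuta / freq_sativa

print(f"O. minuta: {freq_minuta:.2%}")           # 0.58%
print(f"O. sativa: {freq_sativa:.3%}")           # 0.057%
print(f"fold difference: {fold_difference:.1f}")  # 10.0
```

Under this normalization, retrotransposon-derived ESTs are roughly ten times more frequent in the O. minuta collection than in the O. sativa collection, although the two libraries were prepared independently and the counts are small.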

The ESTs described here represent the first reported transcriptome of the vegetative leaf stage of wild rice. The gene expression profile of O. minuta was compared with that of cultivated rice, O. sativa. These genes and the differences in gene expression pattern can be used to unravel the regulatory networks of stress resistance in rice and possibly in other crops. The EST data provided here should also help molecular breeders to develop new varieties of cultivated rice with higher stress resistance.