Abstract
The expressed sequence tags (ESTs) presented in this report are the first transcriptomes of wild rice. A cDNA library was constructed from 4-week-old leaf samples of greenhouse-grown Oryza minuta. The 5,211 cDNA clones of O. minuta represent 3,401 unique sequences, consisting of 2,787 singletons and 614 assembled sequences. Database comparisons of the cDNAs in GenBank’s non-redundant databases using BLAST revealed that 4,957 of the 5,211 cDNAs (95.1%) showed a high degree of sequence homology to genes from other organisms. Most of the transcripts identified were genes related to metabolism, energy, protein biosynthesis and subcellular localization. The metabolism and energy categories of the O. minuta ESTs showed a considerably higher gene expression level than those of O. sativa ESTs. These data and genes can be utilized in rice breeding.
Introduction
Cultivated rice (Oryza sativa L.) is the world’s most important food crop, and more than 90% of the world’s rice production and consumption occurs in Asia. One-third of the world’s population depends on rice as a primary food source (Khush 1997). However, the productivity of rice is continuously being threatened by various unfavorable environmental factors, such as salinity, drought and growing temperature, and by many kinds of pathogens.
Wild crop relatives are an important source of existing genetic variations, including resistance to various stresses (Vaughan 1994; Xiao et al. 1998). In the case of rice, wild relatives have diversified over a wide range of environments over 40 million years and have become major gene sources for various valuable economic characteristics that are not easily available in the cultivated germplasm (Khush 1997). The Oryza genus comprises 23 species and nine recognized genome types (AA, BB, BBCC, CC, CCDD, EE, FF, GG and HHJJ). These genomes are distantly related to O. sativa and could provide beneficial genes for cultivated rice (Ge et al. 1999; Vaughan 1994). O. rufipogon (AA genome), in spite of its overall inferior appearance, is known to carry valuable traits for salinity tolerance and cytoplasmic male sterility, as well as useful quantitative trait loci (QTL) for agronomically important traits. O. officinalis (CC genome) has also been reported to show resistance to insect pests, such as the yellow stem borer, planthopper and leafhopper. These results suggest that the wild-type germplasms may provide new solutions to improve rice productivity, which may also be applicable to other crops (Brar et al. 1991; Brar and Khush 1997; Vaughan 1994; Xiao et al. 1996, 1998).
Expressed sequence tags (ESTs), which are generated by single-pass sequencing of randomly chosen clones from a cDNA library, are molecular tools that are more than adequate for defining an expressed gene and reflecting transcript abundance. The global and comprehensive analyses of many species have been made possible by large-scale EST projects (Adams et al. 1991). Large-scale EST databases provide a great deal of information on the complexities of gene expression patterns, the functions of transcripts and the development of single nucleotide polymorphisms (SNPs) (Breyne and Zabwau 2001; Michalek et al. 2002; Wu et al. 2002). In the plant kingdom, large-scale EST databases have been accumulated for model plants and crops, for example Arabidopsis thaliana, O. sativa, Zea mays, Brassica napus, B. campestris and Medicago truncatula. In addition, various ESTs from diverse tissues, developmental stages and stress-treated cDNA libraries have been compared and reported (Carson et al. 2002; Chen et al. 2002; Covitz et al. 1998; Höfte et al. 1993; Keith et al. 1993; Lim et al. 1996; Ok et al. 2000; Newman et al. 1994; Park et al. 1993; Qutob et al. 2000; Sasaki et al. 1994; Uchimiya et al. 1992; Ujino-Ihara et al. 2000; Wu et al. 2002). In the cases of A. thaliana and rice, full genome and draft sequences were reported recently (Goff et al. 2002; The Arabidopsis Genome Initiative 2000; Yu et al. 2002).
We describe here, for the first time, the expression patterns of novel genes expressed under normal growth conditions in a wild species of the Oryza genus. O. minuta, a wild relative of rice, contains the BBCC genome and has been used as a donor of resistance to blast and bacterial blight. Because of the potential importance of wild rice species in rice breeding, rice breeders have recently been paying them a great deal of attention (Vaughan 1994). We report on the partial sequences, database comparisons and functional categorization of 5,211 randomly collected vegetative-stage leaf cDNA clones of O. minuta (BBCC) based on the classification of the Munich Information Center for Protein Sequences (MIPS) for Arabidopsis thaliana. These data will be useful for those searching for novel genes expressed at the vegetative leaf stage and will contribute to the connection between EST databases designed to elucidate the gene expression profiles of plants, especially those of the Poaceae family.
Materials and methods
Plant material and construction of the cDNA library
Four-week-old leaf samples of glasshouse-grown Oryza minuta (accession no. 101144) were used for this study. Total RNA was isolated from leaves using TRIZOL reagent (Gibco/BRL, Gaithersburg, Md.) according to the manufacturer’s instructions. The amount and quality of total RNA were checked by spectrophotometry (OD260/280) and a formaldehyde-1% agarose gel electrophoresis system. Poly(A)+ RNA was extracted from total RNA using the PolyATtract mRNA isolation system (Promega, Madison, Wis.) according to the manufacturer’s protocol. A HybriZAP-2.1 XR library construction kit and a HybriZAP-2.1 XR cDNA synthesis kit (Stratagene, La Jolla, Calif.) were used to construct the cDNA library at Eugentech (Korea). The library was packaged into Gigapack III Gold packaging extract; lambda ZAP yielded 6×10⁶ primary plaques, which were then amplified to a titer of 3×10¹⁰ pfu/ml. cDNA-inserted pAD-GAL4-2.1 phagemid vectors were excised by mass in vivo excision using an ExAssist helper phage system (Stratagene). The titer of the resulting library was 1.67×10⁸ cfu/ml, and phagemids were used to infect Escherichia coli strain XLOLR according to the manufacturer’s instructions.
Nucleotide sequencing and sequence data analysis
A total of 5,760 randomly collected clones were sequenced at Green Gene BioTech (Korea) from the 5′ ends using the 5′ AD primer (5′-AGGGATGTTTAATACCACTAC-3′) of the pAD-GAL4-2.1 phagemid vectors. Ambiguous sequences at the 5′ and 3′ ends were removed, and vector sequences were trimmed automatically using a custom Python script. This script chained sequence backup and base calling by Phred (trimming option on, cut-off set to 0.05; Green Gene BioTech). Sequences shorter than 200 bp or with more than 5% ambiguity were excluded. BLASTX searches and putative identifications were carried out automatically by a Python script and the web BLAST program. Contigs were constructed from the edited sequences using CAP (Contig Assembly Program). All EST data are publicly available through the National Center for Biotechnology Information (NCBI, USA; GenBank dbEST accession nos. CB209721–CB214919).
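The length and ambiguity filter described above can be illustrated with a minimal sketch. This is not the script used in the study; the function names are hypothetical, and it assumes that only the base call N counts as ambiguous and that reads of exactly 200 bp or exactly 5% ambiguity are retained.

```python
# Illustrative sketch only; names and the definition of "ambiguous" are assumptions.
AMBIGUOUS = set("Nn")  # assumption: only N is treated as an ambiguous base call


def passes_quality_filter(seq, min_length=200, max_ambiguity=0.05):
    """Return True if a trimmed read is long enough and has few ambiguous bases."""
    if len(seq) < min_length:
        return False  # shorter than 200 bp: excluded
    ambiguous = sum(1 for base in seq if base in AMBIGUOUS)
    return ambiguous / len(seq) <= max_ambiguity  # more than 5% ambiguity: excluded


def filter_reads(reads):
    """Keep only the (name, sequence) pairs that pass the filter."""
    return [(name, seq) for name, seq in reads if passes_quality_filter(seq)]
```

A read list such as `[("clone1", "ACGT" * 60), ("clone2", "ACGT" * 10)]` would keep only `clone1`, because the second read is 40 bp long.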
Comparative analysis of the EST sequences
The MIPS functional categories applied to Arabidopsis genes were used for O. minuta. Translated O. minuta ESTs were categorized into 20 functional groups and one group of unclear classification by sequence comparison with all Arabidopsis proteins using a P-value cut-off threshold of 10⁻⁵. All Arabidopsis protein sequences were downloaded from the MIPS ftp site and converted into a BLAST-searchable database with the formatdb program (NCBI). Functional redundancies of Arabidopsis proteins were allowed in the classification.
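The categorization step can be sketched as follows. This is a simplified illustration rather than the procedure actually run: it assumes that the BLAST hits are already available as (EST, protein, score) tuples, that the best-scoring hit below the 10⁻⁵ cut-off decides the category, and that the mapping from Arabidopsis proteins to categories is supplied as a dictionary. The function name and the label "unclassified" are hypothetical.

```python
from collections import Counter


def categorize_ests(hits, protein_categories, cutoff=1e-5):
    """Count ESTs per functional category from (est_id, protein_id, score) hits.

    Assumptions: a lower score is better, only hits below `cutoff` are used,
    and each EST is represented by its single best hit.
    """
    best = {}
    for est_id, protein_id, score in hits:
        if score >= cutoff:
            continue  # hit does not meet the cut-off
        if est_id not in best or score < best[est_id][1]:
            best[est_id] = (protein_id, score)

    counts = Counter()
    for protein_id, _ in best.values():
        # A protein may belong to several categories; every one is counted,
        # which mirrors the functional redundancy allowed in the classification.
        for category in protein_categories.get(protein_id, ["unclassified"]):
            counts[category] += 1
    return counts
```

Because redundancy is allowed, the category counts can sum to more than the number of ESTs with a hit.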
Results and discussion
Characterization of the cDNA library and EST sequences
A cDNA library was constructed using mRNA isolated from Oryza minuta leaves. The primary library contained 6×10⁶ recombinant phages, and after plaque amplification, the serial titers of the amplified plaques showed that the library contained approximately 3×10¹⁰ pfu/ml of SM buffer, which was considered to adequately represent gene expression. After the removal of vector sequences and ambiguous short sequences (<200 bp) from the 5′-end sequences of 5,760 O. minuta cDNA clones, 5,211 clones revealed meaningful sequences (Table 1). These 5,211 O. minuta cDNAs produced a total of 3,401 unique sequences, which consist of 2,787 singletons and 614 assembled sequences. Redundancy (ESTs assembled in clusters/total ESTs) of an mRNA indicates the abundance of its corresponding cDNA in non-normalized libraries. In other words, information on randomly picked cDNAs represents the relative expression levels of the genes in a library. In this project, among the 5,211 total cDNAs, 2,424 were assembled in 614 clusters, indicating a redundancy of 46.5%.
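The assembly figures above follow directly from the counts reported in the text, as this short calculation shows. The function name is hypothetical; the formula is the one given above (ESTs assembled in clusters/total ESTs).

```python
def assembly_summary(total_ests, singletons, contigs):
    """Derive unique-sequence and redundancy figures from assembly counts."""
    clustered = total_ests - singletons  # ESTs that fell into a cluster
    return {
        "unique_sequences": singletons + contigs,
        "clustered_ests": clustered,
        "redundancy_pct": round(100 * clustered / total_ests, 1),
    }


summary = assembly_summary(total_ests=5211, singletons=2787, contigs=614)
print(summary)
# {'unique_sequences': 3401, 'clustered_ests': 2424, 'redundancy_pct': 46.5}
```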
Database comparisons of cDNAs in GenBank non-redundant databases using BLAST revealed that 4,957 of the 5,211 cDNAs (95.1%) showed a high degree of sequence similarity to genes from other organisms (Table 1). The remaining 4.9% (254 of the 5,211 clones) of the sequenced cDNAs did not meet the criterion required for a match (E-value cut-off of 10⁻⁵). Other methodologies, such as RNA blot analysis, linkage mapping or transformation into Arabidopsis, will be required to identify the functions of these clones and provide more information.
Expression profiles of leaf transcripts at the vegetative stage
Although relatively few clones (5,211) have actually been sequenced, diverse groups of genes have been identified in O. minuta. Approximately 73% of ESTs were assigned functions by alignment with Arabidopsis proteins, with E-values lower than 10⁻⁵ (Fig. 1). As expected, most of the identified transcripts appeared to be genes related to metabolism and subcellular localization (Fig. 1). The gene for the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcS), the key enzyme of carbon assimilation, was the most frequently found gene (263 hits, Table 2). Several other genes related to energy and protein biosynthesis were also found in abundance (Fig. 1). These results are quite different from those of other plant EST projects in which different tissue materials were used (Crookshanks et al. 2001; Keith et al. 1993; Lim et al. 1996; Park et al. 1993; Sasaki et al. 1994; Uchimiya et al. 1992; Xiao et al. 1998), showing that the cells in vegetative leaves are metabolically active.
As described in other projects (Covitz et al. 1998; Lee et al. 1998; Lim et al. 1996; Ok et al. 2000; Park et al. 1993; Sasaki et al. 1994), defense-related genes exhibited high expression (4.7%) in our experiment (Fig. 1). The fact that defense-related genes can be found in an EST project with normal tissue suggests that these genes may help plants maintain a normal physiological condition in response to several stresses. It should be pointed out that the gene for metallothionein, a protein that binds toxic heavy metals and makes it easier for them to be excreted from the cell, was the second most frequent gene (103 hits, Table 2). The reason why this defense-related gene was observed at this level in O. minuta requires further examination.
In our study, 254 clones (about 5% of our EST data) did not match any sequence previously reported in the GenBank database; hence, these ESTs were regarded as wild rice-specific novel transcripts (Table 3). While O. minuta may contain novel stress resistance genes that do not exist in O. sativa, this does not mean that all of the novel or new genes exist solely in the former; some may simply be expressed more highly in this wild rice species than in cultivated rice or even in other unrelated species. Therefore, in the near future, the blast-infected cDNA library of O. minuta should be analyzed for novel defense-related genes. The list of defense-related transcripts based on functional categorization is presented in Table 4.
Comparative analysis of leaf transcriptomes in wild (O. minuta) and cultivated rice (O. sativa)
The purpose of the EST analysis of O. minuta leaf was to find clues to the plant’s resistance mechanisms and to characterize novel genes involved in stress resistance, based on evidence that wild rice has a better resistance to abiotic and biotic stresses than cultivated rice (Brar and Khush 1997). Consequently, we compared the O. minuta leaf ESTs with O. sativa leaf ESTs registered in the GenBank EST database. To identify the differences between the expression profiles of O. minuta and O. sativa, we collected the immature leaf ESTs of O. sativa (8,720 ESTs) and categorized these into 20 functional groups by sequence comparison with all Arabidopsis proteins using a cut-off P-value threshold of 10⁻⁵. ESTs of Magnaporthe grisea (rice blast)-infected O. sativa (15,599 ESTs) were also gathered for analysis. On comparing the expression profiles of these three transcriptomes (O. minuta, O. sativa and blast-treated O. sativa), we found that, in the metabolism and energy categories, the gene expression levels of O. minuta and blast-infected O. sativa ESTs were considerably higher than those of normal O. sativa ESTs (Fig. 1). In terms of metabolism, the amino acid metabolism and nitrogen and sulfur metabolism sub-categories of O. minuta and blast-infected O. sativa exhibited elevated gene expression levels in comparison with those of normal O. sativa. Similarly, compared to O. sativa ESTs, O. minuta and blast-infected O. sativa ESTs showed higher gene expression levels in the sub-categories of photosynthesis, pentose-phosphate pathway, electron transport and membrane-associated energy conservation (Fig. 2A, B). In the case of protein synthesis, however, O. minuta showed a lower level (about one-half) of gene expression than that of O. sativa (Fig. 1), although O. minuta ESTs did show elevated levels in the sub-categories of translation, initiation, translational control, aminoacyl-tRNA-synthetases and other protein-synthesis activities, like those in blast-infected O. sativa (Fig. 2C). Based on these results, we suggest that the differences in gene expression between cultivated rice and its wild relatives rely on the regulation of gene expression, rather than on novel sequences per se. Ongoing investigations of mutant and overexpression lines, as well as promoter analyses, will hopefully confirm this supposition.
It should be noted that 30 retrotransposons (30/5,211) were found in O. minuta ESTs, but only five (5/8,720) were found in O. sativa ESTs. Retrotransposons are known to be the most abundant and widely spread transposable elements in eukaryotes and are commonly found as multi-copies in plant genomes (Kumar and Bennetzen 1999). Although the majority of these transposable elements in plants are believed to be inactive and not transcribed, a few elements, such as barley BARE-1 (Suoniemi et al. 1998), tobacco Tnt1 (Grandbastien et al. 1989) and rice Tos17 (Hirochika et al. 1996), have been reported to be activated under conditions of biotic and abiotic stress. Therefore, future refined research on these retrotransposons in O. minuta should provide information on the relationships between active transposable elements and stress resistance.
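Because the two EST sets differ in size, the retrotransposon counts are easier to compare as frequencies. The sketch below normalizes the counts given above to hits per 1,000 ESTs; the function name is hypothetical, and the calculation is a plain ratio, not a statistical test.

```python
def rate_per_thousand(hits, total_ests):
    """Express a raw hit count as hits per 1,000 ESTs."""
    return 1000 * hits / total_ests


minuta_rate = rate_per_thousand(30, 5211)  # about 5.76 per 1,000 ESTs
sativa_rate = rate_per_thousand(5, 8720)   # about 0.57 per 1,000 ESTs
fold_difference = minuta_rate / sativa_rate  # about 10-fold

print(f"O. minuta: {minuta_rate:.2f}, O. sativa: {sativa_rate:.2f}, "
      f"fold difference: {fold_difference:.1f}")
```

On these counts, retrotransposon-derived ESTs are roughly ten times more frequent in the O. minuta set, although the libraries were prepared independently, so the ratio should be read as indicative only.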
The ESTs described here are the first reported transcriptomes to be expressed at the vegetative leaf stage of wild rice. The gene expression profile of O. minuta was compared with that of cultivated rice, O. sativa. These genes and the different gene expression patterns can be used to unravel the regulatory networks of stress resistance in rice and possibly in other crops. The EST data provided also make it feasible for molecular breeders to develop new varieties of cultivated rice with higher stress resistance.
References
Adams MD, Kelly JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, Kerlavage AR, McCombie WR, Venter JC (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252:1651–1656
Brar DS, Khush GS (1997) Alien introgression in rice. Plant Mol Biol 35:35–47
Brar DS, Elloran R, Khush GS (1991) Interspecific hybrids produced through embryo rescue between cultivated and eight wild species of rice. Rice Genet Newsl 8:91–93
Breyne P, Zabwau M (2001) Genome-wide expression analysis of plant cell cycle modulated genes. Curr Opin Plant Biol 4:136–142
Carson DL, Huckett BI, Botha FC (2002) Sugarcane ESTs differentially expressed in immature and maturing internodal tissue. Plant Sci 162:289–300
Chen M et al. (2002) An integrated physical and genetic map of the rice genome. Plant Cell 14:537–545
Covitz PA, Smith LS, Long SR (1998) Expressed sequence tags from a root-hair-enriched Medicago truncatula cDNA library. Plant Physiol 117:1325–1332
Crookshanks M, Emmersen J, Welinder KG, Nielsen KL (2001) The potato tuber transcriptome: analysis of 6,077 expressed sequence tags. FEBS Lett 506:123–126
Ge S, Sang T, Lu BR, Hong DY (1999) Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci USA 96:14400–14405
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gurin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Reldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome. Science 296:92–100
Grandbastien MA, Spielmann A, Caboche M (1989) Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337:376–380
Hirochika H, Sugimoto K, Otsuki Y, Tsugawa H, Kanda M (1996) Retrotransposons of rice involved in mutations induced by tissue culture. Proc Natl Acad Sci USA 93:7783–7788
Höfte H, Desprez T, Amselem J, Chiapello H, Caboche M, Moisan A, Jourjon M, Charpenteau J, Berthomieu P, Guerrier D, Giraudat J, Quigley F, Thomas F, Yu D, Mache R, Raynal M, Cooke R, Grellet F, Delseny M, Parmentier Y, Marcillac G, Gigot C, Fleck J, Philipps G, Axelos M, Bardet C, Tremousaygue D, Lescure B (1993) An inventory of 1,152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana. Plant J 4:1051–1061
Keith CS, Hoang DO, Barrett BM, Feigelman B, Nelson MC, Thai H, Baysdorfer C (1993) Partial sequence analysis of 130 randomly selected maize cDNA clones. Plant Physiol 101:329–332
Khush GS (1997) Origin, dispersal, cultivation and variation of rice. Plant Mol Biol 35:25–34
Kumar A, Bennetzen JL (1999) Plant retrotransposons. Annu Rev Genet 33:479–532
Lee CM, Lee YJ, Lee MH, Nam HG, Cho TJ, Hahn TR, Cho MJ, Sohn U (1998) Large-scale analysis of expressed genes from the leaf of oilseed rape (Brassica napus L.). Plant Cell Rep 17:930–936
Lim CO, Kim HY, Kim MG, Lee SI, Chung WS, Park SH, Hwang I, Cho MJ (1996) Expressed sequence tags of Chinese cabbage flower bud cDNA. Plant Physiol 111:577–588
Michalek W, Weschke W, Pleissner KP, Graner A (2002) EST analysis in barley defines a unigene set comprising 4,000 genes. Theor Appl Genet 104:97–103
Newman T, de Bruijn FJ, Green P, Keegstra K, Kende H, McIntosh L, Ohlrogge J, Raikhel N, Somerville S, Thomashow M, Retzel E, Somerville C (1994) Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones. Plant Physiol 106:1241–1255
Ok SH, Chung YS, Um BY, Park MS, Bae JM, Lee SJ, Shin JS (2000) Identification of expressed sequence tags of watermelon (Citrullus lanatus) leaf at the vegetative stage. Plant Cell Rep 19:932–937
Park YS, Kwak JM, Kwon OY, Kim YS, Lee DS, Cho MJ, Lee HH, Nam HG (1993) Generation of expressed sequence tags of random root cDNA clones of Brassica napus by single-run partial sequencing. Plant Physiol 103:359–370
Qutob D, Hraber PT, Sobral BWS, Gijzen M (2000) Comparative analysis of expressed sequences in Phytophthora sojae. Plant Physiol 123:243–253
Sasaki T, Song J, Koga-Ban Y, Matsui E, Fang F, Higo H, Nagasaki H, Hori M, Miya M, Murayama-Kayano E, Takiguchi T, Tasasuga A, Niki T, Ishimaru K, Ikeda H, Yamamoto Y, Mukai Y, Ohta I, Miyadera N, Havukkala I, Minobe Y (1994) Toward cataloguing all rice genes: large-scale sequencing of randomly chosen rice cDNAs from a callus cDNA library. Plant J 6:615–624
Suoniemi A, Tanskanen J, Schulman AH (1998) Gypsy-like retrotransposons are widespread in the plant kingdom. Plant J 13:699–705
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
Uchimiya Y, Lidou S, Shimazaki T, Takamatsu S, Hashimoto H, Nishi R, Aotsuka S, Matsubayashi Y, Kidou N, Umeda M, Kato A (1992) Random sequencing of cDNA libraries reveals a variety of expressed genes in cultured cells of rice (Oryza sativa L.). Plant J 2:1005–1009
Ujino-Ihara T, Yoshimura K, Ugawa Y, Yoshimura H, Nagasaka K, Tsumura Y (2000) Expression analysis of ESTs from the inner bark of Cryptomeria japonica. Plant Mol Biol 43:451–457
Vaughan DA (1994) The wild relatives of rice—a genetic resources handbook. IRRI, Philippines
Wu J, Maehara T, Shimokawa T, Yamamoto S, Harada C, Takazaki Y, Ono N, Mukai Y, Koike K, Yasaki J, Fujii F, Shomura A, Ando T, Kono I, Waki K, Yamamoto K, Yano M, Matsumoto T, Sasaki T (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14:525–535
Xiao J, Grandillo S, Ahn SN, McCouch SR, Tanksley SD, Li J, Yuan L (1996) Genes from wild rice improve yield. Nature 384:223–224
Xiao J, Li J, Grandillo S, Ahn SN, Yuan L, Tanksley SD, McCouch SR (1998) Identification of trait-improving quantitative trait loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150:899–909
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, He S, Zhang J, Xu H, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92
Acknowledgements
This project was funded by a grant from the Crop Functional Genomics Center of the 21st Century Frontier Research Program funded by the Ministry of Science and Technology and by a grant from the BioGreen21 Program of Rural Development Administration.
Additional information
Communicated by I.S. Chung
S.K. Cho and S.H. Ok contributed equally to the investigation reported here.
All EST data are publicly available through the National Center for Biotechnology Information (NCBI, USA; GenBank dbEST Accession No. CB209721~CB214919).
Cite this article
Cho, S.K., Ok, S.H., Jeung, J.U. et al. Comparative analysis of 5,211 leaf ESTs of wild rice (Oryza minuta). Plant Cell Rep 22, 839–847 (2004). https://doi.org/10.1007/s00299-004-0764-4