Introduction

Drought is one of the major abiotic stresses limiting crop productivity worldwide. As water resources and arable land become limiting, the development of drought tolerant crops and use of marginal lands for agriculture will become increasingly important. Cassava (Manihot esculenta Crantz), also known as manioc, is an important food crop in the tropics and subtropics where it is mainly grown for its starchy tuberous roots. Cassava ranks fourth as the most important source of food globally, after rice, sugarcane, and maize, and is also utilized as animal feed and for the industrial production of non-grain starch and ethanol (Balagopalan 2002). Cassava can be cultivated under conditions considered marginal for most other crops, including low fertility soils and areas that face sporadic or seasonal drought (Sakai et al. 1994).

In plants, drought triggers a wide variety of responses including alterations in gene expression, the synthesis of specific proteins (e.g., proteins that scavenge oxygen radicals, chaperone proteins, etc.), and the accumulation of metabolites or osmotically active compounds. The phytohormone abscisic acid (ABA) plays a key role in mediating responses to abiotic stress and promotes characteristic developmental changes that help plants cope with water deficit, such as the restriction of shoot growth and leaf area expansion (Lecoeur et al. 1995), the stimulation of root extension (Sharp et al. 1994), the accumulation of osmotically active solutes (Larosa et al. 1987), and the closing of stomata. Physiologically, tolerance to drought is a complex phenomenon involving drought escape, dehydration avoidance, dehydration tolerance, and desiccation tolerance mechanisms (Blum 1998). Genetically, tolerance to dehydration stress is a multifactorial trait, which makes breeding for drought tolerance arduous. Breeding plant varieties for sustainable production under moisture stress is also challenging because field trial climatic factors, such as temperature and drought, are often unpredictable (Blum 1998). Low fertility and the highly heterozygous nature of cassava render traditional breeding approaches difficult and make it a prime candidate for genomics-assisted breeding or genetic engineering.

In the past decade, excellent progress has been made in unraveling abiotic stress pathways at the molecular level in plants. This knowledge has been applied to the production of plants tolerant to drought, cold, and salt through genetic transformation (Zhang et al. 2004; Umezawa et al. 2006). Considerable progress has also been made in the genetic mapping of abiotic stress traits in plants, especially the cereals, for marker- or genomics-assisted breeding. For example, various genetic effects regulating drought, salt, and cold stress have been assembled on a single chromosome map in the Triticeae (Cattivelli et al. 2002).

Cassava can withstand short dry spells as well as prolonged periods of drought of up to 4–6 months. Cassava responds to drought episodes mainly through dehydration avoidance by rapidly closing stomata to reduce transpiration and maintain high water potential (El Sharkawy 2004). The extent of osmotic adjustment and of accumulation of osmoprotective proteins, such as dehydrins, appears small (Alves and Setter 2004a, b). Leaf growth and photosynthesis are decreased to near zero during episodes of water deficit but recover rapidly after rainfall. Thus, cassava responds to drought by arresting photosynthesis and growth, which are resumed after the recovery of water status. This behavior is appropriate for environments that face periodic water shortages or prolonged drought and contrasts with that of other drought-tolerant plant species, such as sorghum, which respond to drought by partially closing stomata while adjusting osmotically and maintaining modest rates of photosynthesis. Drought avoidance mechanisms in cassava include root characteristics, such as early bulking and deep rooting (El Sharkawy 2004). In a comparative study of nine improved cassava varieties, Okogbenin et al. (2003) found that the adaptation response to drought stress was significantly influenced by genotype, suggesting a strong genetic basis for drought tolerance in cassava.

One approach for analyzing changes in gene expression under stress conditions is to compare expressed sequence tags (ESTs) from normal and stressed tissues. EST sequences provide a robust approximation of the expressed gene content of the parental genome under given sampling conditions (Satou et al. 2003; Alba et al. 2004). For example, several genes responsive to dehydration stress were identified from ESTs generated from ABA-treated and desiccated moss cDNA libraries (Machuka et al. 1999; Wood et al. 1999).

To reduce redundancies in EST collections and enrich for low-abundance transcripts, a cDNA library normalization approach can be followed (Bonaldo et al. 1996). Indeed, ESTs generated from normalized cDNA libraries have been used for the discovery of novel genes and have provided comprehensive analyses of genes expressed in Arabidopsis (Asamizu et al. 2000a), Lotus japonicus (Asamizu et al. 2000b), and Triticum aestivum (Ali et al. 2000).

Expressed sequence tags can be used for the production of microarrays, which are rapidly becoming a preferred tool to identify and dissect complex genetic networks that underlie physiological and developmental processes (Richmond and Somerville 2000). Gene-derived molecular markers are an inexpensive byproduct of EST datasets (Kota et al. 2003; Thiel et al. 2003).

Although drought tolerance in cassava is genotype-dependent, the genetic basis underlying drought tolerance in cassava is unknown at present. Also, genomics or molecular approaches to investigate the response of cassava to drought have not been reported. We intended to develop a genomics resource for cassava as a tool for gene discovery, genetic studies and expression profiling, specifically targeting dehydration stress.

Materials and methods

Plant material and growth conditions

In vitro plantlets of the cassava accession TME117 (local name Isunikankiyan, Nigeria) were derived from meristem cultures at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, and shipped to the USDA Biosciences Research Laboratory in Fargo, ND, USA. Plants were propagated and maintained as previously described (Anderson et al. 2004). One year after planting, dehydration stress on the plants was initiated by withholding water for a period of 6–14 days. Control plants were regularly watered. Leaf, stem, petiole, and meristem tissue was obtained from plants grown in 11″ wide × 9.5″ tall pots filled with a mixture of one part sunshine mix #1 and 2 parts sandy loam. Collection of leaf, stem, petiole, and meristem tissue subjected to dehydration stress occurred 6 days after withholding water, at which point the mature leaves showed visible wilting. Storage root with peel, and root tissue was obtained from plants grown in 22.75″ wide × 18.25″ tall pots filled with a mixture of one part sunshine mix #1 and 2 parts sandy loam. Collection of storage root with peel, and root tissue occurred 14 days after withholding water, at which point the mature leaves showed visible wilting. Tissues were collected from plants grown in three separate pots. All tissue samples were individually ground in liquid N2 prior to storage at −80°C.

cDNA library construction and normalization

Total RNA was extracted as described by Anderson and Horvath (2001) and then pooled in equal amounts before isolation of mRNA. Poly(A)+ mRNA was isolated twice from total RNA of each tissue sample using the Oligotex Direct mRNA kit (Qiagen, Valencia, CA, USA). Reverse transcription of mRNA into double-stranded cDNA was accomplished using the SuperScript™ Choice System modified with NotI/oligo(dT)18 primers with an identifying 5-bp tag embedded between the NotI cloning site and the oligo(dT)18. The modified oligo(dT)18 primers were as follows: (NotI)TCCGA(dT)18 for the control mRNA and (NotI)TCGCA(dT)18 for the dehydration-stressed mRNA. EcoRI adapters were ligated onto the blunt-ended, double-stranded cDNAs (more than 450 bp). After digestion with NotI, the cDNAs were directionally cloned into EcoRI–NotI digested pBSII SK(+) phagemid vector (Stratagene, La Jolla, CA, USA). Purified plasmid DNA from the primary libraries was converted to single-stranded circles and used as a template for PCR amplification using the T7 and T3 priming sites that flank the cloned cDNA inserts. The purified PCR products, representing the entire cloned cDNA population, were used as a driver for normalization. Hybridization between the single-stranded library and the PCR products was carried out for 44 h at 30°C. Unhybridized single-stranded DNA circles were separated from hybridized DNA, rendered partially double-stranded and electroporated into DH10B cells to generate the normalized library.

Nucleotide sequencing and sequence data analysis

Randomly selected clones were partially sequenced from the 5′ end on an ABI 3730. Base calling was done using PhredPhrap (Ewing et al. 1998). Assessment of high quality sequence and calculation of plate-wise success rate was done using QualTrim (bases with score 20 and above were considered high quality). Vector sequences and bases having Phred scores lower than 20 were removed. Clean sequences with a length of 200 or more bases after trimming were masked of repeat and low complexity sequences using RepeatMasker. The final clean sequences were then screened for unwanted sequences such as Escherichia coli genome, vector, mitochondrial DNA, ribosomal RNA, and viral DNA using BLASTN. All EST data are publicly available through the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA; GenBank dbEST accession nos. DV440840–DV459005).
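The quality trimming and length filter described above can be sketched as follows. This is a minimal illustration and not the QualTrim implementation: it assumes that low-quality bases are clipped from both ends of a read until a base with a Phred score of 20 or more is reached, and the function name and its parameters are hypothetical.

```python
MIN_PHRED = 20    # quality cutoff stated in the text
MIN_LENGTH = 200  # minimum clean read length stated in the text

def trim_read(bases, quals, min_phred=MIN_PHRED, min_length=MIN_LENGTH):
    """Clip low-quality ends; return the trimmed read, or None if it is too short.

    Assumption: trimming is end-clipping only, so interior low-quality
    bases are kept. The actual tool may behave differently.
    """
    start = 0
    end = len(bases)
    # Advance past low-quality bases at the 5' end.
    while start < end and quals[start] < min_phred:
        start += 1
    # Step back over low-quality bases at the 3' end.
    while end > start and quals[end - 1] < min_phred:
        end -= 1
    trimmed = bases[start:end]
    return trimmed if len(trimmed) >= min_length else None
```

Reads that pass this step would then go on to repeat masking and contaminant screening, which are not modeled here.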

Annotation of filtered EST sequences

Expressed sequence tag sequence collections were annotated within the Sputnik framework for comparative plant genomics as described (Rudd 2005). Briefly, ESTs were clustered using the Hashed Position Tree 2 algorithm (Biomax Informatics, Martinsried, Germany) and the clusters assembled into unigenes using the CAP3 algorithm. Unigene sequences were assigned a unique id. Peptide sequences were derived for all unigenes using the ESTScan application (Iseli et al. 1999). Prior to ESTScan prediction, a M. esculenta species-specific ESTScan model was created by training with open reading frames identified through BLASTX of the unigenes against the SWISS-PROT database with the results filtered using the expectation value of 1e-10. The sequences were further annotated for structural and functional attributes using InterPro domains (Mulder et al. 2003), and the complete sequence collections were summarized using the MIPS catalogue of functionally annotated proteins (Funcat) and Gene Ontology terms (Mewes et al. 2002; Harris et al. 2004).

Statistical analysis

The data for Figs. 2 and 3 were analyzed using a t-test to determine whether there were significant differences in the number of ESTs in the various functional categories between the dehydration stress treatment and the well-watered control treatment. The Statistical Analysis System Package (SAS Institute Version 9.1 2004) was used.
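As a rough illustration of this comparison, the sketch below computes a t statistic from per-category EST counts. The analysis in this study was run in SAS, and the text does not state whether the test was paired or unpaired; the sketch assumes a paired test with one pair of counts per functional category. The function name and the example counts are hypothetical.

```python
import math
import statistics

def paired_t_statistic(stressed_counts, control_counts):
    """t statistic for paired samples (one pair per functional category)."""
    if len(stressed_counts) != len(control_counts):
        raise ValueError("both libraries need one count per category")
    diffs = [s - c for s, c in zip(stressed_counts, control_counts)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical counts for three categories, for illustration only.
stressed = [10, 20, 30]
control = [8, 19, 27]
t = paired_t_statistic(stressed, control)
```

The statistic would be compared against a t distribution with n - 1 degrees of freedom, where n is the number of categories.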

Marker prediction

Each unigene EST set was searched for simple sequence repeat (SSR) markers as described (Rudd et al. 2005). SSR markers were identified by scoring for repeats. If repeats at the same locus and of different length were found, the candidate SSR was labeled as a probable SSR.

Results

Characterization of the cDNA libraries and EST sequences

Two normalized cDNA libraries were constructed using mRNA isolated from dehydration-stressed and control well-watered cassava tissues. The control library contained 5 × 10⁵ recombinant colony forming units (cfu) and the dehydration-stressed library contained 1 × 10⁶ cfu. Inserts ranged from 0.5 to 2.5 kb. In total, 8,956 randomly collected, high quality ESTs were generated from the normalized control cDNA library and 9,210 from the normalized dehydration-stressed library, yielding a total of 18,166 ESTs with an average read length of 586 nucleotides (Table 1). These 18,166 cDNAs produced a total of 8,577 unigenes, which consisted of 5,383 singletons and 3,194 multi-member clusters with an average size of 4.05.

Table 1 Summary report for cassava ESTs

The unigene sequences were annotated for structural and functional characteristics using a selection of bioinformatics tools that are relevant to comparative genomics and biological understanding as described (Rudd 2005). Approximately 63% of the unigenes were assigned functions (Fig. 1). The largest category (25.71%) contained EST sequences with no similarities to previously sequenced genes, which indicates the presence of putative novel genes that are reported here for the first time and may be specific to cassava. This was followed by sequences with unclear function (11.24%). Most of the identified transcripts appeared to be from genes related to metabolism (10.19%) and cellular organization (8.69%).

Fig. 1 Functional classification of 18,166 cassava ESTs from the combined dehydration-stressed and control cDNA libraries

A comparison between the functional categories of the dehydration-stressed and the control library is shown (Fig. 2). Overall, the numbers of unigenes in the different functional groups were similar for the two libraries. A further breakdown of the ESTs in the ‘cell rescue, defense, cell death, and ageing’ category is presented (Fig. 3). A t-test was performed on the data in Figs. 2 and 3 and showed that no significant differences were found in terms of number of ESTs in the different functional categories between the two treatments (t = 0.09 for the data in Fig. 2 and t = 0.22 for the data in Fig. 3).

Fig. 2 Comparison of functional classification between unigene sequences of the dehydration-stressed and control cDNA library

Fig. 3 Comparison of unigene sequences in the ‘cell rescue, defense, cell death, and ageing’ functional category for the dehydration-stressed and control cDNA library

Identification of putative dehydration stress-responsive genes unique to dehydration-stressed tissues

The EST profiles of the transcriptome from the dehydration and control libraries may provide clues for the identification of dehydration stress-responsive genes. To assess whether the dehydration stress treatment was successful in enriching for drought-responsive genes, ESTs that were unique to the dehydration-stressed library were identified in silico for the functional category ‘cell rescue, defense, cell death, and ageing’. These are listed in Table 2. The type member of the EST cluster is indicated in this table as well as the EST copy number within each cluster. As shown, the dehydration-stressed library contained numerous unique ESTs that encode proteins with recognized roles in drought responses. Examples include the precursor to Early Responsive to Dehydration ERD1 (Kiyosue et al. 1994), the cysteine proteinase Response to Dehydration RD19 (Yamaguchi-Shinozaki and Shinozaki 1993), dehydration-responsive protein RD22 precursor, drought-induced protein Di19-like (Gosti et al. 1995) and various high and low molecular weight heat-shock proteins (HSPs) (Vierling 1991). Transcription factors that were unique to dehydration-stressed tissues included a heat-stress transcription factor A3, heat-shock transcription factor 21, and a homeobox-leucine zipper protein ATHB12, known to mediate growth response to water deficit (Olsson et al. 2004). Molecular chaperonins unique to the dehydration-stressed tissue were DnaK- and DnaJ-like proteins. DnaJ protein is involved as a cochaperonin in the function of HSP70s in protein folding and stabilization (Netzer and Hartl 1998). Transcripts implicated in oxidative stress responses and/or protection of cellular membranes that were unique to the dehydration-stressed library included monodehydroascorbate reductase and phospholipid hydroperoxide glutathione peroxidase. Other transcripts unique to the dehydration-stressed library included phospholipase D and phosphoinositide-specific phospholipase C (PI-PLC). Phospholipid signaling has recently been implicated in the responses of plants to various environmental cues, both biotic and abiotic (Meijer and Munnik 2003). Phospholipase D catalyzes the hydrolysis of structural phospholipids and produces phosphatidic acid, which acts as a second messenger in signal transduction pathways. PI-PLC is involved in the synthesis of inositol 1,4,5-trisphosphate, which stimulates the release of Ca2+ from intracellular stores and as such is also involved in signal transduction. In Arabidopsis, a transcript encoding PI-PLC was significantly induced under various environmental stresses, such as dehydration, salinity, and low temperature (Hirayama et al. 1995). A calcineurin B-like (CBL) protein was also detected in our dehydration-stressed library (Kudla et al. 1999). CBL proteins are Ca2+ sensors that regulate the activity of a specific group of kinases. In Arabidopsis, CBL1 is a critical calcium sensor in abiotic stress responses (Cheong et al. 2003). Overall, at least 28 of the 63 cassava EST clusters in Table 2 showed significant homology to known drought-responsive genes in other plant species, illustrating that the dehydration stress treatment was effective for enriching ESTs involved in drought responses.
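The in silico selection of library-specific clusters amounts to keeping every cluster whose member ESTs all originate from one library. The sketch below shows the idea under stated assumptions: cluster membership is available as a mapping from cluster id to (EST id, library) pairs, and the function name, cluster ids, and library labels are hypothetical placeholders rather than part of the Sputnik pipeline.

```python
def clusters_unique_to(library, cluster_members):
    """Map each cluster found only in `library` to its EST copy number.

    cluster_members maps a cluster id to a list of (est_id, library) pairs.
    """
    unique = {}
    for cluster_id, members in cluster_members.items():
        libraries = {lib for _, lib in members}
        if libraries == {library}:
            unique[cluster_id] = len(members)  # EST copy number in the cluster
    return unique

# Hypothetical clusters; ids and labels are placeholders.
example = {
    "cluster_A": [("est1", "stressed"), ("est2", "stressed")],
    "cluster_B": [("est3", "stressed"), ("est4", "control")],
    "cluster_C": [("est5", "control")],
}
unique_stressed = clusters_unique_to("stressed", example)
```

Absence of a cluster from the control library in a sample of this size does not show that the gene is silent in well-watered tissue, so the resulting list is best read as a set of candidates.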

Table 2 The putative identity of ESTs unique to the dehydration-stressed library

In addition, several transcripts induced by other environmental stresses or by biotic stresses were also unique to the dehydration library. These included ESTs encoding a salt-inducible protein homolog, a cold acclimation protein, a stress-induced STI1-like protein, various blue copper-binding (BCB) and germin-like proteins (GLP) (Zimmerman et al. 2006), and the disease resistance RPP5-like protein.

Digital Northerns

Gorantla et al. (2007) recently examined expression profiles of highly expressed genes through digital Northerns of a normalized EST library constructed from drought-stressed seedlings to identify putative stress-responsive genes in rice. Here, highly prevalent EST clusters with ten or more copies in the dehydration-stressed library were considered to detect genes that are potentially involved in dehydration stress. This analysis was conducted for the functional category ‘cell rescue, defense, cell death, and ageing’ and Table 3 lists EST clusters that have at least ten copies in the dehydration-stressed library. The EST copy number in the control library is also indicated. As shown, all highly prevalent ESTs in the dehydration-stressed library are also represented in the control library. Metallothionein (MT)-like genes were the most abundant class in the combined normalized libraries (120 total hits for MT-1 and 116 total hits for MT-3). MT-like proteins are involved in metal detoxification and are efficient scavengers of free hydroxyl radicals (Palmiter 1998). In rice, MT-like proteins were found to be the most abundant EST class in drought-stressed leaf tissue (Reddy et al. 2002) but they were also abundant in non-stressed leaf tissue (Cho et al. 2004), which is in agreement with our observations. ESTs encoding the ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit and the chlorophyll a/b binding protein, key proteins in photosynthesis, were also found in abundance. With 36 copies, the late-embryogenesis-abundant (LEA) group 5 transcript was among the most abundant ESTs in the dehydration-stressed library. ESTs encoding a lipid-transfer protein (LTP) were also highly prevalent in the dehydration-stressed library with 27 copies. The LEA proteins are a group of proteins commonly involved in the enhancement of stress tolerance with suggested roles in binding water, protein or membrane stabilization, and in ion sequestration (Chaves and Oliveira 2004). LTPs are ubiquitous in higher plants. A number of functions have been proposed for plant LTPs, including involvement in cuticle biosynthesis, surface wax production, or adaptation to environmental stresses (Kader 1997; Treviño and O’Connell 1998). ESTs encoding the Response to Dehydration protein, RD22 (Yamaguchi-Shinozaki and Shinozaki 1993), were also highly abundant in the dehydration-stressed library.
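A digital Northern of this kind reduces to tabulating EST copies per cluster and library, then keeping clusters that reach the copy-number threshold in the stressed library. The following is a minimal sketch assuming one (cluster id, library) pair per EST; the function name, the library labels "stressed" and "control", and the example data are hypothetical.

```python
from collections import Counter

def digital_northern(est_assignments, min_copies=10):
    """Return {cluster: (stressed_copies, control_copies)} for clusters with
    at least `min_copies` ESTs in the stressed library.

    est_assignments is an iterable of (cluster_id, library) pairs, one per EST.
    """
    counts = Counter(est_assignments)
    clusters = {cluster for cluster, _ in counts}
    table = {}
    for cluster in clusters:
        st = counts[(cluster, "stressed")]  # Counter gives 0 for missing keys
        co = counts[(cluster, "control")]
        if st >= min_copies:
            table[cluster] = (st, co)
    return table

# Hypothetical EST assignments, for illustration only.
ests = (
    [("cluster_X", "stressed")] * 12
    + [("cluster_X", "control")] * 4
    + [("cluster_Y", "stressed")] * 3
    + [("cluster_Y", "control")] * 15
)
table = digital_northern(ests)
```

Because both libraries were normalized, copy numbers obtained this way indicate prevalence in the sequenced sample only and are not a measure of transcript abundance.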

Table 3 EST copy number of highly prevalent ESTs (≥10 copies) in dehydration-stressed library (St) compared to the control (Co)

In silico identification of microsatellite sequences

To obtain gene-derived markers for further genetic studies of drought tolerance and other important traits in cassava, the EST clusters were screened for SSRs. Stretches of di-, tri-, tetra-, and pentameric nucleotide repeats were identified using parameters that would detect dimeric motifs with seven or more repeats, trimeric motifs with six or more repeats, and tetra- or pentameric motifs with four or more repeats. Perfect and near perfect repeats with slight repeat pattern deviations were scored. This allows for the possibility that there may be a perfect repeat pattern at a locus within different varieties and cultivars (Rudd et al. 2005). A total of 646 potential microsatellite loci were identified in 592 unigene sequences, representing 3.3% of the total number of ESTs queried. This figure is similar to the rate of microsatellite discovery in other species such as grape, sugarcane, and switchgrass where the frequency of EST-derived SSRs was between 2.5 and 3.8% (Scott et al. 2000; Cordeiro et al. 2001; Tobias et al. 2005). We detected 186 perfect dinucleotides, 131 perfect trinucleotides, 1 perfect tetranucleotide, and 2 perfect pentanucleotides. In addition, there were 264 imperfect dinucleotides, 57 imperfect trinucleotides, 4 imperfect tetranucleotides, and 1 imperfect pentanucleotide. The different classes of SSRs are summarized in Table 4. Dinucleotide repeats represented 70% of the total number of microsatellites. Of these, (TC)n and (AT)n were the most common. Trinucleotide repeats were also detected with (GAA)n repeats being the most common class. Only one (imperfect) (GC)n repeat was returned and tetra- and pentanucleotide repeats were also uncommon.
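The repeat thresholds given above can be illustrated with a short scanner for perfect SSRs. This is a sketch only: it is not the Sputnik marker-prediction code, it does not score the near perfect repeats that the study also recorded, and the function names are hypothetical. Motifs that are themselves repeats of a shorter unit (for example AAA or ATAT) are skipped so that a dinucleotide run is not also reported as a tetranucleotide.

```python
# Minimum repeat counts per motif length, taken from the text above.
MIN_REPEATS = {2: 7, 3: 6, 4: 4, 5: 4}

def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. 'ATAT', 'AAA')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Count consecutive copies of the motif starting at position i.
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((i, motif, reps))
                i += reps * k  # continue after the repeat tract
            else:
                i += 1
    return sorted(hits)

# A (TC)8 tract followed by a (GAA)6 tract.
example_hits = find_perfect_ssrs("GG" + "TC" * 8 + "GAA" * 6 + "C")
```

A candidate found this way would still need flanking primers and testing across genotypes before it could be treated as a polymorphic marker.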

Table 4 Frequency of simple sequence repeats in the EST collection

Discussion

Cassava is remarkable in its ability to withstand brief drought spells as well as extended periods of seasonal drought of 4–6 months. As such, drought tolerance in cassava shows several characteristics that cannot be studied in traditional model plants. In addition, cassava performs well in marginal soils where other crops fail. Considering that drought, together with soil degradation, represents a major cause of yield reduction or even crop failure in non-temperate regions, the molecular characterization of drought tolerance in cassava has implications for many aspects of crop improvement and utilization in cassava and other crops.

In this study we have reported the characterization of 18,166 ESTs generated from two normalized cDNA libraries, one prepared from dehydration-stressed tissues yielding 9,210 ESTs and one from well-watered control tissues yielding 8,956 ESTs. Dehydration stress was applied for brief periods (6–14 days) and both root and leaf tissues were sampled. We identified 8,577 unique sequences from the 18,166 ESTs, which is consistent with normalization having reduced redundancy. The number of functional genes in plants has been estimated to range from 26,000 to nearly 50,000 (Goff et al. 2002). Thus, the 8,577 unique sequences identified in this study likely represent one-third to one-sixth of the potentially expressed genes in cassava. Anderson et al. (2004) have previously reported on the development of an EST resource for cassava. These ESTs were primarily obtained from cDNA libraries targeted for starch and biotic stresses (cassava bacterial blight and cassava mosaic disease). The EST resource described here targets dehydration stress and therefore complements the existing cassava EST resource. Overall, the ESTs developed in this study double the number of publicly available EST sequences for cassava, bringing the total to 36,120 sequences (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html; dbEST release 011907).
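The coverage estimate above follows from simple ratios, which the short calculation below reproduces from the figures in the text. The variable names are illustrative, and the calculation assumes that each unigene corresponds to a distinct gene, which is an upper bound given that some clusters may derive from the same gene.

```python
total_ests = 18166
unigenes = 8577
gene_estimates = (26000, 50000)  # range cited from Goff et al. 2002

# Fraction of the estimated gene complement represented by the unigene set.
fractions = [unigenes / total for total in gene_estimates]

# Average number of ESTs per unigene, a simple redundancy measure.
ests_per_unigene = total_ests / unigenes
```

With these inputs the fractions come to about 0.33 and 0.17, that is, roughly one-third and one-sixth, and each unigene is represented by about 2.1 ESTs on average.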

Annotation of the unique transcripts resulted in 17 different functional categories that were reasonably comparable to the functional categorization of ESTs from other higher plants: ‘unclassified proteins’ comprised the highest number of ESTs (25.71%) and the category ‘metabolism’ contained the largest number of classified ESTs (10.19%) (Ramírez et al. 2005). The number of EST clusters present in the 17 functional categories did not differ significantly between the dehydration-stressed and the well-watered control tissues, as shown by t-tests. The functional category ‘cell rescue, defense, cell death, and ageing’ was further broken down into its eight subcategories. The number of ESTs present in these subcategories for the dehydration-stressed and control libraries was also not significantly different.

Although the control and the dehydration-stressed libraries may contain a similar number of classified genes, this does not mean that the genes within the different functional categories are identical. To assess qualitative differences between the dehydration stress and control libraries and investigate whether the drought stress treatment was effective in enriching for dehydration-responsive genes, ESTs unique to the dehydration-stressed library were identified in silico in the ‘cell rescue, defense, cell death, and ageing’ category. This analysis uncovered numerous ESTs with recognized roles in drought stress in other plant species, including those encoding proteins involved in oxidative stress and/or protection of cellular membranes (monodehydroascorbate reductase, catalase, ascorbate peroxidase, and phospholipid hydroperoxide glutathione peroxidase), signal transduction (phospholipase D, PI-PLC, and CBL proteins), as well as transcription factors (heat stress transcription factor A3, heat shock transcription factor 21, and ATHB12). ESTs induced by abiotic stresses other than drought or by biotic stresses were also unique to the dehydration-stressed tissues. These included ESTs encoding proteins involved in metal homeostasis and tolerance (BCB proteins, MT proteins, and stellacyanin), cold (cold acclimation protein), salt (salt-inducible protein), and host plant resistance (GLP and the disease resistance RPP5-like protein). There is considerable overlap between various abiotic stress signaling pathways in plants, especially drought, cold, and salt, since all these stresses require protection against cellular dehydration (Knight and Knight 2001). It is therefore not surprising to find cold and salt-stress induced genes in the dehydration-stressed library. Previous studies have also revealed links between biotic and abiotic pathways. Chini et al. (2004) recently demonstrated that an activation-tagged allele of activated disease resistance 1, previously shown to convey broad spectrum disease resistance, conferred significant drought tolerance, indicating that there may be significant overlap between signaling network(s) that establish disease resistance and drought tolerance.

The EST libraries used in this study have relatively low redundancy because they were normalized, but still contain many more copies of some transcripts than others. EST clusters with ten or more members in the ‘cell rescue, defense, cell death, and ageing’ category have also been studied. This analysis also uncovered genes that have recognized roles in drought stress responses in other plant species. The LEA proteins deserve special mention in view of previous physiological studies on drought in cassava. ESTs encoding LEA proteins were among the most abundant ESTs in the dehydration-stressed tissues. LEA proteins are thought to have a role in binding water and some of the LEA proteins can be considered compatible solutes involved in osmotic adjustment (Ingram and Bartels 1996). Physiological studies in cassava have shown that the main mechanism for drought resistance is drought avoidance through the production of ABA with little or no change in leaf water potential indicating minimal osmotic adjustment (El Sharkawy 2004). In agreement with this, dehydrins, a subgroup of LEA proteins, could not be detected in cassava tissues following brief episodes (6 days) of water deficiency (Alves and Setter 2004a). Our analysis shows that ESTs encoding LEA proteins are highly abundant in dehydration stressed tissues compared to control tissues. It is tempting to speculate that specific LEA proteins, distinct from dehydrins, play a role in osmotic adjustment in cassava. Recently, specific types of LEA proteins have been expressed in transgenic wheat and rice resulting in improved drought tolerance (Sivamani et al. 2000; Rohila et al. 2002).

The EST data and analysis presented here provide a first overview of the transcripts that are expressed in cassava under dehydration stress. The annotation and comparative analysis of these ESTs have identified numerous transcripts with recognized roles in dehydration stress, many of which were unique to the dehydration-stressed library. Overall, the data indicate that the dehydration treatment and normalization procedure have been effective in capturing drought-responsive transcripts. It should be emphasized that EST approaches are suitable for qualitative rather than quantitative comparisons (Rodriguez Milla et al. 2002; Ramírez et al. 2005). In addition, EST copy number in normalized EST libraries will be further diluted due to the normalization procedure, which reduces redundancy. A comprehensive catalogue of dehydration-responsive genes along with a more accurate quantitative assessment of transcript levels will require the use of a more refined research tool for expression profiling, such as DNA microarrays. DNA microarray studies will also facilitate allele mining by screening cassava varieties with different levels of drought tolerance.

Several EST clusters identified in this study encode similar proteins. It is possible that these ESTs clustered separately due to partial sequence information or due to limitations of the clustering and annotation protocols. Alternatively, it is possible that some EST clusters represent distinct members of multigene families that show extensive sequence identity. For example, CBL proteins are encoded by a family of at least six genes in Arabidopsis that are differentially regulated by stress signals (Kudla et al. 1999). Similarly, the (non-specific) LTPs are encoded by multigene families with individual genes having time- and tissue-specific expression patterns and being induced under a variety of conditions (Lindorff-Larsen et al. 2001). Other examples include the MT proteins, encoded by multigene families that are differentially regulated in rice (Hsieh et al. 1995; Kawasaki et al. 2001), PLD proteins, LEA proteins, GLP, and BCB proteins, which are all encoded by multigene families in other plant species. Whether these EST clusters indeed represent distinct members of multigene families in cassava needs to be verified by expression profiling or other techniques that can discern different members of multigene families.

The cassava genotype used in this study, TME117, is a landrace grown in humid and subhumid conditions. While this genotype is not specifically adapted to (semi)arid environments, it should be noted that the (sub)humid agroecologies also experience periodic dry spells and an extended dry season. Also, phenotypic differences between plant varieties can often be attributed to differences in gene regulation rather than to novel gene sequences per se. The fact that numerous drought-responsive genes have been uncovered supports the hypothesis that this genotype can be utilized to identify genes involved in drought tolerance and opens up new avenues to search for (superior) allelic variants.

The EST sequence resources for cassava we have generated should provide readily available sources of genes that can be used to discover and develop functional molecular markers for germplasm characterization and marker assisted breeding of cassava and other species from the Euphorbiaceae such as castor bean. Popular markers that can be developed from ESTs are single nucleotide polymorphisms, conserved orthologous sets and SSRs. In this study, we have identified a total of 646 microsatellite repeats. Okogbenin et al. (2006) recently constructed an SSR linkage map for cassava with SSRs developed from genomic and cDNA libraries. These authors found that 40% of the SSRs derived from cDNA were polymorphic. Their findings suggest that the EST-SSRs described in this paper will significantly increase the number of SSRs available for molecular genetic studies in cassava.

In summary, the EST collection described here is the first reported dehydration stress transcriptome of cassava and has already uncovered a wealth of putative drought-responsive genes. This EST resource is also a rich source of gene-derived molecular markers. In the short term, these tools can be applied in genetic studies and for expression profiling to further dissect drought tolerance mechanisms in cassava. Appropriate protocols have been established for cassava, which allow the introduction of new traits via modern gene transfer techniques (Taylor et al. 2004). Likewise, framework linkage maps have been developed for cassava and utilized for genetic mapping of genes controlling agronomic traits (Fregene and Puonti-Kaerlas 2002). In view of recent progress in manipulating drought tolerance in model plants using biotechnology tools, a better understanding of drought tolerance in cassava at the molecular level will, in the long run, also facilitate the genetic improvement of cassava for drought tolerance using a combination of biotechnological and conventional methods.