Introduction

Watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] accounts for 2% of the world area devoted to vegetable production (FAO 1995). It is grown in 44 states in the United States, with an annual production of 2.1 million tons and a farm value of $340 million (National Watermelon Promotion Board; www.watermelon.org). In recent years, there has been an increase in consumer demand for seedless watermelons and production of this watermelon type has increased significantly. During 2003, over 60% of the watermelons produced in the United States were seedless (U.S. Department of Agriculture 2004).

Watermelon fruits are diverse in shape and size, in rind and flesh color, and in flesh texture, aroma, flavor, and nutrient composition. Like fruits of most plant species (Seymour et al. 1993), ripening watermelons undergo changes in pigment accumulation and in flavor and aromatic volatiles, conversion of starch to sugars, and increased susceptibility to post-harvest pathogens (Karakurt and Huber 2004). In this respect, there are differences between seeded and seedless watermelon fruits. For example, seeded watermelons may have shorter shelf life than seedless watermelons due to faster degradation of the tissue surrounding the seeds (Maynard 2004). Seedless watermelons may have different sugar and aromatic compound contents than seeded watermelons of similar genetic background. Identifying the genes that control watermelon fruit quality and analyzing their differential expression in seeded versus seedless watermelon will be useful in enhancing fruit quality, nutritional value, and shelf life of seedless watermelons, making them better suited to consumer needs.

The high-throughput sequencing of cDNA clones (libraries) has produced extensive genomic databases and large numbers of expressed sequence tags (ESTs) for various plant species (Richmond and Somerville 2000; Alba et al. 2004). Extensive EST analyses have been conducted for Arabidopsis (Arabidopsis thaliana L. Heynh.) (The Arabidopsis Genome Initiative 2000), rice (Oryza sativa L.) (Yu et al. 2002), and tomato (Budiman et al. 2000; Fei et al. 2004). Significant knowledge has accumulated in a number of plant species with respect to genes associated with cell wall metabolism, ethylene biosynthesis, and hormones affecting fruit setting, growth, and ripening (Giovannoni 2001). However, there is little information on genes controlling these processes in the watermelon fruit. Identifying, mapping, and characterizing these genes will be useful to research and breeding efforts in this crop.

In this study, we report the development of 832 “EST-unigenes” for watermelon fruit, and their classification based on their putative function in other plant species. In addition, we show that a large number of these “EST-unigenes” have no significant homology to any sequences reported so far in other plant species.

Materials and methods

Plant material

Watermelon fruits at early development stage (white flesh; 12 days post-pollination), ripening stage (light pink flesh; 24 days post-pollination), and ripe fruit (red flesh; 36 days post-pollination) from the heirloom cultivar “Illini Red” were used for RNA isolation. Leaf and stem tissue came from the terminal end of the U.S. Plant Introduction (PI) 525088 grown in the greenhouse under natural light, and 26 and 21°C day and night temperature, respectively. The plants producing watermelon fruits were grown in a field plot located at the South Central Agricultural Research Laboratory at Lane, OK. Upon collection, fruits were rinsed with sterile de-ionized water in the field, followed by flesh-tissue excision and processing as described later.

RNA isolation

Prior to use in the field, all glassware and utensils were treated with RNaseZap (Ambion Inc., Austin, TX, USA) to neutralize any RNase activity, and rinsed with RNase-free water before each use. Utensils used in the field were washed with 95% ethanol and sterilized in the autoclave prior to use.

Upon excision, fruit flesh was immediately chopped, placed in a sterile 50 ml conical polypropylene tube, and then frozen with liquid nitrogen. These frozen sample tubes were partially sealed, and then transported in a liquid nitrogen container to a freeze-dryer (LabConco Co., Kansas City, MO, USA). The frozen samples were freeze-dried under shelf refrigeration at −35°C (as described by Callahan et al. 1989). Once lyophilized, the samples were treated with TRIzol reagent (Invitrogen Life Technologies Inc., Carlsbad, California, USA). RNA was extracted according to the manufacturer's protocols (1 ml TRIzol per 100 mg lyophilized tissue) as described by Chomczynski and Sacchi (1987). Leaf and stem tissues were processed in a similar manner after collection from the greenhouse. RNA quality and quantity were determined using a spectrophotometer and denaturing agarose gel electrophoresis (Levi et al. 1992).

cDNA synthesis, size selection, and cloning

Poly(A)+ mRNA was isolated from total RNA using the Oligotex Direct mRNA kit (Qiagen, CA), and converted to double-stranded cDNA using the “Superscript Choice System kit” (Invitrogen, CA). First-strand cDNA synthesis was primed using a modified oligo(dT) primer (5′-AACTGGAAGAATTCGCGGCCGCACGCA(T)18V-3′; V: A, G, or C) designed to anchor initiation at the 5′-end of the poly(A)+ tail and enable directional cloning. cDNA sequences greater than 400 base pairs were selected by agarose gel electrophoresis. EcoRI adaptors (Invitrogen, CA) were ligated to the cDNAs, followed by digestion with NotI, and the cDNAs were then directionally cloned into the EcoRI and NotI sites of the pBluescript II SK+ vector (Stratagene). Cloned cDNAs were transformed into E. coli DH10B electrocompetent cells (Invitrogen, CA) and amplified as previously described by Soares and Bonaldo (1998).

Normalization of the primary library

The primary cDNA library was normalized as previously described (Bonaldo et al. 1996). In brief, a single-stranded “tracer” version of the library was created by digestion with Gene II and Exonuclease III enzymes (Invitrogen). Contaminating double-stranded DNA was removed by hydroxyapatite (HAP) chromatography. The purified single-stranded library was used as a template for PCR amplification through the T7 and T3 priming sites that flank the cloned cDNA inserts. The purified PCR products, representing the entire cDNA population cloned, were used as a “driver” for subtractive hybridization. For hybridization, 0.5 μg of PCR-amplified cDNA inserts were denatured and mixed with 50 ng of purified single-stranded tracer-DNA, as well as 10 μg each of 5′ and 3′ blocking oligonucleotides. The resulting solution (50% formamide, 0.12 M NaCl, 1% SDS) at a final volume of 20 μl was overlaid with mineral oil and subtractive hybridization was carried out for 44 h at 30°C. Non-hybridized, single-stranded tracer-DNAs were separated from hybridized DNA duplexes by HAP column chromatography. These purified, non-hybridized ssDNAs were rendered partially double-stranded by M13 reverse primer extension (only a small part of the second strand is synthesized by primer extension to improve transformation efficiency) and were electroporated into E. coli DH10B cells to generate the normalized library.

Subtraction of the normalized library

cDNAs generated from leaf tissue were subtracted to enrich the normalized library for genes differentially expressed in the fruit tissue. For this, a primary library was created from leaf mRNA as described earlier. Subtraction was performed as described for normalization, except that the driver consisted of PCR products from the primary leaf library. Specifically, 2.5 μg of PCR-amplified cDNA driver was combined with 75 ng of single-stranded tracer-DNA from the normalized library and 40 μg each of 5′ and 3′ blocking oligonucleotides. These were hybridized in a solution consisting of 50% formamide, 0.12 M NaCl, and 1% SDS at a final volume of 20 μl for 88 h at 30°C.
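The two hybridizations differ mainly in the excess of driver over tracer and in duration. The short Python sketch below only restates the masses and times given in the text and computes the driver-to-tracer mass ratio; the variable and function names are illustrative, and a mass ratio is a rough proxy because the driver is double-stranded PCR product while the tracer is single-stranded.

```python
# Masses and times as stated in the text; masses converted to nanograms.
HYBRIDIZATIONS = {
    "normalization": {"driver_ng": 0.5 * 1000, "tracer_ng": 50, "hours": 44},
    "subtraction": {"driver_ng": 2.5 * 1000, "tracer_ng": 75, "hours": 88},
}


def driver_to_tracer_ratio(driver_ng: float, tracer_ng: float) -> float:
    """Return the mass excess of driver DNA over tracer DNA."""
    return driver_ng / tracer_ng


for name, cond in HYBRIDIZATIONS.items():
    ratio = driver_to_tracer_ratio(cond["driver_ng"], cond["tracer_ng"])
    print(f"{name}: {ratio:.1f}-fold driver excess for {cond['hours']} h")
```

Under these figures the subtraction step uses roughly a threefold higher driver excess and twice the hybridization time of the normalization step.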

Sequencing of ESTs

Individual transformed bacterial colonies were robotically picked from agar plates and racked in 384-well plates containing LB medium with 10% glycerol; all clones are grown in this medium and kept as glycerol stocks for long-term storage at −80°C. After overnight growth, glycerol stocks were inoculated into LB medium amended with 100 μg/ml of carbenicillin in 96-well, deep-culture plates and grown for 16 h. Plasmid DNA was purified with Qiagen 8000 and 9600 BioRobots (Qiagen, CA) and associated chemistries. Sequencing of the 5′-ends was performed using standard T7 primer and ABI BigDye terminator chemistry on ABI 3700 and 3730xl capillary systems (Applied Biosystems, CA). All 384- and 96-well format plates were labeled with a barcode and a laboratory information management system (HTLims) was used to track the sample flow.

Sequencing and data analysis

Sequencing of random EST clones from normalized and subtracted libraries from various tissues of watermelon resulted in 1046 clean sequences. A sequence is considered clean when a minimum of 200 nucleotides remains after trimming vector and low-quality sequences. The average read length of clean sequences was 555 nucleotides with a minimum quality score of 20. The redundancy in the library was 20%. The final “clean” sequences were clustered and assembled using Paracel Transcript Assembler (PTA). Contaminant sequences such as E. coli, mitochondrial, chloroplast, cloning vector, and RNA were filtered during the cleanup stage. Repeat sequences were masked and annotated. EST sequences were then clustered based on local similarity scores of pairwise comparison using 88% similarity over 100 nt. Clusters containing only one sequence were grouped as singlets. The EST clusters were assembled into contigs (contiguous sequences) by multiple-sequence alignment, which generates a consensus sequence for each cluster with criteria of 95% identity over a 30 nt overlap. Multiple contigs may be generated per cluster, since ESTs in a cluster may not share enough similarity over their entire length to be assembled as a single contig. Multiple contigs may also be generated when ESTs in a cluster represent alternative splice forms of the gene. The ESTs remaining in a cluster after the formation of contigs are designated as cluster singlets. The set of non-redundant sequences for the library includes the contigs, cluster singlets, and singlets and was designated as “EST-unigenes”.
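The "clean sequence" rule and the redundancy figure can be illustrated with a minimal Python sketch. It is not the Paracel Transcript Assembler pipeline: the end-trimming routine is a simplified stand-in for real vector and quality trimming, the function names are hypothetical, and the redundancy formula (one minus unigenes divided by clean reads) is one common definition that the text does not explicitly confirm.

```python
MIN_CLEAN_LENGTH = 200  # nucleotides that must remain after trimming (from the text)
MIN_QUALITY = 20        # quality score threshold (from the text)


def trim_low_quality(seq, quals, min_q=MIN_QUALITY):
    """Trim bases below min_q from both ends (simplified, assumed trimming rule)."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def is_clean(seq, quals):
    """A read is 'clean' if at least MIN_CLEAN_LENGTH nucleotides survive trimming."""
    return len(trim_low_quality(seq, quals)) >= MIN_CLEAN_LENGTH


def redundancy(n_clean_reads, n_unigenes):
    """Fraction of reads that did not add a new unigene (assumed definition)."""
    return 1 - n_unigenes / n_clean_reads


# Toy read: 250 bases with 20 low-quality bases at each end, so 210 remain.
toy_quals = [10] * 20 + [30] * 210 + [10] * 20
print(is_clean("A" * 250, toy_quals))
# 1046 clean reads and 832 unigenes are the counts reported in this study.
print(f"{redundancy(1046, 832):.1%}")
```

With the reported counts this definition gives about 20.5%, in line with the stated redundancy of 20%.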

These sequences were used to query the GenBank database for homologs using the Basic Local Alignment Search Tool (BLAST) (Altschul et al. 1990), with an E-value cutoff of 0.01, to ensure a high level of confidence that each sequence represents a non-redundant gene transcript.
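As an illustration of applying such a cutoff, the following Python sketch keeps the best hit per query from tab-separated BLAST results. It assumes the 12-column tabular layout of current BLAST+ (`-outfmt 6`, with the E-value in the eleventh column), which may differ from the output format used in this study; treating the cutoff as inclusive is also an assumption, and the sequence identifiers in the example are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 0.01  # cutoff stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest-E-value hit at or below cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
example = "\n".join([
    "\t".join(["unigene_001", "subj_A", "85.0", "200", "30", "0",
               "1", "200", "1", "200", "1e-50", "250"]),
    "\t".join(["unigene_001", "subj_B", "70.0", "150", "45", "0",
               "1", "150", "1", "150", "0.005", "40"]),
    "\t".join(["unigene_002", "subj_C", "60.0", "50", "20", "0",
               "1", "50", "1", "50", "0.5", "25"]),
])

print(best_hits(example))
```

In this toy input, `unigene_002` has no hit at or below the cutoff and would therefore be counted among the sequences without significant homology.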

Results and discussion

A large number of the ESTs identified in this study are homologous to genes previously reported to be important in fruit growth and ripening in other plant species. The 1046 random cDNA clones sequenced produced 832 “EST-unigenes”. Of these 832 “EST-unigenes”, 747 were single ESTs (singletons; non-assembled sequencing reads), and 85 were contigs generated by computer-based assembly of sequence fragments from several clones. Of the 832 “EST-unigenes”, 578 have significant homology to amino acid sequences from the GenBank non-redundant protein database. A large number of these homologous sequences have previously been ascribed to Arabidopsis proteins, and were successfully annotated using gene ontology (GO) analysis. The length of the ESTs ranges from 338 to 699 bases, whereas contigs range from 555 to 2823 bases. The individual sequences of 1046 ESTs have been submitted to NCBI (Accession numbers: DV736965-DV738010, to be released on May 1, 2006).
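The counts in this paragraph can be cross-checked with a few lines of Python. The sketch below uses only numbers stated in the text; the helper name is illustrative, and it assumes the accession range is contiguous and inclusive with a two-letter prefix.

```python
def accession_range_size(first: str, last: str, prefix_len: int = 2) -> int:
    """Count accessions in an inclusive range that shares one letter prefix."""
    if first[:prefix_len] != last[:prefix_len]:
        raise ValueError("accessions must share the same prefix")
    return int(last[prefix_len:]) - int(first[prefix_len:]) + 1


N_READS = 1046      # clean EST sequences
N_SINGLETONS = 747  # unigenes represented by a single EST
N_CONTIGS = 85      # unigenes assembled from several ESTs
N_UNIGENES = 832

n_accessions = accession_range_size("DV736965", "DV738010")
print(n_accessions == N_READS)                  # one accession per EST
print(N_SINGLETONS + N_CONTIGS == N_UNIGENES)   # singletons plus contigs
```

Both checks hold: the accession range covers one entry per sequenced EST, and singletons plus contigs account for all unigenes.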

A functional class was assigned to each “EST-unigene” based on the degree of similarity (E-value) to the closest counterpart sequence found in other plant species (Table 1). Of the 578 “EST-unigenes” that had significant homology to the nucleotide database (nt), 168 are homologous to genes with unknown function, while 410 are homologous to genes with known function (Table 1). These 410 “EST-unigenes”, based on GO annotation, were assigned to one of the following functional classes: (1) primary metabolism (74 “EST-unigenes”), (2) amino acid synthesis and processing (57 “EST-unigenes”), (3) membrane and transport (66 “EST-unigenes”), (4) cell division, cell wall and metabolism, cytoskeleton, and cellular organization (41 “EST-unigenes”), (5) DNA/RNA transcription and gene expression (63 “EST-unigenes”), (6) cellular communication/signal transduction (70 “EST-unigenes”), (7) defense- and stress-related proteins (31 “EST-unigenes”), and (8) secondary metabolism (8 “EST-unigenes”). These “EST-unigenes” may take part in fruit development (involving rapid cell division and differentiation, as well as rapid nutrient and carbohydrate translocation, synthesis, and accumulation), and fruit ripening (cell wall softening, and breakdown of carbohydrates and storage proteins) (Table 1; Fig. 1) (Giovannoni 2001).
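The class sizes listed above can be tallied to confirm that they account for all 410 “EST-unigenes” with known function, and to express each class as a share of that total. The Python sketch below uses only the counts given in the text; the percentages are derived here and are not reported by the study.

```python
# Counts per functional class, as listed in the text.
CLASS_COUNTS = {
    "primary metabolism": 74,
    "amino acid synthesis and processing": 57,
    "membrane and transport": 66,
    "cell division, cell wall and metabolism, cytoskeleton, "
    "and cellular organization": 41,
    "DNA/RNA transcription and gene expression": 63,
    "cellular communication/signal transduction": 70,
    "defense- and stress-related proteins": 31,
    "secondary metabolism": 8,
}
KNOWN_FUNCTION = 410    # unigenes homologous to genes with known function
UNKNOWN_FUNCTION = 168  # unigenes homologous to genes with unknown function
WITH_HOMOLOGY = 578     # unigenes with significant homology

total = sum(CLASS_COUNTS.values())
shares = {name: 100 * n / total for name, n in CLASS_COUNTS.items()}

print(total == KNOWN_FUNCTION)
print(KNOWN_FUNCTION + UNKNOWN_FUNCTION == WITH_HOMOLOGY)
for name, n in sorted(CLASS_COUNTS.items(), key=lambda item: -item[1]):
    print(f"{n:3d}  {shares[name]:5.1f}%  {name}")
```

The eight classes sum to 410, and the known- and unknown-function groups together give the 578 unigenes with significant homology.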

Table 1 ESTs with significant homology to genes with known function in other plant species

A large number of the watermelon fruit “EST-unigenes” are associated with basic cell function (Table 1, Fig. 1) including respiration, photosynthesis, electron transfer (cytochrome, mitochondrial, and chloroplast proteins), or carbohydrate synthesis (as shown for ADP-glucose pyrophosphorylase in watermelon fruit; In-Jung et al. 1998). Others are associated with amino acid and protein synthesis and trafficking (ribosomal proteins, hydrolase, phospholipase, malate dehydrogenase, and ubiquitin proteins; Table 1). During fruit development, a considerable amount of energy is invested in the chemical reactions leading to synthesis of amino acids, and synthesis of functional and storage proteins. Later, during fruit ripening, the transient storage proteins serve as reservoirs for the amino acids used for the synthesis of “ripening-associated” proteins (Peumans et al. 2002). A few of the watermelon fruit “EST-unigenes” are homologous to proteases or peptidases (Table 1) that might be associated with the release of amino acids from these storage proteins.

The developing fruit is a nutrient sink, which draws on nutrients reallocated from other parts of the plant to support its continued development. Indeed, a large number of the “EST-unigenes” are homologous to genes involved in membrane transport and cytosolic trafficking (Table 1). Upon reaching full size, the fruit enters the ripening phase through production of internal ethylene, followed by softening of cell walls, production of secondary compounds, as well as changes in sugar content, flavor, and aroma. This process involves a sequence of events that lead to the activation of transcription factors and signal transduction proteins associated with ripening (Giovannoni 2001). The ethylene biosynthesis genes (including the S-adenosylmethionine decarboxylase and ACC oxidase) and ethylene signal transduction genes (the ethylene receptor “Cm-ETR1”, or the transcription regulator “EIN3/EIL”; Table 1) take part in the ripening processes (Arif et al. 1994; Giles et al. 2001; Giovannoni 2001; Naoki et al. 2003). The abscisic acid- and auxin-induced proteins, DNA- and RNA-binding proteins, and a variety of protein kinases (Table 1) also take part in the signal transduction processes leading to fruit ripening.

Fruit softening is a result of enzymatic activity that impairs cell wall properties and dissolves chemical bonds between cell walls, leading to their separation (Rose and Bennett 1999). A few of the “EST-unigenes” are homologous to cell wall proteins (the pectin-modifying protein expansin, the cell wall P8 protein, and the fasciclin-like arabinogalactan protein 8 precursor; AtAGP8; Table 1) required for cell surface adhesion and expansion (Brummell and Harpster 2001; Shi et al. 2003; Trainotti et al. 2003).

Transcription factors were also identified in this study (Table 1). These include the basic leucine-zipper transcription factor (bZIP family), a regulatory element for an abscission-specific cellulase involved in cell wall softening (Tucker et al. 2002); the CCAAT-binding transcription factor (CBF-B/NF-YA) protein, which binds to the CCAAT-box motif present in certain promoters of genes expressed in vegetative and reproductive plant tissues (Guerineau et al. 2003); and a CCCH-type zinc finger motif that plays a key regulatory role in flower and fruit development (Li et al. 2001).

The ability of a fruit to resist pathogen attack or environmental stress decreases with tissue softening (Brummell and Harpster 2001). Thus, cell defense and stress response genes might be programmed to be expressed in the fruit tissue to slow pathogen invasion and fruit tissue decline prior to full seed development and maturation. Among these are the DNAJ heat shock chaperone proteins (Table 1), which protect proteins of the intracellular milieu from irreversible aggregation during cellular stress. Other important stress and defense proteins include the exonuclease protein required for post-transcriptional silencing in Arabidopsis (Glazov et al. 2003), the hemoglobin (HB2) produced in plants growing under stress conditions (as shown for rice; Lira-Ruan et al. 2001), and the zinc finger proteins that enhance disease resistance and drought tolerance in plants (Kim et al. 2004) and in the A. thaliana fruit (Balasubramanian and Schneitz 2002) (Table 1). “EST-unigenes” with significant homology to ethylene-responsive element binding factor (ERF) proteins also exist in the watermelon fruit (Table 1). The ERF proteins are transcription factors linked to defense and stress response in plants (Oñate-Sánchez and Singh 2002).

Secondary compounds are produced mainly during fruit ripening. Carotenoids (including lycopene) are the main secondary compounds produced in watermelon fruit (Rodriguez-Amaya 1999). A group of “EST-unigenes” homologous to glutathione S-transferase (GST) genes were identified and classified in the category of secondary compounds (Table 1). Glutathione S-transferase proteins may perform a variety of functions in the binding of flavonoids and the deposition of these compounds in the vacuole (Board et al. 2000). GST proteins also take part in the detoxification of reactive electrophilic compounds by catalyzing their conjugation to the tripeptide glutathione (gamma-glutamyl-cysteinyl-glycine) (Armstrong 1997). Recent studies implicated GSTs as signaling compounds leading to apoptosis (Dixon et al. 2002). The glutathione S-transferase domain is also found in elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include auxin-regulated proteins in plants.

Fig. 1 Distribution of watermelon flesh ESTs according to their function

Of the 832 watermelon “EST-unigenes” analyzed, 254 (∼30%) had no detectable homologs (E≥0.1) among the plant genome or protein sequences reported so far in GenBank (Fig. 1). Some of these “EST-unigenes” may represent 3′ untranslated regions (UTRs) (Mignone et al. 2005). However, further studies are needed to determine whether they are specific to watermelon and other cucurbit species. The majority of the watermelon fruit “EST-unigenes” reported in this study could be grouped into abundantly expressed gene families. However, a considerable number of the “EST-unigenes” could not be classified (Table 1; Fig. 1). Extensive genome sequencing is still needed for cucurbit species to identify the genes that may be distinct to this family. Future work will include sequencing and microarray analysis of ESTs representing different fruit tissues and developmental stages.