Introduction

Dinoflagellates are unicellular protists most closely related to the ciliates and apicomplexans (Fast et al., 2002). Unique among eukaryotes, dinoflagellates have permanently condensed chromatin but lack the histones and nucleosomes typically involved in regulating chromosome condensation and gene expression. They have also evolved a unique mitosis in which the nuclear envelope remains intact and the mitotic spindle consists of extranuclear microtubules that traverse the nucleus through cytoplasmic channels (Bhaud et al., 1999). Dinoflagellates typically possess large genomes (up to 200 pg/cell) that are generally considered to be haploid (Triplett et al., 1993; Santos and Coffroth, 2003).

Karenia brevis is a dinoflagellate whose expressed genome is of interest because of its role in producing the harmful algal blooms (HABs), or “red tides,” that occur annually in the Gulf of Mexico. K. brevis blooms cause extensive fish kills, mortality among protected marine mammals, and human illness through the production of highly potent neurotoxins known as brevetoxins. The K. brevis genome consists of 121 chromosomes (Walker, 1982) containing 100 pg of DNA per cell, or approximately 1 × 10¹¹ bp (Kim and Martin, 1974; Rizzo, 1982; Sigee, 1986; Kamykowski et al., 1998). This is 30 times the size of the human genome; however, dinoflagellate chromosomes consist of a permanently condensed, genetically inactive central region with peripheral loops of B-DNA that protrude from this core and comprise the actively transcribed DNA (Sigee, 1984; Anderson et al., 1992; Bhaud et al., 1999). Therefore, although the size of the expressed genome of K. brevis is unknown, it is expected to be substantially smaller than its total genome size would predict.
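The genome-size figures above follow from a simple mass-to-length conversion. The sketch below assumes the commonly used factor of about 0.978 × 10⁹ bp per picogram of double-stranded DNA and a human haploid genome of roughly 3.2 × 10⁹ bp; both constants and the helper names are illustrative assumptions, not values taken from the cited studies.

```python
# Illustrative constants (assumptions, not from the cited studies):
BP_PER_PG = 0.978e9        # approx. base pairs per picogram of dsDNA
HUMAN_HAPLOID_BP = 3.2e9   # approx. human haploid genome size, bp


def picograms_to_bp(pg: float) -> float:
    """Hypothetical helper: convert a DNA mass in picograms to base pairs."""
    return pg * BP_PER_PG


def fold_human_genome(pg: float) -> float:
    """Hypothetical helper: express a DNA mass as multiples of the human haploid genome."""
    return picograms_to_bp(pg) / HUMAN_HAPLOID_BP


if __name__ == "__main__":
    kb_pg = 100.0  # K. brevis DNA content per cell, from the text
    print(f"{picograms_to_bp(kb_pg):.2e} bp, about {fold_human_genome(kb_pg):.0f}x human")
```

For 100 pg this gives about 9.8 × 10¹⁰ bp, roughly 30 times the human haploid genome, consistent with the figures quoted above.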

Insight into the molecular mechanisms that control growth, toxicity, and persistence of K. brevis blooms is critical to understanding the formation of HABs; however, few investigations into the molecular biology of K. brevis exist. Antibody-based approaches have yielded some insight into K. brevis cell cycle control through the identification of the central eukaryotic cell cycle regulator, cyclin-dependent kinase (Van Dolah and Leighfield, 1999), and its regulatory subunit, cyclin (Barbier et al., 2003). Nonetheless, the partners and cell cycle substrates of this central regulator remain unidentified, and the unique features of the dinoflagellate nucleus suggest that unusual mechanisms may have evolved. The basis for K. brevis toxicity is the production of brevetoxins, polyether toxins with structures that suggest synthesis through a polyketide synthase pathway. Yet investigations into polyketide synthase genes in K. brevis have left it ambiguous whether the dinoflagellate itself or associated bacteria contribute the polyketide synthase activity in this organism (Snyder et al., 2003). Antibody-based studies have also been used to identify stress proteins that may play a role in the adaptation and persistence of blooms under stressful environmental conditions (Miller-Morey and Van Dolah, 2004). However, the scope of such studies is limited by the availability of antibodies that cross-react with dinoflagellate proteins. Thus there is a clear need for genomic tools with which to study gene expression and regulation in this organism, for which little molecular information is available.

Sequencing of complementary DNA libraries to generate expressed sequence tags (ESTs) is an effective means of discovering expressed genes in organisms for which genomic data are unavailable. ESTs serve as markers for genes expressed under specific conditions and can be used as probes in the recovery of full-length cDNA or genomic sequences, recognition of exon and intron boundaries, delineation of protein families, and development of probes for genomewide expression profiling. To this end we constructed a cDNA library from K. brevis and carried out large-scale sequencing to yield an EST database containing 7001 ESTs and 5280 unique gene clusters. These ESTs were then used to develop an oligonucleotide microarray specific for K. brevis gene expression. Microarray technology provides the capacity to profile genomewide changes in gene expression in response to different exposure conditions, to identify genes involved in specific pathways on the basis of their coordinated responses, and to assign function to unknown genes on the basis of their induction in response to known challenges. This approach should greatly expand our understanding of K. brevis physiology at the molecular level and of the effects of environmental change in the marine environment (Jenny et al., 2002). Here we present details of cDNA library construction, insight into the K. brevis genome as revealed by EST data analysis, and the development and validation of a DNA microarray for investigation of K. brevis functional genomics.

Materials and Methods

Strain and Culture Conditions of Cells

The Wilson isolate of K. brevis was used for this study. The growth and behavior of this isolate have been well studied during the approximately 50 years it has been in culture. Cells were maintained in batch culture in 1-L glass bottles with autoclaved, 20-μm-filtered seawater at 36 psu obtained from a seawater system at the Florida Institute of Technology field station at Vero Beach. Seawater was enriched with f/2 medium (Guillard, 1973). Cultures were maintained at 25° ± 1°C on a 16:8-hour light-dark cycle. Illumination from cool white lights was maintained at a photon flux density of 40 to 50 μE·m−2·s−1 (measured with a Li-Cor 2π sensor).

Construction of cDNA Library

Cultures were harvested by centrifugation (1000g) during the logarithmic phase of growth at circadian time (CT) 16–18. This time was chosen to maximize the likelihood of expression of cell cycle genes, according to the known diel phasing of the K. brevis cell cycle (Van Dolah and Leighfield, 1999). Total RNA (2 mg) was isolated from approximately 20 L of K. brevis culture using Qiagen RNeasy columns. cDNAs were then synthesized by oligo(dT) priming of poly(A) messenger RNA, size selected (>400 bp), and directionally cloned into the λ ZAP II vector system (Stratagene).

Clone Propagation, Plasmid Isolation, and EST Sequencing

Packaged phages were used to infect XL1-Blue cells, and mass in vivo excision of the pBluescript SK(−) phagemid from the λ ZAP II vector was performed with the ExAssist helper phage. Separate cultures of XL1-Blue MRF′ and SOLR cells were grown overnight at 30°C in LB broth with supplements. Cells were centrifuged (1000g) and resuspended in 10 mM MgSO4 to an OD600 of 1.0 (8 × 10⁸ cells/ml). A portion of the λ bacteriophage library (3.9 × 10⁷ pfu) was combined with XL1-Blue cells at a multiplicity of infection (MOI) of 1:10 (λ phage to cells). ExAssist helper phage (1 × 10⁹ pfu) was added at a 10:1 helper phage–cell ratio to ensure that every cell was co-infected with λ phage and helper phage. Cells were incubated at 37°C to allow the phage to attach, transferred to LB broth, and incubated at 37°C with shaking. After 2.5 to 3 hours of incubation, cultures were heated (65°–70°C) for 20 minutes to lyse the cells.
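The phage and cell quantities in the excision protocol are related by simple ratio arithmetic. The sketch below takes, as stated above, an OD600 of 1.0 to correspond to 8 × 10⁸ cells/ml and assumes that cell density scales linearly with OD600 in this range; the function names are hypothetical.

```python
CELLS_PER_ML_AT_OD1 = 8e8  # OD600 of 1.0 ~ 8 x 10^8 cells/ml (from the protocol text)


def cells_per_ml(od600: float) -> float:
    """Hypothetical helper: estimate cell density, assuming linearity with OD600."""
    return od600 * CELLS_PER_ML_AT_OD1


def cell_volume_for_ratio(phage_pfu: float, phage_per_cell: float,
                          od600: float = 1.0) -> float:
    """Hypothetical helper: ml of cell suspension giving the requested
    phage-to-cell ratio (e.g. 0.1 for a 1:10 lambda phage-cell ratio)."""
    if phage_pfu <= 0 or phage_per_cell <= 0:
        raise ValueError("phage_pfu and phage_per_cell must be positive")
    cells_needed = phage_pfu / phage_per_cell
    return cells_needed / cells_per_ml(od600)


def phage_for_ratio(cells: float, phage_per_cell: float) -> float:
    """Hypothetical helper: pfu needed (e.g. helper phage at 10:1) for a given cell count."""
    return cells * phage_per_cell
```

For example, `cell_volume_for_ratio(3.9e7, 0.1)` returns about 0.49 ml of OD600 1.0 cells for 3.9 × 10⁷ pfu of λ phage at a 1:10 ratio.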

The excised phagemids were transformed into SOLR cells, and individual colonies were grown on LB-ampicillin agar plates. The pBluescript DNA was purified with QIAprep Miniprep Kits using a QIAvac 96 Top Plate system or a BioRobot 9600 system (Qiagen). Sequencing reactions were carried out from the 5′ end of the cDNA insert using the universal T3 primer (5′-ATTAACCCTCACTAAAG-3′).

EST Database Analysis and Management

Lasergene SeqMan II software (DNAStar Inc.) was used to remove vector sequence, evaluate the quality of the underlying sequencing trace data, and eliminate poor-quality ESTs using the following criteria: sequences shorter than 100 bp were eliminated, sequences with trace quality below a Phred threshold of 12 were removed, and sequences were truncated where more than 3 Ns occurred in a 20-bp rolling window. Edited EST sequences were compared with the nonredundant GenBank sequence database using the Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990), both as BLASTn (nucleotide query against nucleotide sequences) and as BLASTx (translated nucleotide query against protein sequences). EST sequences were then clustered using the SeqMan II contig assembly process with a minimum cutoff of 95% identity in a 50-bp window.
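The three screening rules can be expressed as a small filter. This is a sketch, not SeqMan II's implementation: it assumes per-base Phred scores are available for each read, applies the Phred-12 threshold to the mean quality of the retained read, and cuts the read at the start of the first offending 20-bp window; the function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional

MIN_LENGTH = 100       # bp; shorter reads are discarded (from the text)
MIN_MEAN_PHRED = 12    # Phred threshold (from the text); applying it to the
                       # mean quality of the read is an assumption
WINDOW = 20            # bp rolling window for ambiguous bases (from the text)
MAX_N_IN_WINDOW = 3    # more than this many Ns in a window ends the read


def n_window_cutoff(seq: str) -> int:
    """Return how many bases to keep: the read is cut at the start of the first
    20-bp window containing more than 3 Ns (full length if there is none)."""
    seq = seq.upper()
    for start in range(max(len(seq) - WINDOW + 1, 0)):
        if seq[start:start + WINDOW].count("N") > MAX_N_IN_WINDOW:
            return start
    return len(seq)


def screen_est(seq: str, quals: List[int]) -> Optional[str]:
    """Hypothetical screening function: trim, then apply length and quality rules.
    Returns the retained sequence, or None if the read is rejected."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality arrays differ in length")
    keep = n_window_cutoff(seq)
    seq, quals = seq[:keep], quals[:keep]
    if len(seq) < MIN_LENGTH:
        return None
    if mean(quals) < MIN_MEAN_PHRED:
        return None
    return seq
```

Trimming before the length check means that a read whose ambiguous stretch begins early is rejected outright, which is one reasonable reading of the criteria above.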

DNA Microarray Design and Validation

To facilitate the design of 60-bp oligonucleotides that uniquely recognize genes whose sequences share similarity as defined by BLASTx scores, a second-pass clustering was performed using a criterion of 75% identity over 30 bp (Parcel Clustering Package Version 2.2.8). This reduced the number of unique contigs to 4629, for which 60-mer oligonucleotides were designed. The resulting oligonucleotide probes and controls were printed on glass slides using an inkjet-based printing method, yielding an 8455-feature array (Agilent Technologies) comprising 4629 primary probes (one per unigene), 3462 replicate probes to selected genes, and 364 hybridization controls. Two arrays were printed per 1 × 3-inch glass slide.
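Both clustering passes rest on a pairwise criterion of the form "at least X% identity within a window of Y bp." The sketch below is a deliberately simplified, ungapped version of that criterion combined with single-linkage grouping; SeqMan II and the Parcel package use their own (gapped) assembly algorithms, so the ungapped comparison and the function names are illustrative assumptions. The same code covers the first pass (0.95, 50) and the second pass (0.75, 30).

```python
from typing import Dict, List


def shares_window(a: str, b: str, min_identity: float, window: int) -> bool:
    """Return True if, at some ungapped offset, `window` consecutive aligned
    positions of a and b reach at least `min_identity` fractional identity."""
    a, b = a.upper(), b.upper()
    # offset = index in `a` aligned with index 0 of `b` (negative: b starts first)
    for offset in range(-(len(b) - window), len(a) - window + 1):
        start_a, start_b = max(offset, 0), max(-offset, 0)
        overlap = min(len(a) - start_a, len(b) - start_b)
        if overlap < window:
            continue
        same = [a[start_a + i] == b[start_b + i] for i in range(overlap)]
        hits = sum(same[:window])
        if hits >= min_identity * window:
            return True
        for i in range(window, overlap):  # slide the window along the overlap
            hits += same[i] - same[i - window]
            if hits >= min_identity * window:
                return True
    return False


def single_linkage(reads: List[str], min_identity: float, window: int) -> List[List[int]]:
    """Hypothetical grouping: reads meeting the criterion pairwise share a cluster."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if shares_window(reads[i], reads[j], min_identity, window):
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(reads)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

The all-pairs loop is quadratic in the number of reads and is meant only to make the criterion concrete; production assemblers index shared words to avoid comparing every pair.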

A self-versus-self hybridization was performed to determine probe specificity, array reproducibility, and microarray feature uniformity. Total RNA from K. brevis cultures was prepared using Tri-Reagent according to the manufacturer’s protocol. Following precipitation, the RNA pellet was resuspended and passed through a Qiagen RNeasy column to remove contaminating DNA and protein; total RNA was then quantified by UV spectroscopy and its quality assessed on an Agilent Bioanalyzer. Total RNA (350 ng) was amplified and labeled with Cy3 and Cy5 dyes (PerkinElmer) using an Agilent low-input linear amplification kit according to the manufacturer’s protocol. Following labeling, amplified RNA was quantified by UV spectroscopy, and 350 ng each of Cy3- and Cy5-labeled targets were hybridized to the array for 17 hours at 60°C. After hybridization, arrays were washed consecutively in 6× SSPE with 0.005% N-lauroylsarcosine and in 0.06× SSPE with 0.005% N-lauroylsarcosine, for 1 minute each at room temperature, followed by a 30-second wash in a stabilization and drying solution (Agilent Technologies). Microarrays were then imaged with an Agilent microarray scanner. Scan data were extracted and normalized with a combined linear and Lowess algorithm in Agilent Feature Extraction software (Version 7.5). Data were further evaluated for feature uniformity, signal intensity, signal-to-background ratio, dye bias, and reproducibility of replicate probes using the Rosetta Luminator gene expression analysis system and Microsoft Excel.
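Dye-bias normalization of this kind is usually carried out on log ratios plotted against average log intensity (an M-A plot). The sketch below implements a basic Lowess fit (tricube weights, local linear regression, no robustness iterations) in pure Python and subtracts the fitted trend from each log ratio. It illustrates the general technique only and is not a reproduction of the Agilent Feature Extraction algorithm, whose parameters are not given here; the function names and default span are assumptions.

```python
import math
from typing import List


def lowess_fit(x: List[float], y: List[float], frac: float = 0.3) -> List[float]:
    """Local linear regression with tricube weights (no robustness iterations);
    returns the fitted value of y at each x."""
    k = max(2, math.ceil(frac * len(x)))  # neighbours that set each local bandwidth
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(cy5: List[float], cy3: List[float], frac: float = 0.3) -> List[float]:
    """Return intensity-dependent, dye-bias-corrected log2(Cy5/Cy3) ratios."""
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]        # log ratio
    a = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]  # mean log intensity
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

In a self-versus-self hybridization the corrected ratios should scatter around zero; residual structure after normalization would point to dye bias that the fit did not capture.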

Results and Discussion

Acquisition and Features of Generated ESTs

The K. brevis cDNA library contains 3.9 × 10⁷ primary recombinants with an average insert size of 1.8 kb and an insert size range of 0.5 to 2.8 kb. A total of 9728 sequencing reactions were performed from the 5′ end of individual, randomly selected clones to obtain ESTs. After removal of low-quality sequences, 7001 sequences with a minimum of 100 bp of continuous sequence were retained for further analysis. The average length of ESTs in the database is approximately 700 bp. All EST sequences are publicly available on the joint NOAA/Medical University of South Carolina website (marinegenomics.org), along with their BLASTn and BLASTx search results, and have been submitted to the GenBank dbEST database (accession numbers CO59029–CO065717, CO517335–CO517390, CV173737–CV173976, and CV179548).

Assembly of ESTs into Unigene Clusters

Using raw EST sequence information to identify unique genes poses several difficulties, because an EST is only a partial sequence derived from a single cDNA library clone, corresponding to one mRNA molecule whose transcript may be present in many copies. Artifacts of cDNA library construction and single-pass sequencing, long 3′ or 5′ untranslated regions, and incomplete reverse transcription often make EST sequences relatively short, highly redundant, and error prone. Thus, to better assign gene identities, the ESTs were assembled into clusters with a minimum identity of 95% within a minimum 50-bp region of overlap. By this definition, contigs derived from the clustered sequences represent unique expressed genes. From the 7001 high-quality ESTs, 5280 unigene clusters were identified (Figure 1). Of these, 4399 contained single ESTs, representing 63% of the total ESTs analyzed. The remainder fell into 881 clusters with sizes ranging from 2 ESTs (576 clusters) to 31 ESTs (1 cluster). Of these, 98% contained 4 or fewer ESTs, indicating that the vast majority of expressed genes fall into a low-abundance class (Figure 2). The most highly expressed sequence (whose cluster consisted of 31 ESTs) represents less than 1% of the total ESTs. Thus the overall redundancy of the library is low, as reflected in the high rate of novel EST acquisition over the 9728 sequencing reactions performed (Figure 2). Similarly high sequence diversity is apparent in EST libraries generated from other dinoflagellate species, Lingulodinium polyedrum (1519 ESTs) and Amphidinium carterae (3380 ESTs; Bachvaroff et al., 2004). What fraction of the total expressed dinoflagellate genome each of these EST collections represents is unknown at present. A complete genome sequence has been obtained for the apicomplexan Plasmodium (5300 genes; Gardner et al., 2002), and sequencing is underway for the ciliate Tetrahymena (25,000–30,000 genes expected; J. Carlton, TIGR, personal communication). Because the gene numbers of these nearest relatives differ roughly fivefold, they offer little insight into the genome sizes expected in dinoflagellates.
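The redundancy measures above (singleton share, size of the largest cluster, and the discovery curve in Figure 2) follow directly from cluster assignments. In this sketch the inputs, a list of cluster sizes and the cluster ID of each EST in sequencing order, and the function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def cluster_summary(cluster_sizes: Sequence[int]) -> Dict[str, float]:
    """Summarize library redundancy from the number of ESTs in each cluster."""
    total_ests = sum(cluster_sizes)
    singletons = sum(1 for size in cluster_sizes if size == 1)
    return {
        "clusters": len(cluster_sizes),
        "singletons": singletons,
        "singleton_share_of_ests": singletons / total_ests,
        "largest_cluster_share": max(cluster_sizes) / total_ests,
    }


def discovery_curve(cluster_ids: Sequence[Hashable]) -> List[int]:
    """Cumulative number of distinct clusters after each EST, in sequencing order."""
    seen = set()
    curve = []
    for cid in cluster_ids:
        seen.add(cid)
        curve.append(len(seen))
    return curve
```

Applied to the counts above, 4399 singletons among 7001 ESTs give a singleton share of about 0.63, and a 31-EST cluster corresponds to about 0.44% of all ESTs. A discovery curve that keeps rising nearly linearly, as in Figure 2, indicates a library far from saturation.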

Fig. 1

Cluster analysis of 7001 K. brevis ESTs. Unigene clusters are sorted as a function of cluster size (ESTs per cluster). Of a total of 5280 individual clusters, 4399 contain single ESTs, while 881 contain 2 to 31 ESTs.

Fig. 2

Redundancy of the K. brevis ESTs. Number of unique sequences plotted against the total number of sequences.

Overview of Genes Identified by Database Search

The 5280 contigs were compared with the public nonredundant protein sequence database using the BLASTx search algorithm. Using a cutoff P value of less than 10⁻⁴, 1556 (29%) showed similarity to previously identified genes from a wide variety of organisms. Of these, the largest fractions are involved in metabolism (23%), signal transduction (20%), transcription/translation (15%), and structure/cytoskeleton (11%) (Figure 3).
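Assigning putative identities from a similarity search amounts to keeping, for each contig, the best hit that passes the significance cutoff. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, whose default twelve columns end with the E value and bit score). The analysis above reports a P value cutoff; for small values the E value and P value are nearly equal (P = 1 − exp(−E)), so filtering on E is a close stand-in rather than the authors' exact procedure. Function names are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

Hit = Tuple[str, float, float]  # (subject id, bit score, E value)


def best_hits(tabular: str, max_evalue: float = 1e-4) -> Dict[str, Hit]:
    """Return the highest-scoring subject per query among hits with E < max_evalue.
    Expects BLAST+ -outfmt 6 rows: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    best: Dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


def fraction_annotated(best: Dict[str, Hit], total_contigs: int) -> float:
    """Share of all contigs that received at least one significant hit."""
    return len(best) / total_contigs
```

With the counts reported above, 1556 annotated contigs out of 5280 corresponds to a fraction of about 0.29.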

Fig. 3

Functional classification of 1556 EST clusters that showed similarity to previously identified genes. Distribution of functional classes excludes clusters of no known similarity.

Only 24 genes were represented by 10 or more ESTs (Figure 4). The most highly expressed gene in the library was that for fucoxanthin chlorophyll a/c binding protein, with 50 copies (0.7% of the total ESTs). K. brevis is the only fucoxanthin-containing dinoflagellate for which gene expression data are available. However, among the peridinin-containing dinoflagellates for which data are available, the functionally analogous peridinin chlorophyll binding protein is also among the most highly expressed genes (Bachvaroff et al., 2004). Interestingly, flavodoxin, involved in the light reactions of photosynthesis, was the only other highly expressed photosynthesis gene identified in K. brevis (17 copies). Also among the most highly expressed were S-adenosylmethionine synthetase, adenosylhomocysteinase, and glutamine synthetase, all components of amino acid metabolic pathways. These genes were also among the most highly expressed sequences in Lingulodinium (Bachvaroff et al., 2004). Cytoskeletal proteins were also represented in the highly expressed group, with actin present in 15 copies and α-tubulin in 34 copies. Its partner, β-tubulin, was found in only 3 copies, yet both α-tubulin and β-tubulin have been identified in the K. brevis cytoskeleton by immunofluorescence localization, suggesting that the normal α/β-tubulin heterodimer is present (Barbier, M., Miller, J., Morton, S.L., and Van Dolah, F.M., manuscript submitted).

Fig. 4

Twenty-four most abundant genes.

Cell Cycle Genes

Despite their importance in understanding the mechanisms that regulate proliferation, little molecular information is available on dinoflagellate cell cycle genes. The K. brevis EST database provides the first collection of cell cycle genes in a dinoflagellate (Table 1). Cell cycle genes identified include the central cell cycle regulator cyclin-dependent kinase (CDC2). Its partner, cyclin, was not found, despite antibody-based evidence for its existence in dinoflagellates (Barbier et al., 2003; Wong et al., 1997); however, this is not surprising, as even among mammals cyclin sequences are not highly conserved, and cyclins were generally identified by complementation in yeast (Lew et al., 1991). Overall, S-phase-specific genes involved in the DNA replication machinery had the highest similarities to known genes. These include a suite of genes whose activity is directly or indirectly regulated by cyclin-dependent kinase: ribonucleoside diphosphate reductase, proliferating cell nuclear antigen (PCNA), replication factor C, replication protein A, and DNA ligase. Mitosis-specific genes include anaphase-promoting complex protein 3 (CDC27), a component of the anaphase-promoting complex (APC) that controls mitotic exit through ubiquitin-directed proteolysis of M-phase-specific proteins, and CDC48, an AAA ATPase that participates in spindle disassembly by targeting spindle-stabilizing proteins for APC-directed proteolysis. APC-mediated activities have been reported previously in the dinoflagellate Crypthecodinium cohnii; however, no molecular or structural basis for this activity was identified (Yeung et al., 2000). Additional putative M-phase genes with lower similarity include the mitotic checkpoint kinases NIMA and Bub1, and regulators of mitotic protein phosphatase 1 (Sds22), chromatid cohesion (Dif1), and chromosome condensation and segregation. Together, this gene list supports the presence of a eukaryotic cell cycle machinery in dinoflagellates and provides tools for its functional analysis.

Table 1 Cell Cycle ESTs Present in Karenia brevis Nuclear Genome, Sorted by Gene Name

Signal Transduction Genes

The EST database also provides the first overview in dinoflagellates of the signal transduction pathways that relay environmental cues to elicit cellular responses (Table 2). Components of several ser/thr protein kinase cascades conserved in eukaryotes were identified, including pathways dependent on cAMP, Ca²⁺, and calmodulin, as well as MAP kinase pathways. Previous studies have demonstrated the presence of cAMP-dependent kinase in two dinoflagellate species, Amphidinium operculatum (Leighfield et al., 2002) and Gonyaulax polyedra (=Lingulodinium polyedrum; Salois and Morse, 1997). A MAP kinase has also been reported in Pfiesteria piscicida (Lin and Zhang, 2003). In contrast, protein tyrosine kinases were not found, consistent with the notion that they diverged from ser/thr kinases with the emergence of metazoans (Kruse et al., 1997), although reversible Tyr phosphorylation has been reported in Prorocentrum lima (Dawson et al., 1997). ESTs were identified with similarity both to ser/thr protein phosphatases (types 1, 2A, and 2C), which oppose the actions of ser/thr kinases in signaling cascades, and to dual-specificity (ser/thr/tyr) protein phosphatases, which are involved in many growth-regulatory processes. The presence of type 1 and type 2 ser/thr protein phosphatases in dinoflagellates has previously been demonstrated (Boland et al., 1993; Comolli et al., 1996; Sugg and Van Dolah, 1999). ESTs with moderate similarity to known membrane-bound signal receptors were identified, including an acetylcholine receptor (e−16) and an opioid growth factor receptor (e−33). The cryptochrome blue-light receptor revealed in K. brevis by EST analysis is the first light-dependent receptor reported in dinoflagellates.

Table 2 ESTs Involved in Intracellular Signaling Pathways in Karenia brevis, Sorted by Gene Name

Transcription/Translation Genes

The transcriptional and translational machinery comprises a third class of genes critical to understanding dinoflagellate gene expression, and it is of particular interest because dinoflagellates lack the nucleosomes typically involved in eukaryotic gene expression. Together, these processes accounted for 15% of the ESTs with similarity to known genes. Members of the basal transcriptional apparatus identified in the library include RNA polymerase I, which is localized to the nucleolus and is responsible for ribosomal RNA synthesis. No EST with similarity to RNA polymerase II, which is responsible for eukaryotic mRNA transcription, was identified. However, the presence of RNA polymerase II in dinoflagellates is supported by their sensitivity to α-amanitin (Rizzo, 1979). The canonical TATA promoter sequence has not been found in dinoflagellates to date. However, a TATA box-binding protein (TBP)-like protein, cTBP, which is structurally similar to TBPs but preferentially binds TTTT, has been described in C. cohnii (Guillebault et al., 2002). The K. brevis library contains only a single EST with modest similarity to a general transcription initiation factor, TFIIE. Whether the scarcity of ESTs with similarity to genes involved in transcriptional regulation reflects divergence related to the unusual chromatin structure of dinoflagellates, or simply their low level of expression, remains to be determined. A number of genes involved in RNA processing were evident, including genes involved in polyadenylation, splicing, and RNA stability. In contrast to the transcriptional machinery, numerous components of the translation machinery are present with good similarity to known genes (Table 3), including translation initiation and elongation factors. Of these, eIF5A has previously been reported in C. cohnii (Chan et al., 2002).

Table 3 Transcription/Translation ESTs Present in Karenia brevis Nuclear Genome, Sorted by Gene Name

Codon Usage and GC Content

Fifty K. brevis ESTs, selected from those with the highest similarity to previously identified genes in GenBank and from those with no similarity to known genes, were used to estimate the GC content of coding regions and to characterize codon usage. The average GC content of these K. brevis ESTs was 51%, with a G or C in the third position in 53.5% of codons. This is similar to the GC content of coding regions (CDSs) in Amphidinium carterae (50.44% in 13 CDSs) and C. cohnii (50% in 19 CDSs); in contrast, L. polyedrum has GC-richer coding regions (59%), with a 75% preference for G or C in the third position (Codon Usage Database, GenBank Release 140.0, March 2004). These findings suggest that dinoflagellates may vary widely in the GC content of their coding regions. The GC content of K. brevis coding regions falls within the range spanned by bacteria (62%), mammals (52%), and higher plants (45%).
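Overall GC content and third-position GC (GC3) are simple counts once the reading frame is known. The sketch below assumes each input is an in-frame coding sequence (for partial ESTs the frame would have to come from, for example, a BLASTx alignment) and pools counts across sequences, an averaging choice the text does not specify; the function names are hypothetical.

```python
from typing import Iterable, Tuple

GC = {"G", "C"}


def gc_counts(cds: str) -> Tuple[int, int, int, int]:
    """Return (GC bases, total bases, codons with G/C at position 3, total codons)
    for one in-frame coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    gc = sum(base in GC for base in cds)
    gc3 = sum(cds[3 * i + 2] in GC for i in range(n_codons))
    return gc, len(cds), gc3, n_codons


def pooled_gc(cds_list: Iterable[str]) -> Tuple[float, float]:
    """Pooled overall GC fraction and GC3 fraction across several sequences."""
    gc = total = gc3 = codons = 0
    for cds in cds_list:
        g, t, g3, c = gc_counts(cds)
        gc, total, gc3, codons = gc + g, total + t, gc3 + g3, codons + c
    return gc / total, gc3 / codons
```

Pooling weights long sequences more heavily than averaging per-sequence fractions would; with fifty ESTs of similar length the two approaches should give similar values.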

Prevalence of Single Nucleotide Polymorphisms

Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation between alleles and can be used as tools in genetic mapping, estimating population diversity, and correlating genotype with phenotype. SNPs identified from EST databases are especially informative because they reveal diversity within expressed genes (Kota et al., 2001). There is growing precedent for the presence of multiple gene copies in dinoflagellates. The peridinin-chlorophyll a-binding protein (PCP) gene family is present in 5000 copies per 200 pg of DNA in the genome of L. polyedrum (Le et al., 1997), one of the largest gene families reported for any organism. The same gene family is present in 36 copies in Symbiodinium, a symbiotic dinoflagellate associated with corals that has a substantially smaller genome of 3 pg (Reichman et al., 2003). The PCP genes are present in tandem arrays, and in Symbiodinium 89% of clones screened from genomic and cDNA libraries were distinct at the nucleotide level. Similar tandem repeats of gene families have been reported for luciferin-binding protein and rubisco (Lee et al., 1993; Machabee et al., 1994; Rowan et al., 1996). A cAMP-dependent protein kinase gene from L. polyedrum is also present in 30 copies (Salois and Morse, 1997). Of the 305 K. brevis contigs containing more than 2 ESTs, 39.7% contained SNPs. To determine whether a discrepant base represented a SNP or a sequencing error, we relied on sequence quality information from the trace reads and on replication of the base call in multiple ESTs within the contig.
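The SNP criteria described above, base-call quality plus replication of the variant base in more than one EST, can be written as a per-column rule on a contig alignment. In this sketch the Phred cutoff of 20 and the requirement of two supporting ESTs per allele are hypothetical defaults rather than thresholds reported here, the alignment is assumed to be already computed, and the function name is illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Column = Sequence[Tuple[str, int]]  # (base call, Phred quality) from each EST


def call_snps(columns: Sequence[Column], min_quality: int = 20,
              min_support: int = 2) -> List[Tuple[int, List[str]]]:
    """Return (column index, alleles) where at least two different bases are
    each observed in >= min_support ESTs with quality >= min_quality."""
    snps = []
    for pos, column in enumerate(columns):
        counts = Counter(base.upper() for base, qual in column
                         if qual >= min_quality and base.upper() in {"A", "C", "G", "T"})
        alleles = sorted(b for b, n in counts.items() if n >= min_support)
        if len(alleles) >= 2:
            snps.append((pos, alleles))
    return snps
```

Requiring each allele to appear in more than one high-quality read trades sensitivity for specificity: a true variant seen only once is missed, but isolated sequencing errors are not called.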

Visual inspection of sequencing chromatograms or sequence alignments in EST libraries from other organisms has revealed SNPs at a frequency of 0.82 per 100 bp in barley (Bundock et al., 2003) and 1.32 per 100 bp in catfish (He et al., 2003). In the K. brevis EST library, the estimated overall frequency of SNPs in contigs containing 10 or more ESTs was 1 in 90 bp (about 1.1 per 100 bp). Of these SNPs, 85% are in the third codon position, where most substitutions are synonymous and do not alter the protein sequence. The prevalence of SNPs in the expressed sequences of K. brevis suggests that multiple copies of many genes exist in this dinoflagellate. A similar SNP frequency was observed in L. polyedrum and A. carterae (Bachvaroff et al., 2004).
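Why third-position SNPs are mostly silent can be checked directly against the standard genetic code. The sketch below builds the standard codon table and computes, for each codon position, the fraction of possible single-base changes in sense codons that leave the amino acid unchanged; it assumes the standard nuclear code applies to K. brevis, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def is_synonymous(codon: str, position: int, alt_base: str) -> bool:
    """True if replacing the base at `position` (0-2) leaves the amino acid unchanged."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


def synonymous_fraction(position: int) -> float:
    """Fraction of all single-base changes at `position` in sense codons that are silent."""
    total = silent = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # skip stop codons as starting points
        for alt in BASES:
            if alt != codon[position]:
                total += 1
                silent += is_synonymous(codon, position, alt)
    return silent / total
```

Under this code no second-position change is synonymous, a few first-position changes are (in leucine and arginine codons), and third-position changes are synonymous far more often than either, which is why a strong third-position bias among SNPs points to mostly silent variation.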

Development of a K. brevis DNA Microarray

The availability of ESTs from K. brevis not only provides the first insight into the expressed genome of this problematic organism, but also provides the tools with which to begin investigating the mechanisms regulating its growth and toxicity. In addition to the 1531 unigenes with similarity to known genes, the library contains 3749 sequences of unknown function. We have therefore chosen a genomewide transcript profiling approach to characterize the responses of known and unknown genes to conditions known to alter growth, cell cycle progression, and toxicity. To this end, 60-bp oligonucleotides were designed to uniquely recognize gene sequences that share similarity as defined by BLASTx scores and that cluster at an identity of 70% over 30 bp. The resulting oligonucleotide probes and controls were printed on glass slides, yielding an 8455-feature array that includes 4629 primary probes (one per unigene), 3462 replicate probes to selected genes, and 364 hybridization controls (Figure 5). Replicate probes were randomly distributed over the array to minimize any bias arising from probe position.

Initial experiments were carried out to optimize labeling protocols and to validate probe specificity and reproducibility between replicate probes. Preliminary experiments determined that direct Cy3/Cy5 labeling of amplified RNA gave better reproducibility than labeling of cDNA targets, so the RNA amplification method was selected for use with the microarray. For probe validation, a pooled sample of RNA from daytime and nighttime cultures was used to maximize the number of genes expressed, because genes expressed during the day, such as photosynthetic genes, may be downregulated at night, and many genes expressed at night are likely to be downregulated during the day. Of the 8091 noncontrol features, 90.6% produced a mean fluorescence signal greater than twice the average background. The mean signal-to-background ratio was 58.3. Global background intensities averaged over the 3 arrays were 50 counts in the Cy5 channel and 46 counts in the Cy3 channel. Replicate probes showed excellent repeatability in both fluorescence intensity and fold change. Thus the microarray appears to provide sufficient sensitivity and reproducibility for probing global gene expression in K. brevis.
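The two sensitivity statistics above can be computed from per-feature intensities as follows. This sketch assumes one mean signal and one local background value per feature, compares each signal with twice the array-wide mean background, and averages per-feature signal-to-background ratios; how Feature Extraction and Luminator define these quantities internally may differ, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def detection_summary(signals: Sequence[float], backgrounds: Sequence[float],
                      fold: float = 2.0) -> Tuple[float, float]:
    """Return (fraction of features with signal > fold x mean background,
    mean per-feature signal-to-background ratio)."""
    if len(signals) != len(backgrounds) or not signals:
        raise ValueError("need equal-length, non-empty inputs")
    threshold = fold * mean(backgrounds)
    detected = sum(s > threshold for s in signals)
    ratios = [s / b for s, b in zip(signals, backgrounds)]
    return detected / len(signals), mean(ratios)
```

Run separately on the Cy3 and Cy5 channels, a summary like this would show whether one dye detects noticeably fewer features than the other.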

Fig. 5

Karenia brevis microarray. Self-versus-self hybridization using Cy3- and Cy5-labeled RNA targets on a K. brevis 60-mer oligonucleotide microarray. The inset shows an enlargement of an 8 × 10 block of features.

Variability of gene expression must next be assessed for each gene on the array through multiple replications of an experiment. Once the normal range of variability is defined, it will be possible to set thresholds above which a given fold change in expression for a particular gene may be called significant. Parallel experiments are also needed to establish whether biological significance can be inferred from changes in transcription detectable by microarray analysis. Under seemingly identical culture conditions, the expression levels of some genes vary greatly with little physiologic impact. Conversely, minor changes in transcription of other genes may cause significant physiologic changes owing to translational or posttranslational regulation of gene expression, as has been observed for some dinoflagellate genes involved in circadian-controlled bioluminescence. Our current investigations therefore focus on changes in global gene expression in K. brevis in response to two major physiologic regulators of K. brevis growth and behavior: the light-dark cycle and the circadian clock.
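One way to turn replicate variability into gene-specific thresholds, as proposed above, is to scale each gene's replicate standard deviation of log2 ratios. The multiplier z = 2 and the 1.5-fold floor below are hypothetical choices and the function names are illustrative; a formal analysis would more likely use a statistical test across biological replicates.

```python
import math
from statistics import stdev
from typing import Dict, List, Sequence

DEFAULT_Z = 2.0                 # hypothetical multiplier on replicate SD
DEFAULT_FLOOR = math.log2(1.5)  # hypothetical minimum of a 1.5-fold change


def gene_thresholds(replicates: Dict[str, Sequence[float]], z: float = DEFAULT_Z,
                    floor: float = DEFAULT_FLOOR) -> Dict[str, float]:
    """Per-gene log2-ratio threshold: the larger of z x replicate SD and a fixed floor.
    Genes with fewer than two replicate values are skipped."""
    return {gene: max(z * stdev(values), floor)
            for gene, values in replicates.items() if len(values) >= 2}


def call_changes(observed: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Genes whose observed |log2 ratio| exceeds their own threshold."""
    return sorted(gene for gene, m in observed.items()
                  if gene in thresholds and abs(m) > thresholds[gene])
```

Because each gene gets its own threshold, a 2-fold change can be significant for a tightly regulated gene yet fall within the normal range of a highly variable one, which is the situation described above.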

Conclusions

We have established a database of ESTs from K. brevis cells in the logarithmic phase of growth. This is the first genomic database to be developed for a fucoxanthin-containing dinoflagellate, and it provides the basis from which to begin functional genomic studies on this HAB species. To date, 7001 high-quality ESTs have been analyzed and clustered into 5280 nonredundant groups. Cluster analysis and protein similarity searches indicated that the vast majority of genes expressed by K. brevis fall into a low-abundance class. The average GC content of these K. brevis ESTs was 51%, with a G or C in the third position in 53.5% of codons; compared with values reported for other dinoflagellates, this indicates that dinoflagellates may vary widely in the GC content of their coding regions. Of the 305 K. brevis contigs containing more than 2 ESTs, 39.7% contained SNPs. The prevalence of SNPs in the expressed sequences of K. brevis suggests that multiple copies of many genes exist in this dinoflagellate.

We have identified, for the first time in a dinoflagellate, conserved eukaryotic genes involved in cell cycle control, intracellular signaling, and the transcription/translation machinery that are critical to understanding growth regulation in this organism. Despite its unusual chromatin structure and mitotic apparatus, K. brevis appears from analysis of the EST database to possess typical components of the eukaryotic cell cycle machinery. Similarly, members of all major signal transduction cascades and the protein phosphatases that oppose them were found. In contrast, the near absence of identifiable transcription factors in the EST database, together with the lack of identifiable TATA boxes in other dinoflagellate genes, may reflect some divergence in transcription factors relative to other eukaryotes.

Although 29% of the unigenes in the K. brevis library had similarity to genes in the GenBank nonredundant database, 71% had little or no similarity to known genes. A promising approach to understanding the regulation of known genes, and the function of unknown genes, in an uncharacterized genome is to examine their transcriptional responses with an array-based monitoring system. To this end, we designed 60-mer oligonucleotide probes specific for each unigene and developed an 8455-feature microarray that includes 4629 primary probes (one per unigene), 3462 replicate probes to randomly selected genes, and 364 hybridization controls. In quality control experiments, 90.6% of the noncontrol features produced a mean fluorescence signal more than twice the average background, and replicate probes showed good repeatability. We therefore conclude that the microarray is sufficiently sensitive and specific for genomewide functional analysis of K. brevis genes. This approach will yield insight into the expression of known genes and their associated regulatory pathways in response to different exposure conditions. In addition, microarray analysis will help assign functions to the large number of K. brevis genes that currently cannot be identified, on the basis of their coordinated responses with known genes to specific challenges. Thus we anticipate that the microarray will provide a powerful tool for investigating the pathways that control the growth and toxicity of HABs.