Introduction: Metabolites and Drug Discovery

Bioactive secondary metabolites, or natural products, have been isolated from many marine and terrestrial organisms and have long been known to be important biochemical constituents of marine organisms. While most chemical defense research has focused on tropical species exposed to high competition and predation pressures (Bolser and Hay 1996; Cronin et al. 1997), recent studies indicate that secondary metabolite defenses are prevalent in a variety of temperate, tropical, and polar marine species (Cetrulo and Hay 2000; Becerro et al. 2003; McClintock et al. 2004). Because these biomolecules have been naturally selected for their toxic or adverse effects on other organisms, they represent a broad pool of chemicals that could be useful as pharmacological agents.

From the late 1980s through the 1990s, a substantial body of research was devoted to isolating and structurally identifying marine bioactive metabolites (Faulkner 1991, 1992). Along with the increased interest in marine metabolite discovery came the need to understand metabolite synthesis pathways and to develop new methods for the total synthesis of potentially useful compounds (Pietra 1985; Hiersemann and Helmboldt 2005; Kibayashi 2005). Two recent reviews compile a group of marine natural products undergoing clinical trials in the United States (Chin et al. 2006; Fenical 2006), with 23 compounds in active Phase I to Phase III treatment assessments, according to the National Cancer Institute, Bethesda, MD. However, despite the diversity of potential bioactive agents presently described, only one compound (ziconotide, a peptide derived from the cone snail Conus magus) has been approved as a drug for use in humans in both Europe and the United States. Ziconotide, an ω-conotoxin, has the potential to be a valuable long-term pain medication, with strong analgesic properties achieved by blocking calcium-dependent pain signals in the spinal cord (Chin et al. 2006; Klotz 2006). The relatively small number of marine compounds in use as pharmacological agents may be due to several factors. First, marine organisms are more difficult to collect than their terrestrial counterparts, which slows progress toward metabolite characterization and pharmacological development (de Vries and Beart 1995). Second, the mechanisms of action of new marine-derived compounds are poorly understood, which necessitates additional research and target-driven assays for any metabolite with drug potential (de Vries and Beart 1995).
In addition, after a metabolite is isolated and its structure elucidated, medicinal chemistry synthesis programs are sometimes necessary to produce molecules with the biological and chemical characteristics required of a good drug development candidate (Turner 1996). This synthesis process is often quite time consuming. Lastly, marine metabolites lack the ethno-medical history of their terrestrial counterparts, and such information speeds up the drug development process (Amador et al. 2003).

However, over the last decade, the application of genomic technologies to marine organisms has been opening up new avenues of research (Hofmann et al. 2005) at molecular and biochemical levels that have so far been limited to model species (i.e., species for which large genomic sequence databases are available). Genomic tools and resources for non-model marine organisms are steadily being developed to identify and describe the genetic components involved in metabolite pathways (Fortman and Sherman 2005). For example, steady progress has been made in the study of the polyketide pathway and associated metabolic genes involved in the biosynthesis of bryostatin, an anticancer agent derived from bryozoan symbionts (Hildebrand et al. 2004). In addition, the study of polyketide synthase genes in marine-derived actinomycetes has contributed to our knowledge of biosynthetic pathways and allowed for the production of novel and potentially useful natural products (Moore et al. 2005). These advancements are based on relatively simple and consolidated microbial genomes; the level of difficulty in isolating specific biosynthetic genes appears to increase when studying more complex and diffuse metazoan genomes (Piel et al. 2004). Thus, a more robust tool that can rapidly screen cDNAs for specific biosynthetic pathways of interest is necessary. Although it is clear that there is a long road between finding a bioactive natural product and developing an effective drug therapy, a better understanding of metabolite biosynthesis will help the pharmacological and ecological characterization of metabolites of interest. Biosynthetic pathway and regulation data are needed to determine whether large-scale production of a bioactive compound would be feasible for drug development.
Further, knowledge about whether the biosynthesis pathway is simple or complex (i.e., the number of genetic elements involved) and how it is regulated (i.e., induction sensitivity and time course) would be very helpful in characterizing bioactive metabolites and prioritizing them for further study. An approach that rapidly assesses the number of genes involved in the production of a bioactive compound, and the environmental conditions that induce or repress those genes, would complement metabolite characterization efforts and shed light on both the medical and ecological roles of the metabolite of interest. To describe these molecular controls, we do not need to quantify the expression levels of individual genes, nor even identify the specific genes involved in the process. We need only describe quantitatively the patterns of gene expression as they relate to the biosynthesis of a metabolite. This conceptual approach removes the need for a genomic-level resource [expressed sequence tag (EST) libraries or microarray cDNA probes] and opens up the possibility of working with a broad range of non-model marine organisms. An understanding of how secondary metabolites are produced and controlled is vitally important for assessing the potential production and yield of natural products used in drug development research (Belarbi et al. 2003; Fenical 2006). Applying new molecular and genomic techniques will accelerate the pace at which we can identify the biosynthesis pathways and functional regulation of bioactive metabolites (König et al. 2006).

Feeding Deterrents Against Herbivory and Predation

In terrestrial plants, secondary metabolite production has been well studied. Herbivory by insects is one of the most common inducers of secondary metabolite production in plants (Karban and Baldwin 1997; Mercke et al. 2004; van Dam et al. 2005). Inducible defenses are defined as mechanisms that change following damage or stress and decrease the negative fitness consequences of predatory attacks on an organism (Karban and Baldwin 1997). Thus, in terrestrial systems, these inducible secondary metabolites often take the form of feeding-deterrent compounds such as phenolics, furanocoumarins, terpenes, alkaloids, hydrocarbons, proteinase inhibitors, and others (Schultz and Baldwin 1982; Baldwin 1988; Van Alstyne 1988; Bernays 1989; Farmer and Ryan 1990; Zangerl 1990; Farmer and Ryan 1992; Zangerl and Rutledge 1996; Karban and Baldwin 1997). Significant increases in the biosynthesis of such toxins in response to herbivory are well documented in terrestrial plants (Baldwin 1988; Zangerl 1990). In addition, volatile organic compounds (e.g., jasmonates) have been implicated in defense by stimulating production of proteinase inhibitors (PIs) or by attracting predators/parasites to the attacking species (Farmer and Ryan 1992; Dobson and Bergstrom 2000; Mercke et al. 2004; Reddy and Guerrero 2004; van Dam et al. 2005). With plant secondary metabolite production well established, recent research has focused on integrating genomics and metabolite production through new advances in mass spectrometry and data processing, microarrays, and subtractive hybridization (Mercke et al. 2004; Keurentjes et al. 2006).

As in terrestrial systems, herbivory and predation are major factors influencing secondary metabolite production and inducible defenses in the marine realm. While many secondary metabolites from marine organisms have been isolated and characterized (Faulkner 2000, 2002; Blunt et al. 2003, 2004, 2005, 2006), much less is known about these defenses in marine systems than in terrestrial systems. This may be due to the fact that the field of marine chemical ecology (i.e., the study of marine-derived bioactive compounds and their ecological roles) is relatively new, having only been established in the last 20 years (Fenical 2006), and thus fewer studies of induced defenses in marine systems exist relative to those regarding terrestrial systems. Still, more than 17,000 natural products have been isolated from diverse marine phyla (MarinLit 2006). More than 700 new compounds were discovered in 2004 alone (Blunt et al. 2006). As more secondary metabolites are isolated and characterized, marine chemical ecology studies have switched focus from simple metabolite discovery to investigating the role of specific predators, biosynthetic pathways and enzymes and energetic costs on secondary metabolite production and defense induction (Hay 1996; Dworjanyn et al. 2006; Thoms et al. 2006). Inducible secondary metabolite defense in an aquatic, nonvascular plant was first reported by Van Alstyne in 1988. She found that Fucus distichus increased its level of defensive polyphenolic compounds by approximately 20% over control plants when experimentally damaged (Van Alstyne 1988). Later, several other species of brown algae also demonstrated induction of polyphenolic compounds in response to artificial predation, designed to mimic observed field predation (Hammerstrom et al. 1998).

However, a main difficulty in studying secondary metabolite production in marine organisms is that, while many studies suggest that induction of marine secondary metabolites in response to predation does occur, most have not directly linked metabolite production and predation, and others have failed to prove that induction occurs (Steinberg 1994; Pavia et al. 1997; Slattery et al. 2001; Toth and Pavia 2002). The next step toward better understanding induced defenses and secondary metabolite production is to link environmental and ecological data (e.g., predation level and water conditions) to quantitative secondary metabolite production and gene expression data. To do this, a better understanding of secondary metabolite biosynthesis is needed. Because a large portion (approximately 40%) of marine organism-derived metabolites are terpenes (Harper et al. 2001; MarinLit 2006), a logical place to begin investigation is with the mevalonate or methylerythritol phosphate pathway. The mevalonate or isoprenoid pathway is a very common terpene biosynthesis pathway. One of the central molecules in this pathway is 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA), a precursor to mevalonate. Mevalonate is converted to isopentenyl pyrophosphate (IPP) in several steps that consume three ATP molecules. These IPP molecules are the active unit in terpene biosynthesis. Thus, the genes encoding the enzymes that produce HMG-CoA, mevalonate, and IPP may be useful targets in the pathway. Examining expression changes in the genes associated with these three molecules may ultimately help us better understand how secondary metabolite biosynthesis is triggered and regulated. Alternatively, isoprenoid units may be synthesized via the methylerythritol phosphate (MEP) pathway. This pathway has recently been examined in bacteria, algae, plants, and a malarial parasite, and consists of seven enzymatic steps leading to the formation of the isoprenoid precursor IPP (Proteau 2004).
Therefore, the genes associated with MEP pathway intermediates [i.e., 1-deoxy-D-xylulose 5-phosphate (DXP) and 2-C-methyl-D-erythritol 4-phosphate (MEP)] may also be useful targets in understanding terpene biosynthesis.

Biosynthetic pathways for marine natural products may also be elucidated by looking toward techniques used to study bacterial metabolite biosynthesis pathways. Most biosynthesis pathway work to date has focused on terrestrial bacteria and a few marine bacteria (Fortman and Sherman 2005). Metagenomics, the study of multiorganism-derived pools of genomic DNA, has been used to clarify bacterial pathways and may prove useful for elucidating secondary metabolite pathways in marine invertebrates (Wang et al. 2000; Brady et al. 2001; Fortman and Sherman 2005). Further, structural similarities between bacterial metabolites and metabolites from marine sources and the occurrence of bacterial symbionts in marine organisms suggest that understanding bacterial biosynthesis pathways may greatly increase our knowledge of marine secondary metabolite production (Piel 2004; Moore 2006).

Subsequently, studies need to link observed levels of secondary metabolites and predation with gene expression changes in potential biosynthesis pathway intermediates. This has proved difficult with marine organisms because molecular methods often require optimization for use with “non-model” organisms. As in plants, additional steps in standard RNA/DNA extraction protocols are needed when performing extractions on marine organisms, especially marine invertebrates, owing to higher levels of salts and polysaccharides (Hong et al. 1997; Barnwell et al. 1998). In addition, sequence data availability for marine organisms is often sparse, particularly in comparison with what is accessible for terrestrial organisms. Common transcriptome profiling and gene expression methods, such as microarray, serial analysis of gene expression (SAGE), or massively parallel signature sequencing (MPSS), require a priori knowledge of sequence data to examine changes in gene expression. Further, the aforementioned methods are time intensive and/or costly (Ruan et al. 2004).

Transcriptome Profiling of Gene Expression Patterns

In the field of functional genomics with natural populations of non-model organisms, there is a distinct need for new methodological approaches that would allow for rapidly and efficiently assessing differences in mRNA transcriptome pools (Feder and Mitchell-Olds 2003; Hofmann et al. 2005). The cost and expense in time, effort, and resources to conduct studies utilizing the dominant technologies are prohibitive for exploratory work with organisms for which genomic-level resources are not already established (e.g., expressed sequence tags, serial analysis of gene expression, microarray hybridizations, massively parallel signature sequencing, 454 signature sequencing). To answer this need, a novel methodology based on reannealing kinetics has been developed for the rapid, high-throughput, efficient, and economical profiling of the sequence complexity of a transcriptome sample (Marsh and Fielman 2005).

Thirty years ago, reannealing kinetics (C0t curves) were developed for indirectly measuring the sequence complexity of DNA extracts (Davidson and Britten 1973; Galau et al. 1974, 1977; Anderson et al. 1976; Chamberlin et al. 1978). The approach is conceptually simple and is based on the probability that a single-stranded DNA molecule will hybridize with its complementary strand at a rate that is determined by the abundance of that sequence-specific DNA molecule within a total DNA extract. C0t refers to the product of the initial DNA concentration (C0) and time (t), which was plotted on a log scale against percent renaturation (determined by hydroxyapatite chromatography) to form a sigmoidal C0t curve from which kinetic parameters were extracted (Nayak et al. 1991; Verneau et al. 1991). Reannealing reactions that utilize RNA molecules are referred to as R0t assays. From either a C0t or R0t kinetic profile, genome or transcriptome size, repetitive sequence/message abundance, and the total sequence complexity of a nucleic acid pool can be determined (Peterson et al. 2002a, b; Fielman and Marsh 2005; Marsh and Fielman 2005; Hoover et al. 2006).
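
For readers unfamiliar with why such a curve is sigmoidal on a log axis, the ideal second-order reassociation of a single kinetic component is commonly written as follows. This is the standard textbook form rather than an expression taken from the studies cited above; here C is the concentration of DNA still single-stranded at time t, C_0 is the starting concentration, and k is the reassociation rate constant.

$$\frac{C}{C_{0}} = \frac{1}{1 + k\,C_{0}\,t}$$

Half of the strands have reannealed when the product of starting concentration and time equals 1/k. In a more complex pool each individual sequence is rarer, k is smaller, and the curve shifts toward larger values of that product.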

A traditional C0t or R0t profile is run at one annealing temperature (generally about 60°C), and the time points at which changes in the reannealing profiles arise are used to distinguish among the relative compositions of highly repetitive, moderately repetitive, and nonrepetitive sequences in a nucleic acid sample. In contrast, the transcriptome kinetic profiles developed by Marsh and Fielman (2005) utilize a programmable thermal block to lower the temperature of a sample at 1°C intervals with a 30- to 45-min hold at each temperature. An intercalating, double-stranded DNA fluorescent dye (such as PicoGreen, Molecular Probes) is used to directly monitor the amount of double-stranded DNA that accumulates at each temperature interval. By using these small temperature steps, the kinetic classes present in a sample can be finely sorted into 30–35 different Tm groups (i.e., melting temperature thresholds). At any one temperature, the rate of reannealing is determined by the probability that complementary strands collide, align, and form stable hydrogen bonds at that temperature within the reaction buffer.

For example, consider the two reannealing curves plotted for cDNA pools from two individuals of the South Pacific soft coral Sinularia polydactyla (Figure 1; data from Hoover et al. 2006). One individual was in a treatment group that was wound-stressed (grazing predation; day 10) and the other was in a control group at the start of the experiment (day 0). The stressed individual at day 10 exhibits reannealing kinetics with a slower and more protracted completion period relative to the annealing curve of the day 0 individual. This indicates that the probability of complementary strands colliding is lower in the stressed cDNA pool than it is in the control cDNA pool. Essentially, we can conclude that there is a greater level of nucleic acid sequence “diversity” in the stressed sample at this annealing temperature because there is a lower probability that any two complementary strands will collide (i.e., there are more competing strands present).

Figure 1. Reannealing curves.

The examples illustrated in Figure 1 present annealing curves that can be described by second-order rate functions because of the dynamics associated with duplex formation and subsequent intercalation of the fluorescent dye:

$$\text{flr}(t) = \alpha + \beta \left(1 - e^{-\gamma t}\right)$$
(1)

where α is flr at time 0, β is the maximum change in flr when reannealing is complete, and γ is a second-order rate constant. In Figure 2, the reannealing curve is plotted for the same wound-stress sample shown in Figure 1, with the gray points showing the fluorescence (flr) data and the black line showing the regression fit to those data (Eq. 1). These kinetic data provide parameter estimates of the total amount of nucleic acids (β) and the distribution of sequences within that population of nucleic acids (γ), both of which provide vital information about the transcriptome pool. A more detailed kinetic profile can be constructed by performing these assays at multiple temperature plateaus. By denaturing a sample at high temperature (>95°C) and then sequentially lowering the temperature at 1°C intervals with a 30- to 45-min incubation at each temperature, a detailed series of kinetic profiles can be obtained for each sample (Figure 3). Thus, each temperature plateau generates a set of kinetic parameter estimates that depend on the abundance and sequence complexity of the nucleic acid molecules with the corresponding Tm.
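
The regression in Eq. 1 can be illustrated with a short sketch. The fitting procedure below is an assumption made for illustration and is not necessarily the one used by Marsh and Fielman (2005): for any fixed γ the model is linear in α and β, so the sketch scans γ on a log-spaced grid, solves the linear part by ordinary least squares, and refines the best γ by golden-section search. The function names, the search bounds, and the time unit (minutes) are all hypothetical.

```python
import math

def model(t, alpha, beta, gamma):
    """Eq. 1: flr(t) = alpha + beta * (1 - exp(-gamma * t))."""
    return alpha + beta * (1.0 - math.exp(-gamma * t))

def _linear_fit(times, flr, gamma):
    """With gamma fixed, Eq. 1 is linear in alpha and beta: ordinary least squares."""
    x = [1.0 - math.exp(-gamma * t) for t in times]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(flr) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("time points must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, flr))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, flr))
    return alpha, beta, sse

def _golden_min(f, lo, hi, iterations=60):
    """Golden-section search for the minimum of f on the interval [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0

def fit_plateau(times, flr, gamma_lo=1e-4, gamma_hi=10.0, grid_points=200):
    """Estimate (alpha, beta, gamma) for one temperature plateau.

    The bounds on gamma are illustrative assumptions (per-minute units).
    """
    log_lo, log_hi = math.log(gamma_lo), math.log(gamma_hi)
    step = (log_hi - log_lo) / (grid_points - 1)
    grid = [log_lo + i * step for i in range(grid_points)]

    def sse_at(log_gamma):
        return _linear_fit(times, flr, math.exp(log_gamma))[2]

    best = min(grid, key=sse_at)  # coarse scan over log(gamma)
    log_gamma = _golden_min(sse_at, max(best - step, log_lo), min(best + step, log_hi))
    gamma = math.exp(log_gamma)
    alpha, beta, _ = _linear_fit(times, flr, gamma)
    return alpha, beta, gamma
```

Calling `fit_plateau(times, flr)` on the readings from one plateau returns the three parameter estimates; repeating the call for every plateau would give one (α, β, γ) triplet per Tm class.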

Figure 2. Regression model for estimating kinetic parameters.

Figure 3. Sequential reannealing curves to generate kinetic profile.

Essentially, the regression model (Eq. 1; Figure 2) fitted to a fluorescence annealing curve represents a probability distribution of how quickly individual sequences in that Tm class can reanneal. Expressing this distribution as a percentage of the total molecules in a Tm class, we can condense the regression model parameters into a single estimate of sequence complexity. Here, quantitative entropy (organization) statistics, such as the Shannon-Weaver entropy index (Prus et al. 1999; Shpak and Churchill 2000; Marsh et al. 2006), can provide a simple and direct measure of the distribution and abundance of unique nucleic acid sequences in a transcriptome sample at each Tm class (Marsh and Fielman 2005):

$$H_{\text{flr}} = -F_{\max} \sum_{i} p_{i} \log_{2}\left(p_{i}\right)$$
(2)

where p_i is the proportional contribution of the ith time interval and F_max is the maximum fluorescence reached (β from Eq. 1). Thus, the estimate of sequence entropy (H_flr) incorporates information on both the distribution and abundance of transcripts within each kinetic Tm class. Although this H_flr statistic is just a relative measure of sequence diversity, it does allow for a direct, quantitative comparison among different mRNA samples to identify the magnitude of changes in the transcriptome pool between experimental treatments or over time course studies.
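
A minimal sketch of Eq. 2 follows. It rests on one assumption that the text does not spell out: the proportional contribution of each time interval is taken to be the fluorescence gained during that interval divided by the total gain over the plateau. Small negative gains caused by measurement noise are clamped to zero, which is also an illustrative choice. The function name is hypothetical.

```python
import math

def sequence_entropy(flr, f_max):
    """Eq. 2: H = -F_max * sum(p_i * log2(p_i)) for one temperature plateau.

    flr   -- fluorescence readings taken at successive time points
    f_max -- maximum fluorescence reached (beta from Eq. 1)
    """
    # Assumed definition of p_i: share of the total fluorescence gain
    # that falls in each time interval (negative gains clamped to zero).
    gains = [max(b - a, 0.0) for a, b in zip(flr, flr[1:])]
    total = sum(gains)
    if total <= 0.0:
        raise ValueError("no net fluorescence gain on this plateau")
    proportions = [g / total for g in gains]
    # Intervals with p_i = 0 contribute nothing to the sum.
    return -f_max * sum(p * math.log2(p) for p in proportions if p > 0.0)
```

Under this definition, a plateau whose fluorescence rises evenly across every interval gives the highest value for a given maximum fluorescence, whereas a plateau that completes within a single interval gives zero.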

By combining data from all the individual temperature plateaus, one can generate a quantitative “profile” of an mRNA sample (after conversion to cDNA). In Figure 4, the top panel presents the total quantity of cDNA that reannealed at each Tm for the samples shown in Figure 1. The F_max parameter is derived from β in Eq. 1 and indicates the asymptotic maximum fluorescence that the regression model approaches at each temperature interval. This maximum fluorescence value is a relative measure of the total double-stranded DNA in the sample at the end of the incubation period. The log scale visually reduces the apparent difference between the D0 and D10 samples; however, there is an approximately 5-fold difference in the total cDNA in the 64–65°C Tm class, indicative of a large change in the representation of transcripts among those treatment samples.

Figure 4. Profile of total cDNA and H_flr at each Tm.

The bottom panel in Figure 4 presents the transcriptome entropy profiles (Eq. 2). The basic interpretation of the curves is not necessarily intuitive to most biologists. The peak in H_flr at 64–65°C in the D10 sample results from two contributing processes. First, there is the amount of cDNA that is reannealing (F_max). Second, there is a decrease in the probability that complementary strands of cDNAs will reanneal (mathematically, the greater the equivalence between p_i values, the higher the entropy statistic). Although we estimate these processes with just relative measures, the H_flr statistic that is generated is extremely practical as a comparative measure for assessing differences in mRNA sequence pools among samples. In Figure 4 there is a significant difference in the distribution and abundance of transcripts in the 64–65°C Tm class between the D0 and D10 treatments (see Hoover et al. 2006). The H_flr estimate is not an end in itself, but primarily serves as a means for comparative evaluations of the transcriptome in samples collected from different experimental treatments or environmental habitats.

Integrating Metabolite and Transcriptome Profiles

The primary advantage of the transcriptome kinetic profile is that it allows for the rapid and high-throughput analysis of mRNA samples in a 96-well microtiter plate format. This establishes approximately the same screening capacity that is possible with HPLC profiling of secondary metabolites. Consequently, one now has the ability to screen a large number of individuals and a large number of treatment conditions quickly and efficiently to identify correlative patterns in both gene expression responses and secondary metabolite biosynthesis. Given a large number of unknown metabolites in any chromatogram, one could quickly identify those that exhibited the greatest linkages to concomitant changes in the transcriptome pools.

Slattery et al. (2001) identified two populations of soft corals in Guam with site-specific defensive chemistry levels. Higher levels of defensive chemistry were correlated with increased predation, and soft corals transplanted between the sites acquired the chemical profiles of the new site. While this study did not demonstrate inducible defenses, we were able to use this model system to examine, by transcriptome profiling, the biosynthesis of a feeding deterrent compound in response to predation-mediated wound stress. These two levels of biological data have been successfully integrated to identify a significant correlation between the entropy statistic for some of the Tm classes of cDNA and the biosynthesis of 11β-acetoxypukalide in wound-stressed tissues (Hoover et al. 2006). In this study, increased levels of both natural predation by butterfly fish and artificial wound-scrapes resulted in significant transcriptome profile changes and in significant increases in the defensive secondary metabolite, 11β-acetoxypukalide. Using principal component and discriminant function analyses, Hoover et al. (2006) were able to identify just a few Tm classes in the transcriptome profile that were significantly correlated with the tissue levels of 11β-acetoxypukalide. Conversely, they were able to conclude that many of the HPLC chromatogram peaks were not quantitatively related to any changes in the transcriptome profiles. This linkage between molecular and biochemical phenotypes could serve as an important tool when one is presented with a sample containing a diverse suite of unknown secondary metabolites and must decide which select few should be further isolated and characterized. For example, a crude secondary metabolite extract may yield many peaks during HPLC analysis.
Integration of transcriptome and secondary metabolite data can uncover correlations between some peaks and, say, a particular environmental stress, thereby eliminating the need to pursue peaks that are unrelated to the question being investigated.
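
The kind of screening described above can be sketched in a few lines. Hoover et al. (2006) used principal component and discriminant function analyses; the example below deliberately swaps in a much simpler stand-in, a per-class Pearson correlation between the entropy statistic and a metabolite level, purely to show the shape of the computation. The function names and data layout are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant series carries no correlational information
    return sxy / math.sqrt(sxx * syy)

def rank_tm_classes(entropy_by_tm, metabolite):
    """Rank Tm classes by the strength of their correlation with a metabolite.

    entropy_by_tm -- dict mapping a Tm class (deg C) to a list of entropy
                     values, one per individual
    metabolite    -- list of metabolite levels, one per individual,
                     in the same order
    Returns (Tm class, r) pairs, strongest absolute correlation first.
    """
    scores = {tm: pearson_r(values, metabolite)
              for tm, values in entropy_by_tm.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

The same ranking could be run with one HPLC peak area at a time in place of `metabolite`, so that peaks showing no relationship to any Tm class can be set aside early.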

As a screening tool, Hoover et al. (2006) were able to evaluate a total of 50 different soft coral samples, with replicate RNA extractions and cDNA syntheses of each and four replicate profile assays run on each extract. Thus, this new method provided the Hoover et al. study with the ability to run a total of 400 mRNA profile assays (50 samples × 2 replicate extracts × 4 assays) for just this one experimental project. The second great value of this methodology is the large degree of statistical power that is possible when analyzing such high numbers of samples. An N of 50 provides for a high degree of statistical resolution.

To truly understand secondary metabolite biosynthesis and regulation in marine organisms, ultimately the mRNA transcripts representing the changes in these transcriptome profiles must be isolated, cloned, and characterized. Right now, the fluorescence-based assay described in the preceding text relies on reannealing kinetics to provide a quantitative discrimination between samples. Thus, transcriptome pool changes are discernible, but the individual transcripts responsible for that change remain unknown. Future work should aim to isolate transcripts of interest in order to pinpoint the functional genes involved in the gene expression difference. In the case of secondary metabolite production in soft corals, isolating these transcripts will hopefully help us find the gene or genes responsible for defensive secondary metabolite production.

In addition, there is a need for more integration of metabolite and genomic data. Some genomics-based approaches have recently been employed to better our understanding of the reactions of non-model organisms, including marine organisms, to environmental changes (Lopez et al. 2002; Gracey et al. 2004; Thomas and Klaper 2004; Hofmann et al. 2005). However, we should continue to look toward terrestrial chemical ecology studies, in which some integration of secondary metabolite data and genomic-based methods has already been implemented (Wang et al. 2000; Brady et al. 2001; Goossens et al. 2003). Combining knowledge of metabolite biosynthesis, structure, and function with genomic data may allow for a better understanding of many marine ecological interactions (e.g., predator-prey interactions, competition for space). In addition, a better understanding of biosynthesis may allow us to produce large quantities of biomedically valuable compounds with a minimal impact on natural populations of marine organisms.