Introduction: the long road to unwinding alkaloid biosynthesis

About 20% of flowering plant species produce alkaloids, which display enormous structural diversity even though practically all of them are derived from a restricted number of universal amino-acid precursors. Since the discovery of morphine two centuries ago, more than 12,000 alkaloids have been isolated (Facchini 2001). The biosynthesis of morphine and of other well-known alkaloids such as nicotine and vinblastine has been the subject of extensive research, and consequently many of the enzymes involved in the biosynthesis of these three alkaloids have been identified. Nevertheless, for each of them the biosynthetic pathway remains incompletely characterised.

Elucidation of alkaloid biosynthetic pathways has faced many difficulties, mainly because of the low concentrations of alkaloid biosynthetic enzymes in the plant tissues that produce them and the limited availability of radiolabelled substrates or cofactors for the development of sensitive enzyme assays. Classically, genes encoding alkaloid biosynthetic enzymes were isolated through knowledge of peptide sequences determined from purified enzymes. Because of these low concentrations, however, many alkaloid biosynthetic enzymes may be recalcitrant to conventional purification strategies. In an alternative strategy, biochemical purification is avoided and candidate clones are isolated by homology-based screening, often in combination with expression pattern analysis. Thereafter, the isolated genes are expressed in heterologous systems such as bacteria or yeast, and the recombinant proteins are tested for the expected enzyme activities. An important breakthrough was achieved by the generation of plant cell suspension cultures with increased rates of alkaloid biosynthesis, which could often be stimulated further by the addition of phytohormones or elicitors of biotic or abiotic nature. These research approaches allowed the discovery of tens of new enzymes catalyzing steps in the biosynthesis of diverse alkaloids, including those of the isoquinoline, terpenoid indole, tropane and purine classes (for reviews see Croteau et al. 2000; Facchini 2001; De Luca and Laflamme 2001; Hashimoto and Yamada 2003; Facchini et al. 2004; Facchini and St-Pierre 2005). For instance, they led to the full characterization of the berberine biosynthetic pathway, the first alkaloid biosynthetic pathway to be completed. Elucidation of many other alkaloid biosynthetic pathways is still ongoing, with some of them, such as that of morphine, approaching completion.

Functional genomics: on the verge of a new breakthrough

For the majority of alkaloids or other secondary metabolites, thorough knowledge of the whole biosynthetic pathway or detailed understanding of the regulatory mechanisms controlling the onset and flux of alkaloid biosynthesis is still lacking. The next important breakthrough in alkaloid biosynthesis research may be prompted by the development of powerful functional genomics tools, allowing comprehensive investigations of biological systems at an accelerated pace (Oksman-Caldentey and Inzé 2004). Indeed, using genomics, we can now identify all genes in a plant; and using transcriptomic and proteomic techniques we can resolve which genes are (in)activated in a particular plant cell or organ under a particular growth or stress condition. Connecting genomics, transcriptomics or proteomics with metabolomics by a systems biology approach may then allow exploring the biochemical machinery of plants, ultimately leading to the discovery of new pathways, the modelling of metabolic networks and predictive metabolic engineering. A crucial prerequisite is of course the availability of gene sequences, either of a complete genome, which is the case for only a few model plant species, or of expressed sequence tag (EST) databases. Given that currently for most medicinal plants extensive sequence resources are not available, only a limited number of such comprehensive studies of complex metabolic systems have been reported so far. This number is however expected to increase rapidly in the coming years.

One of the first and, to date, most extensively applied functional genomics approaches to isolate genes implicated in plant secondary metabolism involves random sequencing of cell-specific cDNA libraries. This approach combines computational and expression-based analyses, on the rationale that cDNAs corresponding to genes involved in the biosynthesis of a particular metabolite will be abundant in a cDNA library made from the tissue in which that metabolite is produced and in which the ‘biosynthetic’ genes are therefore expressed. One of the first studies to use this method showed how random sequencing of a peppermint oil gland secretory cell cDNA library could be used to identify the pathway responsible for menthol biosynthesis (Lange et al. 2000). Other successful applications of this approach include the isolation of genes involved in taxol biosynthesis in Taxus cuspidata cells (Jennewein et al. 2004) and artemisinin biosynthesis in Artemisia annua trichomes (Teoh et al. 2006).
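The abundance rationale behind random sequencing of a cell-specific cDNA library can be illustrated with a minimal sketch. It assumes that each sequenced clone has already been assigned to a contig or putative gene; the function name, the contig labels and the `min_count` cut-off are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter


def rank_by_est_abundance(est_assignments, min_count=2):
    """Rank contigs by the number of ESTs assigned to them.

    est_assignments: one contig/gene label per sequenced clone.
    Returns (label, est_count, fraction_of_library) tuples, most
    abundant first, keeping only labels seen at least min_count times.
    """
    counts = Counter(est_assignments)
    total = len(est_assignments)
    return [
        (label, n, n / total)
        for label, n in counts.most_common()
        if n >= min_count
    ]


# Made-up assignments for seven randomly picked clones (illustration only).
library = [
    "contig_1", "contig_1", "contig_2", "contig_1",
    "contig_3", "contig_3", "contig_4",
]
ranked = rank_by_est_abundance(library)
for label, n, fraction in ranked:
    print(f"{label}: {n} ESTs ({fraction:.0%} of library)")
```

Contigs that are over-represented in the library from the producing tissue would then be the first candidates for functional testing; abundance alone does not prove involvement in the pathway.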

In general, the same rationale is followed when combining any transcriptome profiling method with metabolic profiling (Yamazaki and Saito 2002; Oksman-Caldentey and Saito 2005; Fridman and Pichersky 2005). For instance, taxol biosynthesis was further dissected using differential display analysis of gene expression in jasmonate-treated Taxus cuspidata cells (Schoendorf et al. 2001). Differential display was also successfully used to isolate anthocyanin biosynthetic genes by comparing red and green forms of Perilla frutescens (Saito et al. 1999; Yamazaki et al. 1999, 2003). Genes involved in the formation of volatile compounds in strawberry (Aharoni et al. 2000) and rose (Guterman et al. 2002) have been identified by using DNA microarray analysis in combination with targeted analysis of volatile metabolites. The same rationale holds as well when proteome analysis is combined with metabolite profiling. Proteomic analysis of different fractions of poppy latex allowed for instance the isolation of two novel poppy alkaloid biosynthetic genes (Ounaroon et al. 2003).

Pioneering work on a functional genomics approach to study alkaloid pathways was published in 1994, in an era when the term ‘functional genomics’ had yet to be invented (Hibi et al. 1994). This study described the very first attempt to isolate genes involved in alkaloid biosynthesis, in this case that of nicotine in common tobacco, by a differential screening approach. Since then, integrated functional genomics approaches aimed at the elucidation of alkaloid biosynthetic pathways have, to our knowledge, been undertaken for three plant species: the opium poppy (Papaver somniferum), common tobacco (Nicotiana tabacum) and the Madagascar periwinkle (Catharanthus roseus). In this review, we discuss the state of the art in knowledge of alkaloid biogenesis, and in particular the contributions made to it by functional genomics-based research, for these three plant species.

Benzylisoquinoline alkaloids: Papaver somniferum

Benzylisoquinoline alkaloids (BIAs) are a large and diverse alkaloid group with over 2500 defined structures (Facchini 2001). BIAs are derived from l-tyrosine, and biosynthesis begins with the condensation of dopamine and 4-hydroxyphenylacetaldehyde by norcoclaurine synthase (Fig. 1). (S)-Norcoclaurine is then, through a number of enzymatic reactions, converted to the central branch-point intermediate (S)-reticuline. At this point, the pathway can branch to the production of morphinans (e.g. thebaine, codeine, morphine) or to the production of sanguinarines (Fig. 1). Papaver somniferum, the opium poppy, synthesises more than 80 alkaloids, and many poppy biosynthetic enzymes, active in multiple cell types, and the corresponding genes have been identified (Facchini 2001; Kutchan 2005).

Fig. 1

Structure and biosynthesis of benzylisoquinoline alkaloids in Papaver somniferum. Metabolites are given as full names in lowercase and enzymes as abbreviations in capitals. Full and dashed arrows mark single and multiple conversion steps between intermediates, respectively. Enzymes listed: BBE (berberine bridge enzyme), CFS ((S)-cheilanthifoline synthase), CNMT (S-adenosyl-l-methionine:coclaurine N-methyltransferase), COR (codeinone reductase), CYP719A1 ((S)-canadine synthase), CYP80B1 ((S)-N-methylcoclaurine 3′-hydroxylase), DBOX (dihydrobenzophenanthridine oxidase), DRR (1,2-dehydroreticuline reductase), DRS (dehydroreticuline synthase), MSH ((S)-cis-N-methylstylopine 14-hydroxylase), NCS (norcoclaurine synthase), 4′OMT ((R,S)-3′-hydroxy-N-methylcoclaurine 4′-O-methyltransferase), 6OMT (norcoclaurine 6-O-methyltransferase), 7OMT ((R,S)-reticuline 7-O-methyltransferase), P6H (protopine-6-hydroxylase), SAT (salutaridinol 7-O-acetyltransferase), SOMT (scoulerine 9-O-methyltransferase), SOR (salutaridine:NADPH 7-oxidoreductase), SPS ((S)-stylopine synthase), STOX ((S)-tetrahydroprotoberberine oxidase), STS (salutaridine synthase), TNMT (tetrahydroprotoberberine-cis-N-methyltransferase), TYDC (tyrosine/dopa decarboxylase)

The first report on a functional genomics approach to study poppy alkaloid pathways described a two-dimensional SDS-PAGE based proteomics approach (Decker et al. 2000). In this study, poppy latex was separated into different fractions: the cytosolic serum and the vesicle sediment, the latter further subsectioned by density centrifugation. For each fraction the soluble proteins were separated by two-dimensional SDS-PAGE and their sequences analysed by internal peptide microsequencing. Codeinone reductase, a representative of the morphinan-specific branch (Fig. 1), could be detected within the cytosolic serum fraction. Furthermore, 75 serum-specific protein spots were analysed and for 69 of them, internal peptide microsequencing, followed by homology-based database searching, allowed assigning a putative function. Similarly, for 23 vesicle-specific protein spots a putative function could be assigned. In an extension of this work, proteomic analysis of the latex allowed isolating two novel alkaloid biosynthetic genes: (R,S)-reticuline 7-O-methyltransferases and norcoclaurine 6-O-methyltransferase (Ounaroon et al. 2003; see also Fig. 1). By microsequencing of protein spots in the size range expected for plant methyltransferase monomers, the authors were capable of isolating gel spots whose partial amino acid sequences were homologous to those of plant O-methyltransferases. Cloning of the corresponding genes and functional analysis of the recombinant proteins in a heterologous system established that the isolated genes indeed encoded O-methyltransferases and were involved in poppy alkaloid biosynthesis.

Very recently, two sequencing projects from P. somniferum seedlings were initiated (Ziegler et al. 2005; Millgate et al. 2004). In the first, an AFLP-like sequencing approach was employed to obtain 849 ‘UniGenes’ that were printed on macroarrays (Ziegler et al. 2005). This gene resource was then employed in a comparative macroarray-based expression analysis between morphine-containing P. somniferum plants and eight other, morphine-free, Papaver species and led to the identification of a novel O-methyltransferase involved in benzylisoquinoline biosynthesis, i.e. S-adenosyl-l-methionine:(R,S)-3′-hydroxy-N-methylcoclaurine-4′-O-methyltransferase (see Fig. 1). The second P. somniferum gene resource consists of a 17,000-gene poppy microarray, printed with ESTs derived by random sequencing from a P. somniferum shoot cDNA library (Millgate et al. 2004). This microarray was used for global expression analysis of the poppy top1 mutant, which does not complete morphinan biosynthesis into morphine and codeine but instead accumulates the precursors thebaine and oripavine (Millgate et al. 2004). This expression analysis allowed the identification of 10 genes, significantly differentially underexpressed in top1, seven of which were highly homologous to known genes of other species. None of these genes has however been proved to be responsible for the top1 phenotype and their potential role in the morphine biosynthesis pathway is not yet understood. Nevertheless, these kinds of resources clearly open new avenues for elucidating the poppy alkaloid metabolic network and, eventually, modifying it.

Interestingly, a virus-induced gene silencing (VIGS) approach for P. somniferum has been developed recently (Hileman et al. 2005). VIGS was originally established as a Solanaceae-specific approach for determining the biological function of gene products and takes advantage of the fact that plants induce homology-dependent defence mechanisms in response to virus attacks. VIGS results in the degradation of endogenous RNA with extensive sequence complementarity to for instance exogenous transgenes introduced into virus-based vectors (Ruiz et al. 1998). The extension of the VIGS method to a non-Solanaceae species, although still in an early developmental phase, may be a promising tool to perform large-scale functional screenings of gene tags from medicinal plants, for instance of the ESTs generated in the P. somniferum sequencing projects mentioned above.

A recent study (Allen et al. 2004) confronted us with our current limited understanding of (poppy) alkaloid biosynthesis. Allen and coworkers used RNA interference to successfully block codeinone reductase expression (see Fig. 1), and a substantial reduction of morphine content was indeed observed in transgenic poppies. Surprisingly however, instead of an increased accumulation of immediate morphine-type precursors normally present in the latex, the transgenic poppy latex accumulated rare alkaloids, including the upstream precursor reticuline, located seven enzymatic steps before the codeinone reductase-mediated reaction in the biosynthesis pathway. The reasons for these unexpected results and the drastic switch in the alkaloid pattern were not clear. These findings clearly underscored the argument that knowledge of gene sequences and biosynthetic enzymes alone is not sufficient to design a rational approach towards alkaloid pathway modification and that in this regard understanding of the entire metabolic and cellular system is required.

Systems biology integrates gene expression sensu lato and therefore goes beyond the transcriptome and proteome. For instance, when investigating metabolic networks, systems-level data on metabolite accumulation (metabolomics), metabolite transport and metabolic flux should be included. Systems biology may also incorporate data from, for example, the promoterome (the genome-wide study of promoter activities) or the localisome (the genome-wide analysis of protein localisation in intracellular structures or in different organs or tissues). Because extensive full-length gene collections and efficient plant transformation protocols are presently lacking, this cannot yet be performed on a genome-wide scale for alkaloid-producing medicinal plants. Nevertheless, in the case of morphine biosynthesis in the opium poppy, the road is being paved towards this tantalising goal. In situ hybridisation of known alkaloid transcripts in poppy reflects the promoterome, and immunocytochemical localisation of several known alkaloid biosynthetic enzymes confirmed the existence of an extensive compartmentalisation of morphinan biosynthesis (Bird et al. 2003; Weid et al. 2004). Extending this kind of research to a genome-wide scale, in which tagged genes are used to verify localisation (e.g. by means of GFP tagging) or protein–protein interactions (e.g. by yeast two-hybrid or TAP-tag technology), awaits the exploitation of high-throughput poppy transformation protocols. In this respect, successful protocols for Agrobacterium tumefaciens-mediated transformation of opium poppy have already been developed (Park and Facchini 2000; Chitty et al. 2003).

Nicotine: Nicotiana tabacum

Secondary metabolism in tobacco is quite well studied and an excellent review was published by Nugroho and Verpoorte (2002). More than 2500 compounds have been identified in tobacco, with the most prominent group formed by the alkaloids, including nicotine but also other compounds such as nornicotine, anabasine, anatabine and anatalline. The biosynthetic origin of nicotine begins with the plant polyamine putrescine, which can be formed directly by decarboxylation of ornithine or indirectly from arginine in a reaction initiated also by a decarboxylase (Fig. 2). Putrescine in tobacco can serve as a precursor in the biosynthesis of higher polyamines or can be converted to N-methylputrescine by the action of putrescine N-methyltransferase (PMT), the first committed step in nicotine biosynthesis (Hashimoto and Yamada 1994). In other Solanaceae species, but not tobacco, N-methylputrescine can also be further converted to tropane alkaloids, such as hyoscyamine and scopolamine, or calystegins.

Fig. 2

Structure and biosynthesis of pyridine alkaloids in Nicotiana tabacum. Metabolites are given as full names in lowercase and enzymes as abbreviations in capitals. Enzymes listed: ADC (arginine decarboxylase), AIH (agmatine deiminase), AP (aspartate oxidase), AS (arginase), CYP82E4 (nicotine N-demethylase), LDC (lysine decarboxylase), MPO (methylputrescine oxidase), NCPAH (N-carbamoylputrescine amidohydrolase), ODC (ornithine decarboxylase), PMT (putrescine N-methyltransferase), QPRTase (quinolinate phosphoribosyltransferase)

As mentioned above, the very first report on a ‘functional genomics’ approach to study alkaloid biogenesis in tobacco was published in 1994 by Hibi and coworkers. They exploited the availability of N. tabacum cultivars with low nicotine contents, the Burley nic1 and nic2 mutants (Legg and Collins 1971). The Nic1 and Nic2 loci (also referred to as the A and B loci) are thought to be regulatory genes because the mutant alleles decrease nicotine accumulation and nicotine biosynthesis enzyme levels. Differential screening by subtraction hybridisation of cDNA libraries of wild-type and mutant cultured roots allowed isolation of two genes, A411 and A622, encoding PMT and a protein with homology to isoflavone reductase, respectively (Hibi et al. 1994). It has been speculated that A622 may encode the enzyme catalysing the condensation of the 1-methyl-Δ1-pyrrolinium cation with nicotinic acid to form nicotine (see Fig. 2), but conclusive evidence to support this hypothesis has not yet been obtained. Similarly, the exact identity of the regulatory Nic1 and Nic2 genes remains elusive, despite the attention they have received.

Following this pioneering report, further genome-wide expression analysis related with tobacco alkaloid biosynthesis had to await the true breakthrough of functional genomics. The first follow-up to be reported was a characterisation of cDNAs differentially expressed in tobacco roots during the early stages of alkaloid biosynthesis by subtractive hybridisation screening (Wang et al. 2000). Sixty differentially expressed genes were isolated, amongst which many encoding known nicotine biosynthetic enzymes, but no novel gene products (directly) involved in alkaloid biosynthesis were identified within this set. In contrast, we launched a successful gene discovery program based on an experimental approach, in which alkaloid targeted metabolite profiling was combined with cDNA-amplified fragment length polymorphism-based transcript profiling of tobacco BY-2 cells treated with methyl jasmonate, an efficient elicitor of nicotine biosynthesis in tobacco (Goossens et al. 2003; De Sutter et al. 2005). The profiling data suggested an extensive jasmonate-mediated genetic reprogramming of metabolism; out of 20,000 visualized transcript tags a transcriptome of 459 jasmonate-modulated ESTs could be composed that was compared with the observed jasmonate-induced shifts in biosynthesis of tobacco metabolites (Goossens et al. 2003). The generated gene inventory revealed the presence of all, except one, of the genes known to be involved in the biosynthesis of nicotine alkaloids in Nicotiana species. Most of these were co-induced in time by methyl jasmonate in the elicited tobacco BY-2 cells and clustered together with novel genes or genes encoding proteins with still unknown functions, which were postulated to be probable candidates to encode missing links in nicotine biosynthesis, either at the structural or regulatory level. 
Indeed, in a rationally designed screen to discover potential regulators of genes coding for key enzymes in nicotine biosynthesis, we found two novel tobacco AP2-domain transcription factors, NtORC1 and NtJAP1, that positively regulate the PMT promoter (De Sutter et al. 2005). Simultaneously, by further exploration of the profiling data and gene inventory we could propose novel gene functions, not only with regard to jasmonate-mediated activation of alkaloid biosynthesis (De Sutter et al. 2005) but also with regard to the links between jasmonate and cell cycle-regulated cytokinin metabolism (Kwade et al. 2005) or the de novo biosynthesis of vitamin C (Wolucka et al. 2005). Another successful study employed a microarray-based strategy to identify genes that are differentially regulated between closely related tobacco lines that accumulate either nicotine or nornicotine as the predominant leaf alkaloid (Siminszky et al. 2005). Transcript profiling was achieved in three phases. Firstly, EST databases, each representing a compilation of more than 11,000 sequencing runs of randomly selected clones, were generated by using cDNA libraries from senescing leaves of two isogenic tobacco lines, accumulating either nicotine or nornicotine as the predominant leaf alkaloid. Secondly, two types of DNA chips were synthesized, one representing the nornicotine line (4,992 cDNAs), and another representing the complete non-redundant unigene set of the combined EST databases (6,963 cDNAs). Finally, hybridizations were conducted with RNA isolated from different tobacco genotypes that had been subjected to different treatments known to enhance the metabolic conversion of nicotine to nornicotine. These experiments led to the identification of a closely related family of tobacco CYP450 genes, of which at least one member, CYP82E4, encodes an enzyme with nicotine demethylase activity (Siminszky et al. 2005; see also Fig. 2).
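The comparison of hybridisation signals between a nicotine-accumulating and a nornicotine-accumulating line can be illustrated with a minimal sketch based on log2 fold change. This is an assumption made for illustration and not the statistical procedure of the cited study, which is not described here; the function name, the pseudocount, the threshold and all signal values are hypothetical.

```python
from math import log2


def differential_genes(signal_ref, signal_test, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose signal differs
    between two samples by at least min_abs_log2fc (test versus reference).

    A pseudocount is added to both signals to avoid division by zero.
    Only genes present on both arrays are compared.
    """
    hits = {}
    for gene in signal_ref.keys() & signal_test.keys():
        lfc = log2((signal_test[gene] + pseudocount) / (signal_ref[gene] + pseudocount))
        if abs(lfc) >= min_abs_log2fc:
            hits[gene] = lfc
    return hits


# Made-up spot intensities for three cDNAs (illustration only).
nicotine_line = {"cdna_1": 99.0, "cdna_2": 49.0, "cdna_3": 199.0}
nornicotine_line = {"cdna_1": 799.0, "cdna_2": 51.0, "cdna_3": 99.0}

hits = differential_genes(nicotine_line, nornicotine_line)
for gene in sorted(hits):
    print(gene, round(hits[gene], 2))
```

In practice, replicate hybridisations and a proper statistical test would be needed before a cDNA is called differentially expressed; a single fold-change cut-off only conveys the principle.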

An alternative functional genomics approach to investigate tobacco alkaloid biosynthesis was described recently (Rogers et al. 2003). As a way to access the genomic potential of a plant, the creation of a ‘functional genomics library’ was pursued by generating large populations of tobacco cell cultures, in which each clone represents a ‘gain of function’ mutation, in this case by means of T-DNA activation tagging. T-DNA activation tagging is a method used to generate dominant mutations in plant cells by the insertion of a T-DNA that carries constitutive enhancer elements that can cause transcriptional activation of flanking plant genes. Using this approach, one clonal culture (out of 10,000 screened) was obtained that overproduced nicotine. Intact plants regenerated from this clone continued to overproduce nicotine at fivefold higher levels than wild-type N. tabacum (Littleton et al. 2005). However, although T-DNA tagging may allow rapid gene rescue, the exact nature of the gene responsible for the observed increase in nicotine biosynthesis was not revealed. Accordingly, subsequent cloning of the gene potentially responsible for this phenotype and reintroduction of the gene into wild-type plants are needed to confirm the phenotype. Still, this T-DNA activation tagging approach represents an attractive alternative and has, for instance, also been successfully applied to identify transcription factors regulating terpenoid indole alkaloid biosynthesis in Catharanthus roseus (van der Fits and Memelink 2000; van der Fits et al. 2001).

Lately, many more ‘omics’ tools and resources are being developed specifically for tobacco, and we list a number of them here. First, the full N. tabacum genome sequence should become available in the near future through the efforts of the Tobacco Genome Initiative (http://www.tobaccogenome.org). This initiative is coordinated at the Plant Pathology Department of North Carolina State University, is supported by Philip Morris USA, Inc., and aims to sequence and annotate more than 90% of the open reading frames in the genome of N. tabacum. Other genetic resources that are already publicly available include large EST collections of N. sylvestris roots and leaves (representing more than 3,500 genes; Katoh et al. 2003) and N. tabacum BY-2 cells (representing about 7,000 genes; Matsuoka et al. 2004). In total, more than 450,000 ESTs are currently available for Nicotiana and other species of the Solanaceae family, allowing comparative transcriptome analysis across species (Rensink et al. 2005). Construction of N. tabacum protein databases has also been initiated, for instance for leaf trichomes and BY-2 cells (Amme et al. 2005; Laukens et al. 2004). As mentioned at the beginning of this section, a catalogue of more than 2,500 tobacco metabolites is available (Nugroho and Verpoorte 2002). For metabolome analysis, however, this intrinsic structural variety poses considerable analytical challenges, both for profiling multiple metabolites in parallel and for the quantitative analysis of selected metabolites (Oksman-Caldentey and Saito 2005). Non-targeted metabolome analysis methods for tobacco have been developed recently, based either on Nuclear Magnetic Resonance (Choi et al. 2004) or on Fourier Transform Ion Cyclotron Mass Spectrometry (Aharoni et al. 2002; Mungur et al. 2005). These methods are suitable for metabolomic fingerprinting or rapid screening of differential tobacco samples and will therefore represent invaluable tools to study the tobacco metabolic network and dissect it at the molecular level.

Terpenoid indole alkaloids: Catharanthus roseus

The biosynthesis of terpenoid indole alkaloids (TIAs) proceeds via the central precursor strictosidine, which constitutes a fusion product of the shikimate pathway-derived tryptamine moiety and of the plastidic non-mevalonate pathway-derived secologanin moiety (Fig. 3). Altogether more than 120 TIAs have been isolated from Catharanthus during the past decades (van der Heijden et al. 2004). Besides the monomeric TIAs produced in several branches of the pathway, the pharmaceutically more interesting bisindole alkaloids are formed from the condensation of tabersonine-derived vindoline and catharanthine. Starting from the amino acid tryptophan and the monoterpenoid geraniol, the biosynthesis of bisindole alkaloids in C. roseus involves at least 35 intermediates and 30 enzymes (van der Heijden et al. 2004). Extensive intra- and intercellular translocation is involved in TIA biosynthesis (St-Pierre et al. 1999; Irmler et al. 2000; Burlat et al. 2004) and several transcription factors, in particular members of the plant-specific AP2-domain family, have been shown to coordinate the expression of TIA biosynthetic genes in response to external and internal signals (Memelink et al. 2001).

Fig. 3

Structure and biosynthesis of terpenoid indole alkaloids in Catharanthus roseus. Metabolites are given as full names in lowercase and enzymes as abbreviations in capitals. Full and dashed arrows mark single and multiple conversion steps between intermediates, respectively. Enzymes listed: AS (anthranilate synthase), CMK (4-diphosphocytidyl-2C-methyl-d-erythritol kinase), CMS (4-diphosphocytidyl-2C-methyl-d-erythritol-4-phosphate synthase), CPR (cytochrome P450 reductase), DAT (deacetylvindoline 4-O-acetyltransferase), D4H (desacetoxyvindoline 4-hydroxylase), DXR (1-deoxy-d-xylulose-5-phosphate reductoisomerase), DXS (1-deoxy-d-xylulose-5-phosphate synthase), G10H (geraniol 10-hydroxylase), GPPS (geranyl diphosphate synthase), HDR (1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate reductase), HDS (1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate synthase), 10HGO (10-hydroxygeraniol oxidoreductase), IPPI (isopentenylpyrophosphate isomerase), MAT (acetyl-CoA:minovincine-O-acetyltransferase), MECS (2C-methyl-d-erythritol-2,4-cyclodiphosphate synthase), NMT (N-methyltransferase), OMT (O-methyltransferase), SGD (strictosidine β-d-glucosidase), SLS (secologanin synthase), STR (strictosidine synthase), TDC (tryptophan decarboxylase), T16H (tabersonine 16-hydroxylase)

Whereas the genes encoding TIA biosynthetic enzymes usually have been identified one-by-one by the classical approaches, the transcription factors that regulate TIA biosynthesis were isolated through T-DNA activation tagging and yeast one-hybrid (Y1H) screenings (Memelink et al. 2001). In an attempt to isolate master regulators of TIA biosynthetic genes the T-DNA tagging approach was applied to C. roseus suspension cells (van der Fits and Memelink, 2000; van der Fits et al. 2001). T-DNA tagged cells were screened for resistance to a toxic substrate of one of the TIA biosynthetic enzymes: tryptophan decarboxylase (see Fig. 3). This screening yielded several interesting leads, one of them ORCA3, an AP2-domain transcription factor and so far the most potent transcriptional regulator of TIA metabolism (van der Fits and Memelink 2000). In another screening for TIA regulatory genes, Memelink and coworkers employed the Y1H technique, a method to isolate novel genes encoding proteins that bind to a target cis-acting regulatory DNA element. In this case promoter elements of another TIA gene, strictosidine synthase (Str, see Fig. 3), were used as bait DNA sequences. This method proved extremely successful and moreover suggested a considerable degree of complexity in the control of TIA biosynthesis. In fact, multiple transcription factors of different classes, including AP2, MYB, bHLH and zinc finger-type proteins, turned out to show binding specificity to Str promoter elements (Menke et al. 1999; van der Fits et al. 2000; Chatel et al. 2003; Pauw et al. 2005).

The first comprehensive functional genomics profiling approach to cover TIA biosynthesis in C. roseus employed systematic proteome analysis of a cell suspension culture accumulating strictosidine, ajmalicine and vindolinine (Jacobs et al. 2005). Two-dimensional SDS-PAGE was used to select 88 protein spots with a differential accumulation pattern, corresponding to the accumulation pattern of the alkaloids in the investigated culture. These protein spots were subsequently further characterised by mass spectrometry and for 58 of them a functional annotation based on sequence homology searching could be provided. Only two spots within this set, representing STR isoforms, were known to be involved in TIA biosynthesis. Other unique sequences were found, which may relate to unidentified biosynthetic proteins. The main limitations that were observed with this approach in Catharanthus, as well as within the similar poppy proteomic study (Decker et al. 2000), were (i) the generally low abundance of secondary metabolism proteins, preventing their isolation, and (ii) the lack of sequence data of both genes and proteins from Catharanthus, hampering identification of protein spots.

One method that may either circumvent or address these limitations of proteomic studies of non-model medicinal plant systems such as C. roseus, is cDNA-AFLP-based transcript profiling (Breyne et al. 2003). Large-scale gene discovery programs in non-model plants are hampered enormously by the fact that standard transcript profiling methods, such as serial analysis of gene expression or microarray analysis, are not applicable because of the lack of large sequence repertoires. In contrast, the cDNA-AFLP technology can be used to identify genes in such plants and acquire quantitative expression profiles at the same time. By applying this technique on jasmonate-elicited TIA producing C. roseus cells grown under diverse auxin regimes, we determined the quantitative temporal accumulation patterns of 10,790 transcript tags (Rischer et al. 2006). In total, good-quality sequences for 417 differentially expressed transcript tags were obtained. When blasting these 417 tag sequences with the 236 C. roseus EMBL entries publicly available prior to this study, less than 10% gave a perfect match. Thus, the vast majority of the tags identified represented novel C. roseus sequence information. Here, we further compared this transcriptome sequence data set with the sequences of the C. roseus proteome data set (Jacobs et al. 2005). For nine of the protein spots a potential corresponding transcript tag could be identified (Table 1), including the two STR isoforms and the so-called PS protein, suggested to be associated with TIA biosynthesis (Leménager et al. 2005). Importantly, and in contrast to the proteomics approach, we could identify most of the so far known structural and regulatory genes involved in TIA biosynthesis and successfully monitor them within a single cDNA-AFLP experiment (Rischer et al. 2006). 
In parallel we performed non-targeted metabolic profiling of the same samples, leading to a final set of 178 known and novel metabolites, including nine known TIA molecules (Rischer et al. 2006). Integration and correlation analysis of the expression profiles of the 417 gene tags and the accumulation profiles of the 178 metabolite peaks allowed depicting novel gene-to-gene or gene-to-metabolite networks for TIA biosynthesis in C. roseus cells (Rischer et al. 2006). These networks revealed for instance that the different branches of TIA biosynthesis were subject to differing hormonal regulation. On the basis of these networks, a select number of novel genes and metabolites likely to be involved in TIA biosynthesis could be forwarded; candidates that may deserve priority in diverse types of future functional screenings such as the one successfully employed for the gene inventory obtained by cDNA-AFLP analysis of elicited tobacco cells (De Sutter et al. 2005). Further extension of the metabolite profiling data with quantification of metabolic fluxes in an organisational approach (Morgan and Shanks 2002) could help elucidate the function, pathway structures and pathway fluxes for novel and known TIA compounds.
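The integration of transcript and metabolite profiles into a gene-to-metabolite network can be sketched as a pairwise correlation analysis over a shared set of samples. The sketch below is a minimal illustration that assumes Pearson correlation with a fixed threshold; it is not a description of the analysis actually used in Rischer et al. (2006), and the function names, the threshold and all profile values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def gene_metabolite_edges(gene_profiles, metabolite_profiles, threshold=0.9):
    """List (gene, metabolite, r) for every pair with |r| >= threshold.

    Both inputs map a label to a profile measured over the same samples,
    in the same order (e.g. time points after elicitation).
    """
    edges = []
    for gene, g_profile in gene_profiles.items():
        for metabolite, m_profile in metabolite_profiles.items():
            r = pearson(g_profile, m_profile)
            if abs(r) >= threshold:
                edges.append((gene, metabolite, r))
    return edges


# Made-up profiles over four time points (illustration only).
tags = {
    "tag_1": [1.0, 2.0, 3.0, 4.0],
    "tag_2": [4.0, 3.0, 2.0, 1.0],
    "tag_3": [1.0, 3.0, 2.0, 3.0],
}
peaks = {"peak_A": [2.0, 4.0, 6.0, 8.0]}

edges = gene_metabolite_edges(tags, peaks)
for gene, metabolite, r in edges:
    print(f"{gene} -- {metabolite}: r = {r:+.2f}")
```

Positive edges suggest co-regulation of a transcript tag and a metabolite, whereas negative edges suggest opposite regulation; either way, a correlation only prioritises candidates for functional testing and does not by itself establish a biosynthetic role.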

Table 1 Overlap between TIA-targeted transcriptome and proteome analysis of C. roseus cells

Conclusions and perspectives

Recently, several biochemical pathway knowledge databases for Arabidopsis thaliana have been developed, for instance AraCyc, MAPMAN and BioPathAt (Mueller et al. 2003; Thimm et al. 2004; Lange and Ghassemian 2005). These databases are powerful tools for the integrative analysis of biochemical pathways and greatly facilitate the analysis of ‘omics’-scale experiments. Although we are still some way from establishing similar biochemical pathway knowledge databases for alkaloid-producing medicinal plants, the rapid pace at which new and ever more potent functional genomics technologies are being developed will offer unprecedented opportunities to map the biosynthesis of alkaloids in non-model medicinal plants. Indeed, encouraging results have already been obtained for a number of alkaloid-producing plant species, in particular by integrated transcriptome–metabolome profiling approaches. We therefore anticipate that functional genomics, and the knowledge it brings, will allow better exploitation of plant biochemical capacities, leading to increased production of interesting alkaloids or the design of novel alkaloids with novel or superior biological activities.