Introduction

The fungal kingdom represents a vast and largely untapped resource for the discovery of new natural products and their biosynthetic pathways. It is estimated that the number of fungal species (~1.5 million) exceeds that of land plants by a ratio of 10:1, yet only a fraction of this diversity (~100,000 species) has been described. Ascomycota (filamentous fungi) and Basidiomycota (including mushroom-forming fungi) make up the vast majority of this diversity [1, 2]. However, despite this remarkable species diversity, relatively few fungi have been studied, and even fewer species have been investigated for their ability to make natural products. Such studies are hindered by the complex life cycles of fungi, unknown or difficult-to-reproduce conditions for growth and natural product production, and the genetic intractability of the majority of fungi [3]. Despite these challenges, natural products made by filamentous fungi such as Aspergillus and Penicillium have been used clinically as antibiotics, antifungals, immunosuppressants, and cholesterol-lowering agents (Fig. 4.1) [4–6]. Compared to filamentous fungi, Basidiomycota are largely uncharted territory for natural products discovery, with only a very small fraction of the nearly 30,000 described species examined for their ability to produce secondary metabolites [7–10]. Even fewer studies have focused on the biosynthetic genes responsible for natural product production in this fascinating class of organisms [11–24].

Fig. 4.1

Natural products from fungi have very complex structures and a wide range of biological activities. Many of these compounds have been used to make synthetic analogs with improved therapeutic applications. Some bioactive compounds isolated from fungi and representing different natural product classes include (a) the immunosuppressant cyclosporine A (a nonribosomal peptide) [139], (b) the immunosuppressant sirolimus (rapamycin) (a hybrid nonribosomal peptide and polyketide) [139], (c) the antimicrobial fumagillin (a meroterpenoid, hybrid terpenoid-polyketide) [140], and (d) the antitumor compound illudin S (a sesquiterpenoid) [141].

Genome sequencing initiatives such as the US Department of Energy’s Fungal Genomics Program [25] have resulted in a massive influx of fungal DNA sequence data over the course of the last 10 years. Driven by interest in fungi as sources of lignocellulose-degrading enzymes [26] and by affordable next-generation sequencing technologies [27], the number of Basidiomycota genomes alone has ballooned from fewer than 5 a few years ago to currently over 100 draft genomes listed at the Joint Genome Institute’s (JGI) MycoCosm genome portal (http://genome.jgi.doe.gov/programs/fungi/index.jsf), of which only a select few have been annotated in detail [28–36]. About twice as many Ascomycota genome sequences are listed at this portal, and many more are deposited at the National Center for Biotechnology Information (NCBI). This large volume of fungal genome data provides a tremendous opportunity for bioinformatics-guided approaches to assess and access the natural products potential of fungi, independent of whether their biosynthetic pathways can be induced under laboratory conditions.

One of the hallmarks of fungal genetic organization, similar to bacteria, is the physical clustering of core secondary metabolic genes of natural products pathways in the genome. Genomic clustering is believed to facilitate efficient regulation of natural product biosynthesis through transcription factors and epigenetics [37–44]. Physical linkage between biosynthetic genes in a given pathway greatly aids in the characterization, discovery, and biotechnological exploitation of fungal natural product biosynthetic pathways. Inexpensive DNA synthesis and the development of sophisticated synthetic biology approaches for heterologous biosynthetic pathway assembly will eventually enable scientists to bypass current challenges such as finding conditions for growth and natural products biosynthesis and developing tools for genetic manipulation.

In this chapter, we will provide an overview of fungal natural products biosynthesis with an emphasis on genomic- and bioinformatic-driven pathway discovery. Regulation of secondary metabolic pathways (including silent and cryptic pathways) will be discussed elsewhere in this book and will not be a focus of this chapter. We will begin by examining the different bioinformatics tools available for genome mining for biosynthetic gene clusters, with an emphasis on open-source, accessible algorithms and software. We will then discuss elucidation and characterization of gene clusters responsible for the biosynthesis of the major fungal secondary metabolite classes: polyketides (PK)/nonribosomal peptides (NRP) and terpenes. Apart from representing major fungal natural product classes that have been characterized in some detail, differences in PK/NRP and terpenoid biosynthetic cluster abundance in Ascomycota and Basidiomycota genomes provide some insights into the natural products repertoire of these two fungal groups. Each section will provide key examples of the workflow required to identify and characterize the genes responsible for synthesizing the biologically active compounds introduced above.

The Bioinformatic Tools of Genome Mining for Natural Products in Fungi

The physical clustering of biosynthetic genes for a given natural product provides an elegant means of elucidating fungal biosynthetic pathways. Unlike in plants, whose genomes are more complex and lack clearly delineated biosynthetic gene clusters [37], in fungi the identification of a gene encoding a key enzyme in a biosynthetic pathway may lead directly to most of the remaining genes in the pathway. This can be particularly important for the discovery and characterization of natural product scaffold-activating cytochrome P450 enzymes. As in plants, this enzyme family has undergone extensive gene duplication in fungi, particularly in Basidiomycota [45]. This makes it difficult to determine specific P450 functions based on sequence homology alone, especially for novel, multifunctional fungal P450 gene families [46, 47]. The identification of biosynthetic gene clusters has led to the characterization of fungal P450s involved in statin [48], PK [49], and alkaloid biosynthesis [50, 51]. With the increase of genomic sequence data, initially for filamentous fungi and more recently for Basidiomycota, much effort has therefore been invested in the development of bioinformatics tools to mine fungal genomes for these clusters.

The identification of biosynthetic gene clusters has typically begun with an mRNA or genomic “anchor sequence” based on one or more known biosynthetic genes in a pathway. Such an “anchor gene” typically (but not always [52]) encodes the first key enzyme in the biosynthetic pathway, for example, a polyketide synthase (PKS), nonribosomal peptide synthase (NRPS) [18, 24], terpene synthase (TPS), or other enzyme, depending on the type of natural product scaffold formed [12, 53–55]. Traditionally, clusters were identified through molecular genetic techniques. For example, the creation and sequencing of a cosmid library led to the identification of the first gene cluster responsible for the production of trichothecene, a sesquiterpenoid mycotoxin, in Fusarium graminearum F15 [56]. In addition, when the anchor gene was cloned and sequenced, fungal biosynthetic gene clusters were identified through subsequent genome walking [54, 55, 57]. These molecular techniques, however, have only allowed for the identification of clusters up to a certain size (up to ~20 kb) and with the anchor gene fully sequenced. The more recent availability of fungal whole-genome sequence data has allowed for the identification of biosynthetic clusters of any size, provided the genome assembly data include sufficiently large scaffolds. Basic Local Alignment Search Tool (BLAST) searches of fungal genome sequences for classes of enzymes, such as PKS, NRPS, and TPS genes, typically yield several potential candidate genes for a target enzyme. Gene prediction algorithms, such as Augustus [58], can then be applied to predict open reading frames (ORFs) and the corresponding cDNAs of the putative anchor genes and of additional biosynthetic genes located upstream and/or downstream. Such an approach led to the discovery of multiple terpenoid biosynthetic gene clusters in Coprinus cinereus and Omphalotus olearius by our group [11, 12]. However, manual gene prediction and identification of cluster genes is tedious, and cluster boundaries can be hard to pin down.
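The anchor-gene idea can be illustrated with a minimal sketch: given predicted gene coordinates and the name of an anchor gene, collect every gene on the same scaffold that overlaps a flanking window around the anchor. The `Gene` record, the function name `genes_near_anchor`, the toy coordinates, and the 20 kb default window are all assumptions chosen for illustration; they are not taken from any published tool, and real cluster boundaries are not defined by a fixed distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: name, scaffold, and 1-based inclusive coordinates."""
    name: str
    scaffold: str
    start: int
    end: int


def genes_near_anchor(genes, anchor_name, flank_bp=20_000):
    """Return genes on the anchor's scaffold that overlap a window of
    flank_bp on either side of the anchor gene, sorted by start position.
    The anchor itself is included in the result."""
    anchor = next(g for g in genes if g.name == anchor_name)
    lo, hi = anchor.start - flank_bp, anchor.end + flank_bp
    nearby = [
        g for g in genes
        if g.scaffold == anchor.scaffold and g.end >= lo and g.start <= hi
    ]
    return sorted(nearby, key=lambda g: g.start)


# Toy annotation (invented coordinates, for illustration only).
genes = [
    Gene("p450_a", "scaffold_1", 1_000, 2_500),
    Gene("tps_1", "scaffold_1", 15_000, 16_200),
    Gene("oxidoreductase_b", "scaffold_1", 30_000, 31_000),
    Gene("transporter_c", "scaffold_1", 60_000, 62_000),
    Gene("p450_d", "scaffold_2", 15_500, 17_000),
]

candidates = [g.name for g in genes_near_anchor(genes, "tps_1")]
print(candidates)
```

A fixed window is only a first pass for proposing candidate cluster members; the resulting list would still need manual inspection of the gene models.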

As a result, several tools have been developed to more easily identify putative fungal biosynthetic clusters (reviewed in [59–61]). The most extensive software tool available for the identification of biosynthetic gene clusters is antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) [61]. When given bacterial or fungal genome sequence data, antiSMASH identifies putative gene clusters by comparing all predicted gene products to a hand-curated set of models corresponding to gene families common to the two dozen types of secondary metabolic pathways it recognizes. Putative clusters are then further analyzed through sequence homology to identify types of multidomain enzymes such as PKSs or NRPSs, enzyme specificities, and potentially even a core structure of the natural product [61]. Originally released in 2011 [60] and updated in 2013 [61], antiSMASH now has the capability to analyze raw contig assemblies of entire genomes, although it should be noted that the assemblies must be relatively clean to avoid redundant identifications. The output visualizes cluster predictions (Fig. 4.2a), shows homologs of clusters in other microbial species (i.e., fungi) (Fig. 4.2b), and allows for easy sequence extraction (Fig. 4.2c).

Fig. 4.2

Sample output using antiSMASH 2.0 [61] to analyze the genome of the basidiomycete O. olearius. The raw scaffolds were uploaded to the antiSMASH server and analyzed for secondary metabolic genes/clusters. (a) The initial output lists all putative clusters and attempts to classify them by type. (b) Upon clicking on a cluster, the view is expanded, and individual genes are shown within their genomic context. Additionally, homologous clusters from other fungi are shown below. (c) Each putative protein sequence can be quickly accessed and used to perform BLAST searches. (This figure was adapted from the output of antiSMASH 2.0)

While antiSMASH is extremely powerful, we found that it has some notable limitations in its ability to correctly predict genes and their encoded proteins and to identify full biosynthetic pathways. First, individual biosynthetic genes may be easily missed by antiSMASH. In O. olearius, our group identified 11 terpene synthases through manual BLAST searches, while antiSMASH was only able to identify four of those genes. Second, the rules for defining the boundaries of secondary metabolic clusters are not, as of yet, entirely clear. While the “core” cluster may be predicted with ease, individual biosynthetic genes some distance away (10–20 kilobases) that may still be involved in late-pathway modifications are frequently missed. In some cases, satellite clusters of biosynthetic genes are located at distinct loci in the genome. For example, while most trichothecene biosynthetic genes in Fusarium sporotrichioides and F. graminearum exist in the tri5 core gene cluster, two late-pathway genes are clustered elsewhere [62]. Finally, accurate structural annotation of genome sequences is crucial for the identification of biosynthetic genes; unfortunately, common gene prediction algorithms (such as Augustus [58]) trained on other eukaryotic genomes typically perform poorly for prediction in Basidiomycota, whose genomes are rich in small introns and exons. Furthermore, genes may be closely spaced, neighboring gene models may be fused together, and cryptic alternative splicing is not uncommon [63–67].

Our experience has shown that the gene prediction models used by most algorithms often lead to incorrect structural gene annotation in Basidiomycota, frequently requiring manual re-annotation using protein sequence alignment-guided identification of the most likely correct splice isoforms, together with tedious attempts at obtaining correctly spliced cDNAs encoding functional proteins. Efforts to uncover the first biosynthetic gene of a biosynthetic pathway, such as a terpenoid pathway, can be arduous [68]. The presence of introns, some of which can be unusually small, complicates polymerase chain reaction (PCR) amplification [67]. In our experience, many splice variants can be amplified from a given cDNA pool, though only one splice variant has ever been confirmed to produce an active enzyme after transcription. Even when a good splicing model is predicted and the expected gene is amplified, the resultant protein may still be inactive, as was the case with Cop5, a sesquiterpene synthase our laboratory cloned and attempted to characterize from Coprinus cinereus [12]. We appear to be just beginning to understand the complex splicing, transcriptional regulation, and possibly posttranslational regulation that leads to active secondary metabolic genes.

In the future, deep RNA sequencing will be key to improving computational prediction of fungal biosynthetic genes and gene clusters. Already, RNAseq data has been shown to improve the accuracy of gene prediction models [29, 65, 69, 70] and has shed light on differential splicing [71]. In addition, transcriptomic data has been useful in the delineation of gene cluster boundaries [24]. Presently, only some of the more recently sequenced Basidiomycota genomes have associated deep RNA sequence data [29, 70] useful for biosynthetic pathway identification. Advances in high-throughput RNA sequencing and continued cost reductions for sequencing now allow affordable, rapid, and deep transcriptome profiling under a variety of conditions or for diverse genotypes to collect large data sets for one species. Such data can be used to create gene coexpression networks built on physical distance to a seed natural products biosynthetic gene (e.g., TPS, NRPS, PKS, P450s) (guilt by association) as a powerful tool for pathway discovery. The fact that NP pathway genes are generally co-regulated through shared transcriptional control elements (e.g., transcription factors and upstream intergenic regions) [72] offers yet another basis for network analysis within and also across species. Significant advances have been made in understanding the regulatory control elements of NP pathways in filamentous fungi, including the velvet family of regulatory proteins that are conserved among Ascomycota and Basidiomycota [39, 73, 74]. Genome analysis of Ganoderma and Schizophyllum [29, 31, 75] suggests high conservation of regulatory networks among mushroom-forming fungi, which can be exploited for network building. Yet, gene coexpression network analysis so far has been largely applied to the discovery of natural products genes in plants [76, 77]. Guilt-by-association analysis based on DNA expression arrays was only recently applied to natural product biosynthetic gene cluster analysis in A. nidulans [78].
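A guilt-by-association screen of the kind described here can be sketched in a few lines: genes are kept as pathway candidates if their expression profile across conditions correlates with that of a seed gene and they lie within a chosen physical distance of it. The function names, the thresholds (`max_distance_bp`, `min_r`), and the toy expression values below are illustrative assumptions, not parameters from any of the cited studies; all genes are assumed to sit on one scaffold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Returns 0.0 when either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpressed_neighbors(expression, positions, seed,
                          max_distance_bp=50_000, min_r=0.8):
    """Return genes (sorted by name) whose expression correlates with the
    seed at r >= min_r and whose position is within max_distance_bp of it."""
    hits = []
    for gene, profile in expression.items():
        if gene == seed:
            continue
        close = abs(positions[gene] - positions[seed]) <= max_distance_bp
        if close and pearson(expression[seed], profile) >= min_r:
            hits.append(gene)
    return sorted(hits)


# Toy data: expression across four conditions, gene midpoints in bp.
expression = {
    "tps_1": [1, 5, 9, 2],
    "p450_a": [2, 10, 18, 4],      # tracks the seed, nearby
    "housekeeping": [7, 7, 7, 7],  # constant, nearby
    "far_gene": [1, 5, 9, 2],      # tracks the seed, but distant
    "anti": [9, 5, 1, 8],          # anticorrelated, nearby
}
positions = {
    "tps_1": 100_000, "p450_a": 110_000, "housekeeping": 105_000,
    "far_gene": 500_000, "anti": 95_000,
}

hits = coexpressed_neighbors(expression, positions, "tps_1")
print(hits)
```

Dropping the distance filter turns the same function into a genome-wide coexpression query, which is closer to how satellite clusters at distinct loci would have to be found.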

Polyketide Synthases (PKS) and Nonribosomal Peptide Synthases (NRPS)

Polyketides (PK) and nonribosomal peptides (NRPs) are major, structurally diverse classes of natural products known to be produced by numerous filamentous fungi [79–81] and bacteria [82]. From an ecological perspective, these polyketide- and peptide-based secondary metabolites afford the host organism a wide array of largely cytotoxic or general antibiotic compounds, which effectively restrict the growth and development of organisms that may compete for space and nutrients. Synthesis of these metabolites is achieved by a simple and highly conserved general mechanism involving the iterative elongation of either amino acid or carboxylic acid building blocks, for nonribosomal peptides and polyketides, respectively. In a similar manner to fatty acid synthases (FAS), the relevant enzymes that coordinate the production of these varied metabolites are multifunctional, mostly iterative enzymes, with a predictable set of core domains that repeatedly utilize the same active site to elongate peptide or polyketide chains. For PKSs, these include ketoacyl synthase (KS), acyltransferase (AT), acyl carrier protein (ACP), and thioesterase (TE) domains [83]. Utilizing this core domain set, the condensation of activated acetate units produces a polyketide scaffold, upon which a vast array of modifications can be imposed by one or more sparsely conserved ketoreductase (KR), dehydratase (DH), methyltransferase (MT), and enoyl reductase (ER) domains. The relative reduction status and product profile of a given PKS is linked to the presence or absence of the reductive domains (KR, DH, and ER), with so-called “nonreducing PKSs” lacking these domains entirely and having a relatively limited product profile, while highly reducing PKSs contain all three of these domains and drastically alter the scaffold molecule into a wide array of alcohols, ketones, and other interesting chemical variants (reviewed in [79]).
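The link between domain content and reduction status can be written down as a small rule, sketched here under the assumption that the three reductive domains are KR, DH, and ER. The function name `classify_pks` and the intermediate label "partially reducing" (for enzymes with some but not all reductive domains) are illustrative additions rather than categories defined in this chapter.

```python
# Reductive tailoring domains assumed to determine the reduction status.
REDUCTIVE_DOMAINS = frozenset({"KR", "DH", "ER"})


def classify_pks(domains):
    """Classify a PKS from its list of domain abbreviations.

    - no reductive domains            -> "nonreducing"
    - all of KR, DH, and ER present   -> "highly reducing"
    - anything in between             -> "partially reducing"
    """
    present = REDUCTIVE_DOMAINS & set(domains)
    if not present:
        return "nonreducing"
    if present == REDUCTIVE_DOMAINS:
        return "highly reducing"
    return "partially reducing"


# Toy domain architectures (illustrative, not taken from real enzymes).
print(classify_pks(["KS", "AT", "ACP", "TE"]))
print(classify_pks(["KS", "AT", "DH", "MT", "ER", "KR", "ACP"]))
print(classify_pks(["KS", "AT", "KR", "ACP"]))
```

In practice the domain list would come from a domain-annotation step, and a domain that is present in the sequence may still be catalytically inactive, which this rule cannot detect.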

Conservation of domains and reaction mechanisms suggests that PKSs and NRPSs share ancestral origins. It is, therefore, not surprising that NRPSs also contain a core set of domains similar to those in PKSs, including the peptidyl carrier protein (PCP), adenylation (A), condensation (C), and thioesterase (TE) domains [84–86], which are required for the production of a scaffold peptide. This basal molecule can then be modified to varying degrees by a number of ancillary domains, including an epimerization or dual/epimerization domain (E and D/E, respectively), a reductase domain (R), and others involved in oxidation, cyclization, and methylation [85]. Additionally, genomic resources have revealed that many fungal PKSs and NRPSs cluster with cytochrome P450s (for example, [87]) and that modifications of PKS- and NRPS-derived scaffold molecules by P450s have been implicated in the production of relevant secondary metabolites including mycophenolic acid, a grisan scaffold, tenellin, and the antifungal pneumocandin [88–91]. The propensity of fungi to cluster an array of modification enzymes such as P450s around core scaffold-producing enzymes such as PKSs and NRPSs is a widely conserved means to create vast libraries of products from simple PK, NRP, and terpenoid (discussed in the next section) building blocks.

From an engineering perspective, a great deal of information has already been accumulated regarding the structure and function of PKS/NRPS enzymes, with numerous recent reports highlighting the potential for engineering PKSs and also NRPSs from Ascomycetes such as Fusarium and Aspergillus [92, 93] and Basidiomycetes such as Ustilago maydis and Suillus grevillei [15, 17]. Ma et al. [92] conducted a detailed characterization of the lovastatin nonaketide synthase LovB, a highly reducing PKS catalyzing the production of dihydromonacolin L. Extensive in vitro analyses, as well as production from Saccharomyces cerevisiae and substrate feeding experiments, provided the authors with a detailed understanding of LovB structure and function. The production of lovastatin from dihydromonacolin L is known to require LovB and the enoyl reductase (ER) domain of its partner enzyme, LovC [92]. Interestingly, in vitro experiments with LovB, LovC, and all required cofactors failed to release dihydromonacolin L, indicating that the action of another domain might be required. The authors successfully released dihydromonacolin L after coexpression of heterologous thioesterase-containing enzymes from Gibberella zeae, supporting the aforementioned claim. Moreover, the same can be accomplished with the ER domain protein LovC. Complementation of dihydromonacolin L release can be achieved via the heterologous expression of MlcG, an analogous ER domain-containing protein from the compactin biosynthetic cluster of Penicillium citrinum [92]. These reports support the claim that the function of these enzymatic partners is more promiscuous than once believed and that engineering of designer pathways by swapping analogous domains from related PKSs and NRPSs is a viable strategy for production of novel chemistry, as explained later in this chapter.

Despite the vast potential for isolation and production of valuable compounds from these metabolic clusters, there are significant gaps in our current level of understanding of NRP and PK biosynthesis in fungal systems. Indeed, a brief survey of SciFinder returns fewer than 100 PKS- or NRPS-derived compounds from Basidiomycota, although recent work highlighting PKS diversity in Basidiomycota suggests that the number of biosynthetic genes grossly exceeds the number of reported PKs from these organisms [94]. This discrepancy most likely reflects a lack of characterization of the compounds produced by these enzymes.

A number of very recent reports have utilized multifaceted approaches to mine fungal genomes and increase our understanding of PK and NRP diversity. For example, a study by Lackner et al. [94] aimed to identify new PKSs in Basidiomycota by probing the aforementioned genome resource at JGI with the KS domain of AflC of Aspergillus parasiticus and a selected group of related sequences. Thirty-five Basidiomycota genomes were queried, yielding more than 100 putative PKS genes [94], thus supporting the claim that the myriad of domain architectures presented by fungal PKSs represents an “in silico gold mine” for the discovery of new enzymes and possibly enzymes with variant domains. A similar approach has also been used with the well-conserved PCP, A, C, and TE domains of an NRPS to infer a great deal regarding the phylogeny and functional diversity of NRPSs [85] (reviewed in [84, 86]). Combining the information accrued from these reports with greatly expanded genomic resources and the knowledge that a great deal of variation exists within the domain structure of PKSs, NRPSs, and also hybrid PK-NRPSs [94], it seems plausible to apply more advanced computational strategies for the isolation of novel metabolites produced from common scaffolds. In this scenario, known domain elements isolated from PKSs and/or NRPSs of interest could be used as queries against fungal genomic databases to isolate novel variants from a wide range of diverse genera (Fig. 4.3). From an engineering perspective, these variant domains represent modules that could be swapped interchangeably, yielding chimeric enzymes with the potential for novel chemistries (Fig. 4.3). This strategy has been implemented, albeit on a small scale, to engineer novel chemistry from an engineered hybrid PKS combining the asperfuranone and sterigmatocystin biosynthetic pathways of A. nidulans [95].

Fig. 4.3

Scheme for developing novel PKS- and NRPS-derived chemistries. Known domain elements of NRPSs (green boxes; PCP, A, C, and TE) as well as downstream modification domains (purple boxes; E, D/E, R, and ME) can be used to query vast fungal databases to isolate variants of the domains of interest. Variants isolated in this way for the epimerization and methylation domains (E, D/E, and ME) are shown as examples in dashed boxes. Subsequent domain shuffling experiments would allow for the construction of novel, chimeric enzymes with the potential for producing novel metabolites. An NRPS is shown as an example, but the same scheme could be applied to PKSs and also hybrid PK-NRPSs.

In addition to genome mining targeting PKSs and NRPSs themselves, another potential strategy for identifying biosynthetic gene clusters is to elicit and examine their transcription. Many biosynthetic clusters are transcriptionally inactive (a.k.a. cryptic gene clusters) under normal cultivation conditions, particularly in endophytic fungi reliant on small molecule signaling from another organism [96]. One way to combat this transcriptional repression is to alter the expression of transcriptional activators/repressors through global epigenetic regulators or with genetic knockouts. An excellent review of the function of LaeA and the velvet complex mentions the implication of these global regulators of secondary metabolism in more than half of the PKS and NRPS genes in Trichoderma reesei [41]. Both knockouts and overexpression of global regulators like LaeA and its homologs [38] (or other related machinery, such as histone acetyltransferases [97]) can allow for more detailed genome mining when examining RNA-sequencing datasets and searching for areas of the genome in which transcription is directly affected. A comprehensive discussion of fungal secondary metabolic pathway regulation is provided elsewhere in this book.
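One simple way to search for genomic regions in which transcription responds to a global regulator is to compute per-gene log2 fold changes between a wild-type and a regulator-perturbed strain and then look for runs of adjacent genes that change together. The sketch below assumes normalized expression values per gene and genes listed in chromosomal order; the function names, the pseudocount, the fold-change threshold, the minimum run length, and the toy counts are all hypothetical choices for illustration, and it considers only the magnitude of the change, not its direction.

```python
from math import log2


def log2_fold_changes(wild_type, mutant, pseudocount=1.0):
    """Per-gene log2(mutant / wild type), with a pseudocount to avoid
    division by zero for unexpressed genes."""
    return {
        gene: log2((mutant[gene] + pseudocount) / (wild_type[gene] + pseudocount))
        for gene in wild_type
    }


def affected_runs(gene_order, lfc, threshold=1.0, min_run=3):
    """Return runs of at least min_run consecutive genes (in genome order)
    whose absolute log2 fold change is >= threshold."""
    runs, current = [], []
    for gene in gene_order:
        if abs(lfc[gene]) >= threshold:
            current.append(gene)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return runs


# Toy expression values for seven adjacent genes (invented numbers).
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
wild_type = {"g1": 100, "g2": 3, "g3": 1, "g4": 7, "g5": 90, "g6": 50, "g7": 0}
mutant = {"g1": 110, "g2": 31, "g3": 15, "g4": 63, "g5": 95, "g6": 20, "g7": 0}

lfc = log2_fold_changes(wild_type, mutant)
runs = affected_runs(gene_order, lfc)
print(runs)
```

Requiring several adjacent genes to respond together is what separates a candidate cluster from an isolated responsive gene such as g6 in the toy data.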

Terpene Synthases and Terpenoid Biosynthetic Clusters

Terpenoids are all derived from the five-carbon isoprene units isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP). Prenyldiphosphate synthases catalyze the head-to-tail condensation of two, three, or four of these five-carbon units to produce 10-, 15-, or 20-carbon (C10, C15, C20) isoprenoid diphosphate molecules. These serve as the substrates for Class I terpene synthases, which depend on the ionization of the allylic diphosphate to form a reactive carbocation that triggers a cascade of cyclization and rearrangement reactions in the enzyme's active site [98]. Depending on their chain-length specificity, this class of terpene synthases utilizes the 10-carbon substrate geranyl diphosphate (C10, GPP) to form monoterpenes, farnesyl diphosphate (C15, FPP) to generate sesquiterpenes, or geranylgeranyl diphosphate (C20, GGPP) to synthesize diterpenes [99]. Terpenoids with more than 20 carbons are typically formed by the head-to-head condensation of two FPP or GGPP molecules, yielding longer isoprene chains that are then modified into various C30 (sterols) and C40 (carotenoids) terpenoid structures. For example, sterols are formed by Class II terpene synthases that rely on a protonation-initiated cyclization mechanism that yields the scaffolds of bioactive triterpenoids isolated from many fungi [100, 101].
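The carbon arithmetic in this paragraph is simple enough to state as a lookup, sketched below: the number of condensed five-carbon units determines the prenyl diphosphate substrate and the terpene class, and head-to-head condensation of two such chains doubles the carbon count. The names `PRENYL_SUBSTRATES`, `describe_substrate`, and `head_to_head_carbons` are hypothetical helpers introduced only for this illustration.

```python
ISOPRENE_UNIT_CARBONS = 5

# Number of C5 units -> (substrate abbreviation, terpene class formed from it).
PRENYL_SUBSTRATES = {
    2: ("GPP", "monoterpene"),
    3: ("FPP", "sesquiterpene"),
    4: ("GGPP", "diterpene"),
}


def describe_substrate(units):
    """Return (substrate, carbon count, terpene class) for a head-to-tail
    condensation product of the given number of C5 units."""
    substrate, terpene_class = PRENYL_SUBSTRATES[units]
    return substrate, ISOPRENE_UNIT_CARBONS * units, terpene_class


def head_to_head_carbons(units):
    """Carbon count after head-to-head condensation of two identical chains."""
    return 2 * ISOPRENE_UNIT_CARBONS * units


for n in sorted(PRENYL_SUBSTRATES):
    print(describe_substrate(n))

print(head_to_head_carbons(3))  # two FPP molecules
print(head_to_head_carbons(4))  # two GGPP molecules
```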

Identification of the first terpene synthases from filamentous fungi, such as the sesquiterpene synthases aristolochene synthase [102, 103] and trichodiene synthase [53], required laborious efforts. Advances in sequencing and the increasing availability of sequence data have led to the discovery and characterization of a suite of novel fungal terpenoid biosynthetic enzymes in the past few years. For example, in the past 5 years more than two dozen new fungal sesquiterpene synthases have been cloned and characterized [11, 12, 52, 104–107]. Genome mining efforts enable not only the discovery of new types of terpene synthases and enzymes with new cyclization activities, as discussed later, but also comparative analysis of the natural product potential encoded by the genomes of the two major fungal phyla.

While polyketides and nonribosomal peptides are the major classes of secondary metabolites discovered in filamentous fungi, terpenoids appear to be a predominant class of secondary metabolites in Basidiomycota. In contrast to the number of PKS and NRPS genes mentioned previously, Basidiomycota genomes contain large collections of sesquiterpenoid biosynthetic genes. In 2012 we identified more than 500 putative sesquiterpene synthases (TPS, see later) in only 40 Basidiomycota genomes, and many of the putative TPS appear to be part of a biosynthetic gene cluster [11]. As the number of publicly available genome datasets has doubled, so has the number of putative TPS genes, totaling nearly 1000. As a comparison, only 179 putative TPS genes were identified in close to 2000 bacterial genomes [108], and we identified only ~250 TPS genes in 80 Ascomycota genomes examined.
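Normalizing these counts by the number of genomes surveyed makes the contrast easier to see. The short calculation below treats the approximate figures quoted above ("more than 500", "close to 2000", "~250") as exact values purely for illustration, so the results are rough averages rather than measured densities.

```python
# (putative TPS genes, genomes surveyed), using the rounded figures from the text.
surveys = {
    "Basidiomycota": (500, 40),
    "Ascomycota": (250, 80),
    "Bacteria": (179, 2000),
}

# Average number of putative TPS genes per genome in each group.
tps_per_genome = {group: count / genomes
                  for group, (count, genomes) in surveys.items()}

for group, density in tps_per_genome.items():
    print(f"{group}: {density:.2f} putative TPS genes per genome")

# Ratio of the Basidiomycota average to the Ascomycota average.
ratio = tps_per_genome["Basidiomycota"] / tps_per_genome["Ascomycota"]
print(f"Basidiomycota / Ascomycota: {ratio:.1f}x")
```

On these rounded figures, Basidiomycota genomes carry roughly a dozen putative TPS genes each, about four times the Ascomycota average and far above the bacterial average of fewer than one gene per ten genomes.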

Conspicuously absent in sequenced Basidiomycota genomes are genes that could encode diterpene synthases, although some mushroom-forming fungi (including the pleuromutilin antibiotic-producing fungus Clitopilus passeckerianus [109]) have been reported to produce diterpenoids [110]. Ascomycota, on the other hand, are known to be prolific producers of bioactive diterpenoids, and several biosynthetic gene clusters have been characterized. These include the well-studied gibberellin pathways found in several fungi that use a bifunctional diterpene synthase combining the domains of a class I and a class II terpene synthase (which are separate enzymes in plant diterpene biosynthesis) to cyclize GGPP [111–113]. Mining of Ascomycota genomes followed by gene deletion studies and stepwise heterologous coexpression of pathway genes in Aspergillus has led to the elucidation of additional gene clusters involved in the biosynthesis of the diterpene compounds fusicoccin, brassicicene, aphidicolin, and phomopsene [54, 55, 114–120], and of a sesterterpenoid (C25) [121]. Intriguingly, these terpenoid scaffolds are built by a novel type of chimeric terpene synthase that combines the domains of a class I terpene synthase and a prenyldiphosphate synthase that provides the C25 isoprene substrate for the cyclase [55].

Genome sequences of several Aspergillus strains have recently enabled the discovery and heterologous reconstitution of a series of biosynthetic gene clusters involved in the biosynthesis of medicinally important meroterpenoids (e.g., pyripyropene [122], terretonin [123–125], fumagillin [126] (Fig. 4.1c), austinol, and dehydroaustinol [127]), which are polyketide-terpenoid hybrid compounds. Except for the fumagillin cluster, all of these biosynthetic pathways involve an iterative, nonreducing PKS and an aromatic prenyltransferase [125] that attaches a prenyl chain (typically C15) to the polyketide moiety. The attached prenyl chain is then epoxidized by a flavin-dependent monooxygenase to allow for cyclization by a novel type of membrane-bound Class II terpene synthase [122–125, 127]. In fumagillin biosynthesis [126], however, a novel membrane-bound Class I TPS first generates the cyclized terpenoid scaffold that is then attached to a polyketide chain. Yet another novel membrane-bound Class II terpene synthase has recently been proposed to catalyze the cyclization of the geranylgeranyl chain attached to the indole moiety in indole-diterpene biosynthesis [128]. Several indole-diterpene biosynthetic clusters have been identified in filamentous fungi [128–132]. Common to all clusters are genes that encode a putative GGPP synthase, an aromatic prenyltransferase, and a flavin monooxygenase and cyclase proposed to catalyze epoxidation and cyclization, respectively, of the GGPP chain [128–132].

The aforementioned studies illustrate that even with a genome sequence at hand, it may not be possible to assign function to biosynthetic gene clusters solely based on homology to known biosynthetic enzymes. The identification in filamentous fungi of different types of chimeric biosynthetic pathways and of novel enzyme folds that catalyze similar reactions, as in the case of terpene cyclization, indicates that we may have only just begun to scratch the surface of the fungal secondary metabolome.

While the abundance of putative TPS genes may appear daunting to characterize biochemically, we found that when focusing on sesquiterpene synthases in Basidiomycota a relationship between sequence and function could be uncovered. Specifically, we examined sequence conservation as it relates to the first committed bond-forming step in the cyclization of the terpene molecule. We discovered that, despite relatively poor automatic gene prediction in the publicly available databases, the TPS genes partitioned to five clades, which appear to segregate based on their cyclization mechanism (Fig. 4.4) [11]. These clades represent the four initial cyclizations of FPP known to be catalyzed by TPS. With this information it is now possible to sort through the large set of TPS sequences based on the initial cyclization believed to lead to the desired product. Our group recently carried out a study focused on validating the predictive framework by examining sesquiterpene biosynthesis in the crust fungus Stereum hirsutum. Not only did this fungus possess a large repertoire of terpene synthases, it also has been studied for its production of bioactive natural products. As many of the sesquiterpenoids reported are derived from a humulyl cation, we chose to target protoilludene synthase homologs, which we expected to go through the same cyclization mechanism. Using the framework described previously, it was possible to clone and characterize three novel protoilludene synthases [119]. While our work only focused on sesquiterpenes, similar studies can be carried out to examine other classes of terpenoids in order to provide the same genome mining roadmap.

Fig. 4.4

Schematic representation of an unrooted neighbor-joining phylogenetic tree of sesquiterpene synthases identified in Basidiomycota genomes. We surveyed 42 Basidiomycota genomes and found 542 putative terpene synthase sequences, of which 392 were built into a phylogenetic tree to establish a link between sequence and function in this class of enzymes. By applying context to the tree through the inclusion of biochemical data, conservation of the initial cyclization reaction of farnesyl diphosphate (FPP) was identified. [11]
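The idea of sorting uncharacterized TPS sequences by their similarity to biochemically characterized enzymes can be illustrated with a minimal sketch. It is not the neighbor-joining analysis used to build the tree in Fig. 4.4; it simply assigns a query to the clade of its closest reference using the p-distance (the fraction of differing aligned positions). The sequences, reference names, and clade labels below are hypothetical placeholders, and the sequences are assumed to be pre-aligned to equal length.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) positions")
    return differing / compared


def assign_clade(
    query: str, references: Dict[str, Tuple[str, str]]
) -> Tuple[str, str, float]:
    """Return (reference name, clade label, distance) of the closest reference.

    `references` maps a reference name to (aligned sequence, clade label).
    """
    distances = {
        name: p_distance(query, seq) for name, (seq, _label) in references.items()
    }
    best = min(distances, key=distances.get)
    return best, references[best][1], distances[best]


# Hypothetical toy alignment fragments; real TPS sequences are far longer.
REFERENCES: Dict[str, Tuple[str, str]] = {
    "ref_A": ("MKDLE-FRT", "clade_A"),
    "ref_B": ("MRDIEQWRT", "clade_B"),
}

if __name__ == "__main__":
    name, clade, dist = assign_clade("MKDLEAFRS", REFERENCES)
    print(f"closest reference: {name} ({clade}), p-distance {dist:.3f}")
```

A nearest-reference assignment of this kind is only a rough first pass. A full phylogenetic analysis, as in the figure, places each sequence in the context of all others rather than relying on a single closest match.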

Terpenoid biosynthetic gene clusters may be extremely difficult to characterize due to the complex nature of the clusters themselves. Clusters can range in size from only two genes (e.g., a terpene synthase and a P450 monooxygenase) to greater than a dozen. Complicating the matter further is the propensity for separate biosynthetic clusters to work together to form the same types of products. For example, in searching for the enzymes responsible for illudin biosynthesis in the Jack O’Lantern fungus Omphalotus olearius, two protoilludene (the precursor to illudin compounds) synthases were identified (Omp6 and Omp7). Both were part of biosynthetic clusters, though one contained only three genes [11]. Varying kinetic values for the two terpene synthases indicate a possible mechanism for overcoming rate-limiting steps in the biosynthesis of illudins, though this has yet to be experimentally validated. A parallel example from the fungus F. sporotrichioides shows two clusters and an independent gene responsible for trichothecene biosynthesis. The larger cluster contains 12 genes, while the second smaller cluster and the independent gene are responsible for late-pathway reactions [62]. This sort of genomic segregation implies a need for tight transcriptional control on late-pathway genes, perhaps to minimize toxicity/reactivity of intermediates. Additionally, very similar strains may contain orthologous biosynthetic genes, but some may be pseudogenes, inactivated by the accumulation of mutations [62, 113]. Such variation in the genetic organization of biosynthetic clusters makes it particularly challenging to find all of the genes responsible for any given product.
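To make the difficulty concrete, the sketch below shows a deliberately naive way of proposing cluster members: collecting every annotated gene within a fixed flanking window around a terpene synthase on the same scaffold. The `Gene` record, the gene names, the coordinates, and the window size are all hypothetical and chosen for illustration only; this is not a method described in the cited studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    scaffold: str
    start: int
    end: int
    annotation: str


def candidate_cluster(
    genes: List[Gene], anchor_name: str, flank_bp: int = 20_000
) -> List[Gene]:
    """Return genes on the anchor's scaffold that overlap a window of
    `flank_bp` on either side of the anchor gene, ordered by start position."""
    anchors = [g for g in genes if g.name == anchor_name]
    if not anchors:
        raise ValueError(f"anchor gene {anchor_name!r} not found")
    anchor = anchors[0]
    lo = anchor.start - flank_bp
    hi = anchor.end + flank_bp
    members = [
        g
        for g in genes
        if g.scaffold == anchor.scaffold and g.end >= lo and g.start <= hi
    ]
    return sorted(members, key=lambda g: g.start)


# Hypothetical annotation table (names and coordinates are made up).
GENES = [
    Gene("tps1", "sc1", 50_000, 51_200, "terpene synthase"),
    Gene("p450a", "sc1", 52_000, 53_600, "P450 monooxygenase"),
    Gene("mfs1", "sc1", 45_000, 46_500, "MFS transporter"),
    Gene("far1", "sc1", 120_000, 121_000, "P450 monooxygenase"),
    Gene("sat1", "sc2", 50_500, 51_500, "acetyltransferase"),
]

if __name__ == "__main__":
    for gene in candidate_cluster(GENES, "tps1", flank_bp=10_000):
        print(gene.name, gene.annotation)
```

A fixed window of this kind cannot recover a second cluster or an independent gene located elsewhere in the genome, as in the trichothecene example, and it cannot tell a functional gene from a pseudogene. This is why locating all genes for a given product remains challenging.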

After cluster gene identification, the next step is to characterize the function of individual genes. For a genetically tractable fungus, gene function can be determined in part by knockout and complementation studies. Additionally, if the clusters appear to be transcriptionally inactive, background mutations can be made to alter the level of transcription. In order to study the biosynthesis of botrydial in B. cinerea, a knockout strain was engineered for the bcg1 gene responsible for downregulation of the pathway [52]. In another example, studying trichothecene production in F. graminearum, the expression of the cluster proteins was too low, so a strain was engineered to contain an overexpression cassette with the FgTri6 gene responsible for regulating the biosynthetic cluster [56].

When no genetic tools are available for the cluster’s source organism, the biosynthetic cluster must be heterologously characterized. As expected, Escherichia coli is often the first prokaryotic chassis used for characterization of biosynthetic genes. While some enzymes, such as the terpene synthases discussed previously, express well and have high activity in E. coli, many of the other pathway enzymes prove difficult to characterize in this host [62]. This is, in part, due to the prevalence of P450 monooxygenases in many biosynthetic clusters, which are associated with the cell membrane and tend to express poorly in E. coli. Recent work, however, suggests different strategies to accommodate these enzymes, though their applicability across many different P450 homologs is unknown [133]. Several genes from the trichothecene biosynthetic cluster have been successfully characterized in E. coli, despite the difficulties described previously [62].

A more commonly used chassis for the expression and characterization of late-pathway biosynthetic enzymes is S. cerevisiae. Our laboratory characterized a terpene synthase and two associated P450 monooxygenases through standard plasmid-based expression [12]. While this is feasible for a small number of genes for characterization purposes, the assembly of an entire pathway for stable high-level production, as shown by Keasling’s group in their efforts to produce artemisinin, is very laborious and requires extensive strain engineering [134]. Another consideration when producing secondary metabolites is their inherent toxicity and the absence of the machinery required to protect the host organism. For this reason and others stated previously, the development of genetically tractable and easy-to-manipulate fungal strains is the next step in studying biosynthetic pathways. Additionally, many fungal systems have developed transporters designed specifically to export or compartmentalize the toxic intermediates and products in order to maintain high levels of biosynthesis. Perhaps the best-understood system is that of the “aflatoxisomes” in Aspergillus, which are responsible for the compartmentalization of aflatoxin in vesicles before export from the cell [135, 136]. An analogous system in Fusarium graminearum involves the use of toxisomes to compartmentalize toxic compounds in trichothecene biosynthesis. Using colocalization experiments with GFP/RFP-tagged pathway enzymes, two P450s (P450-Tri1, P450-Tri4) and the HMG-CoA reductase (the rate-controlling enzyme of the mevalonate precursor pathway) were found to localize to toxisomes that interact with smaller vesicles. The smaller vesicles contain the MFS transporter Tri12 and are believed to accumulate toxic pathway products that are compartmentalized and then eliminated by fusion with the vacuole and plasma membrane. Interestingly, many biosynthetic clusters contain transporters believed to be similar in function to Tri12 (Fig. 4.5) [137, 138].

Fig. 4.5

Secondary metabolic clusters from three different fungi. Shown here are sesquiterpenoid biosynthetic clusters from a Omphalotus olearius [11], b Stereum hirsutum [107], and c Fusarium graminearum [56]. Predicted ORFs are colored according to their putative function, with gray arrows with dotted outlines representing P450 enzymes, white arrows with solid outlines representing enzymes with predicted roles in sesquiterpene scaffold modification, light gray arrows with dotted outlines representing a transporter, and white arrows with dotted outlines representing the respective sesquiterpene synthases in each cluster. Black ORFs indicate hypothetical/unknown proteins, and known transcriptional regulators are shown as gray arrows with solid outlines in the trichothecene (Tri) biosynthetic cluster [142]

With the massive amount of sequence data available, we are now limited by the speed at which we can biochemically characterize terpenoid biosynthetic genes for their function. The next step, after sufficient biochemical data has been collected, is to use this secondary metabolic enzyme toolbox to generate products not found in nature. Synthetic biology and metabolic engineering provide us with the tools to create new pathway combinations at a much faster rate than evolution. By combining interesting enzymes across many different fungi we will likely be able to generate terpenoids never seen before in nature (Fig. 4.6). For example, we may find multiple clusters in which the first committed step yields the same specific sesquiterpene. While that step shows very little variance, the next step, the modification of that terpene scaffold, presents an opportunity for metabolic engineering of nonnatural pathways. These new pathways may contain P450 enzymes from a number of different fungi known to produce the precursor compound of interest and to modify that precursor to form a final product with interesting biological activity. The resulting new compounds may have slightly different biological activities and serve as new starting points for useful pharmaceutical compounds.

Fig. 4.6

Strategy for the biosynthesis of nonnatural products through combinatorial approaches. In the example shown here, a sesquiterpene synthase converts its substrate, farnesyl diphosphate, to a sesquiterpene hydrocarbon scaffold. Such a hydrocarbon scaffold is typically activated through oxygenation catalyzed by P450 monooxygenases. In this example, P450s from three different fungal sources and sesquiterpene biosynthesis pathways are used to differentially oxygenate the scaffold to different final sesquiterpenoid products. Further extension of these pathways with additional combinations of modifying enzymes could lead to a range of novel products
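The size of the design space implied by this combinatorial strategy can be sketched in a few lines. The example below enumerates pathway designs that pair one sesquiterpene synthase with every combination of up to a chosen number of modifying enzymes. The enzyme names are hypothetical placeholders, and the enumeration says nothing about whether a given combination would be functional in a host.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Design = Tuple[str, Tuple[str, ...]]


def enumerate_designs(
    synthases: List[str], modifiers: List[str], max_modifiers: int = 2
) -> Iterator[Design]:
    """Yield (synthase, modifier combination) pairs.

    Each design uses one synthase and between 1 and `max_modifiers`
    distinct modifying enzymes; the order of modifiers is not considered.
    """
    for synthase in synthases:
        for k in range(1, max_modifiers + 1):
            for combo in combinations(modifiers, k):
                yield synthase, combo


# Hypothetical enzyme labels: one synthase, P450s from three fungal sources.
SYNTHASES = ["TPS_x"]
P450S = ["P450_fungus1", "P450_fungus2", "P450_fungus3"]

if __name__ == "__main__":
    designs = list(enumerate_designs(SYNTHASES, P450S, max_modifiers=2))
    for synthase, combo in designs:
        print(synthase, "+", " + ".join(combo))
    print(f"{len(designs)} candidate designs")
```

Even this small toolbox gives six candidate designs for a single synthase, and the number grows quickly as more modifying enzymes are characterized. Rapid pathway assembly and screening therefore become the practical bottleneck.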

Conclusion

Researchers have only begun to scratch the surface of the myriad natural products and biosynthetic pathways that can be discovered through mining the genomes of higher fungi. The influx of fungal genome sequencing data in recent years, along with the development of bioinformatics tools, from BLAST to Augustus to antiSMASH, has allowed us to probe rapidly into the biosynthetic gene clusters abundant in this class of organisms. However, additional bioinformatic and genomic approaches need to be developed or adapted for improved gene prediction and functional annotation, and especially for identifying biosynthetic clusters made up of novel types of enzymes, delineating cluster boundaries, and finding satellite and chimeric clusters.

So far, sequencing data exists for only a tiny percentage of identified fungal species. As the amount of fungal genome sequencing data increases, fungi are bound to become a much greater source of new natural products and their biosynthetic enzymes. In addition to the development of genomic data and bioinformatic resources, a critical component of natural product discovery is the detailed characterization of the product and pathway enzymes. At this point in time, the biochemical characterization of biosynthetic gene clusters and the heterologous refactoring of pathways are equally, if not more, challenging than their identification. To fully exploit the natural products potential of fungi, significant efforts are required that aim at developing synthetic biology approaches for high-throughput heterologous fungal pathway assembly, ideally facilitating direct translation of sequence information encoded in fungal genomes into biosynthetic output by a heterologous expression and production platform. The final, and arguably most interesting, step will be combinatorial biosynthesis: the creation of novel biosynthetic assemblies for the production of an even greater diversity of potentially bioactive compounds.