1 Introduction

Marine sponges are recognized as a prolific source of structurally diverse compounds with highly potent cytotoxic activity [1]. A notable example is halichondrin B from the marine sponges Halichondria okadai [2] and Lissodendoryx sp. [3, 4], which inspired the chemical synthesis of eribulin mesylate [5, 6], a chemotherapy agent for the treatment of metastatic breast cancer [7]. The unique structures and prominent biological activities of sponge-derived compounds [8] have drawn considerable attention to their biosynthesis. So far, however, the biosynthetic pathways of only a few sponge-derived metabolites have been investigated. One of the main reasons is that feeding experiments with labeled precursors, which are commonly used in biosynthetic studies of terrestrial plants and microorganisms, have been difficult to apply to marine sponges, although a few successful examples have been reported [9].

Moreover, the “symbiont hypothesis” has long proposed that many sponge-derived compounds are produced by as-yet-unidentified symbiotic microorganisms, because their chemical scaffolds resemble those of typical bacterial compounds [10, 11]. This raises important questions about the identity of the actual microbial producers and about how sponge-derived bioactive molecules are biosynthesized inside microbial cells. Because the majority of microbial symbionts cannot yet be cultivated [12, 13], identifying and growing the true producers for compound production is difficult. A potential problem is that laboratory cultivation may disrupt cell-to-cell communication between a target symbiont and the host, or among symbionts, that might be important for growth in the natural environment. Symbiotic bacteria may depend on their host or on other bacteria in natural habitats for essential nutrients and substrates that are not present in standard media [14, 15]. Even if suitable cultivation conditions are found, metabolites of interest might not be released because the required environmental signals are lacking [16].

Cultivation-independent experimental support for the microbial origin of sponge-derived natural products was initially provided by Faulkner and coworkers, who separated microbial cells from the Palauan sponge Theonella swinhoei and analyzed the cell fractions by 1H-NMR spectroscopy and transmission electron microscopy (TEM). They found the highest concentration of the cytotoxic polyketide swinholide A in the mixed unicellular bacterial fraction, but not in sponge-associated cyanobacteria, whereas the antifungal peptide theopalauamide was found in the filamentous bacterial fraction [10, 17]. Subsequent 16S rRNA gene analysis identified these as-yet-uncultivated filamentous bacteria as “Candidatus Entotheonella palauensis” [18]. However, chemical localization alone does not provide convincing evidence for a bacterial origin, because natural products may be transported between different cell types and across whole tissues [11, 16].

The sponge-associated microbiome is extremely complex and contains numerous homologous genes from diverse nontarget pathways [19,20,21], so identifying a target pathway in a marine sponge by genetic analysis is very challenging without an effective strategy. Using a metagenomics-based strategy, Piel and colleagues isolated the onnamide biosynthetic gene cluster from the Japanese sponge Theonella swinhoei [22]. The typical bacterial architecture of the isolated cluster provided the first genetic proof for the microbial origin of sponge-derived natural products [22]. Further development of metagenomic approaches has led to the isolation of the biosynthetic gene clusters (BGCs) for other complex sponge-derived compounds, such as calyculin A [23] and misakinolide A [24], both of which were attributed to as-yet-uncultured Entotheonella symbionts as the producers [23, 24]. Recent developments in metagenomics and single-cell analysis, supported by new DNA sequencing technologies and bioinformatic tools, have opened completely new opportunities for the rapid discovery of biosynthetic pathways from as-yet-uncultivated bacteria [25,26,27]. This creates exciting prospects for the sustainable production of rare sponge-derived bioactive compounds by heterologous expression in easily culturable bacteria [16, 26]. Further understanding of the underlying biochemistry and enzymatic mechanisms would enable synthetic biology applications for the rational diversification of natural products to produce structurally novel analogs with improved pharmacological profiles [28].

Among marine sponges, the family Theonellidae (order Tetractinellida, previously assigned to the order Lithistida) [29] has been well studied for its natural product chemistry. In particular, members of the two genera Theonella and Discodermia are known as an important source of unique biologically active polyketides and peptides [8]. In this review, we focus on the biosynthetic aspects of four types of Theonellidae sponge polyketides: (1) onnamides and (2) misakinolides from Theonella, and (3) calyculin and (4) discodermolide from Discodermia. The medium-sized structures of these polyketides, with their long carbon chains, suggest that their biosynthesis is catalyzed by modular megaenzymes called type I polyketide synthases (PKSs-I). Polyketides such as onnamides and calyculin harbor amino acid residues or peptide segments, suggesting that hybrid assembly lines of PKS-I and non-ribosomal peptide synthetase (NRPS) are responsible for their biosynthesis.

2 Type I Polyketide Synthase and Non-ribosomal Peptide Synthetase

Type I PKSs are multifunctional enzymes organized into modules. Each module contains the catalytic domains responsible for one round of polyketide chain elongation. A module consists of a core domain set: an acyl carrier protein (ACP) domain that anchors the building block and a ketosynthase (KS) domain that elongates the polyketide chain. An acyltransferase (AT) domain, either integrated into each module or located outside the PKS assembly line, selects the appropriate acyl-CoA building block. During biosynthesis, various functional modifications occur at the β-position of the polyketide chain through the catalytic activity of optional tailoring domains, such as ketoreductase (KR), dehydratase (DH), and enoylreductase (ER) domains. The KR domain reduces the β-ketoacyl moiety of the intermediate resulting from the KS-mediated condensation to a β-hydroxyacyl group. The DH domain then dehydrates the β-hydroxyacyl group to an α,β-enoyl group, which can subsequently be reduced by the ER domain to generate a saturated acyl group [30, 31].
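The stepwise β-carbon processing described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the output labels are hypothetical, and the sketch assumes that every annotated domain is catalytically active, which is not always the case in real assembly lines.

```python
def beta_carbon_state(domains):
    """Return the beta-carbon functionality left by a module's optional domains.

    Assumption: every listed domain is active and the domains act in the
    order KR -> DH -> ER described in the text.
    """
    present = set(domains)
    if "KR" not in present:
        return "beta-keto"          # no reduction after KS condensation
    if "DH" not in present:
        return "beta-hydroxy"       # KR only
    if "ER" not in present:
        return "alpha,beta-enoyl"   # KR + DH
    return "saturated"              # KR + DH + ER


# Hypothetical module compositions, for illustration only
for module in (
    ["KS", "ACP"],
    ["KS", "KR", "ACP"],
    ["KS", "DH", "KR", "ACP"],
    ["KS", "DH", "ER", "KR", "ACP"],
):
    print("-".join(module), "->", beta_carbon_state(module))
```

A DH or ER domain without a KR is reported as "beta-keto" here, because the sketch treats the later steps as dependent on the earlier ones.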

The linear correlation between PKS domain organization, domain function, and the resulting intermediates allows a polyketide structure to be predicted. This correlation is called the colinearity rule and usually applies to the PKS-I subclass “cis-AT PKSs,” characterized by an AT domain integrated into each module [32, 33]. In the other subclass, “trans-AT PKSs,” the modules lack an integrated AT domain. Instead, a stand-alone AT, usually located outside the assembly line, acts in trans to load acyl building blocks onto the ACP of each module [34]. This subclass has unusual enzymatic features that generate diverse complex polyketides, such as modules split between two distinct proteins, presumed inactive domains such as non-elongating KSs, multiple MTs (methyltransferases) and O-MTs (O-methyltransferases) that introduce methyl groups, a PS (pyran synthase) domain that generates a pyran ring, and a domain set that introduces β-branching on the polyketide chain [19, 35].
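The distinction between the two subclasses can be expressed as a simple check on module composition. The sketch below is a hedged illustration: the function name, the "mixed" label, and the example module lists are hypothetical, and real annotation pipelines rely on sequence-based domain detection rather than on hand-written domain lists.

```python
def classify_at_architecture(modules):
    """Classify an assembly line by where its AT domains sit.

    `modules` is a list of domain lists, one per module.
    Assumption: an "AT" entry means an AT domain integrated into that module.
    """
    has_at = ["AT" in module for module in modules]
    if all(has_at):
        return "cis-AT"    # every module carries its own AT
    if not any(has_at):
        return "trans-AT"  # ATs must be supplied in trans
    return "mixed"         # label used here only for completeness


# Hypothetical assembly lines, for illustration only
cis_line = [["KS", "AT", "KR", "ACP"], ["KS", "AT", "ACP"]]
trans_line = [["KS", "KR", "ACP"], ["KS", "MT", "ACP"], ["KS", "ACP"]]

print(classify_at_architecture(cis_line))
print(classify_at_architecture(trans_line))
```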

A non-ribosomal peptide synthetase (NRPS) is composed of a series of modules. Each NRPS module minimally contains three domains: an adenylation (A) domain for amino acid activation and incorporation, a thiolation (T) domain for thioesterification of the amino acid residue, and a condensation (C) domain for peptide chain elongation via transpeptidation. During peptide biosynthesis, each module acts through these catalytic domains to incorporate an amino acid unit into the growing peptide chain and to modify it. Because chain elongation and post-assembly modification draw on the 20 proteinogenic amino acids as well as a larger variety of nonproteinogenic amino acids and other acids as building blocks, a remarkable structural and functional diversity of peptides is generated [36]. The biosynthetic mechanism of an NRPS assembly line is similar to that of a PKS-I assembly line; therefore, PKS assembly lines can incorporate NRPS modules, setting the stage for the highly modified frameworks of these medium-sized compounds [37]. PKS-NRPS hybrid modules can generate nonproteinogenic amino acids, which are a feature of mixed polyketide-peptide structures of sponge origin. Some cyclic peptides isolated from Theonella (e.g., orbiculamide A) contain the unique amino acids theoalanine and theoleucine, which represent moieties formed by PKS-NRPS hybrid assembly lines [27, 38]. Many type I PKS-NRPS hybrid genes have recently been cloned from free-living bacteria such as actinomycetes, but only very few biosynthetic gene clusters of this type have been identified in sponge-microbe associations.
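The minimal C-A-T domain requirement of an elongating NRPS module can be captured as a set comparison. The following sketch is illustrative; the function name and the example modules are hypothetical. Initiation modules typically lack a C domain, so a reported missing "C" is not necessarily an annotation error.

```python
MINIMAL_NRPS_DOMAINS = {"C", "A", "T"}  # condensation, adenylation, thiolation


def missing_core_domains(module):
    """Return the core NRPS domains absent from a module, sorted by name."""
    return sorted(MINIMAL_NRPS_DOMAINS - set(module))


# Hypothetical modules, for illustration only
print(missing_core_domains(["C", "A", "MT", "T"]))  # complete, with an extra MT
print(missing_core_domains(["A", "T"]))             # initiation-like module
```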

3 Onnamide-Type Compounds

Onnamide A (1) was initially reported from an Okinawan marine sponge, Theonella sp. [39]. It exhibited potent antiviral activity against herpes simplex virus type-1, vesicular stomatitis virus, and coronavirus A-59 [39]. Structurally close analogs include dihydroonnamide A, onnamides B-E [40], and theopederins A-E [41], isolated from T. swinhoei chemotype yellow collected at Hachijo Island [42]. Theopederins A and B exhibited promising antitumor activity against P388 cells [43]. The onnamide and theopederin series structurally resemble the antitumor compounds pederin (2) from the rove beetles Paederus and Paederidus [44], mycalamide A (3) from the New Zealand sponge Mycale hentscheli [45], and psymberin (4) from the sponge Psammocinia aff. bulbosa [46]. Psymberin is also known as irciniastatin A, isolated from the Indo-Pacific sponge Ircinia ramosa [47]. The occurrence of these structurally related compounds in different macroorganisms suggested a microbial origin of their biosynthesis [19, 42].

A large part of the pederin biosynthetic gene cluster was successfully cloned by Piel from the beetle Paederus fuscipes [48]. An additional upstream region of the cluster was subsequently isolated [49]. The isolated pederin biosynthetic genes exhibited typical bacterial features and were found to be located in a symbiotic bacterium closely related to Pseudomonas aeruginosa [50]. The predicted biosynthetic enzymes of pederin resemble hybrid trans-AT PKS-NRPS systems, consisting of three large proteins (PedI, PedF, and PedH) [48, 49] (Fig. 1a). A typical trans-AT PKS feature of the assembly line is the lack of an integrated AT in each module. Instead, stand-alone ATs were identified as separate genes. Other unusual characteristics include a non-elongating KS (module 4) and MTs (modules 1 and 6) that introduce methyl groups [22, 48]. The module architecture and domain organization of PedI and PedF perfectly matched the pederin structure. The additional protein PedH, which corresponds to a variable polyketide side chain and a terminal arginine residue, matched the extended congeners of the onnamide series [48]. It was proposed that an oxygenase gene (pedG) inserted into the cluster is responsible for terminating further chain elongation through oxidative cleavage, leading to the release of pederin instead of an onnamide [48]. This pioneering work paved the way for identifying the biosynthetic pathways of onnamide-type compounds in the more complex microbial communities of marine sponges known to contain the onnamide and theopederin series [22, 51].

Fig. 1

Comparison of the BGC architecture and assembly line organization among the biosynthetic pathways of four onnamide-type compounds. GNAT GCN5-related N-acetyltransferase domain, KS ketosynthase domain, KR ketoreductase domain, MT methyltransferase domain, CR crotonase domain, OXY oxygenase, PS pyran synthase, A adenylation, C condensation, TE thioesterase. The small colored circle (●) denotes an ACP (acyl carrier protein) domain or a T (thiolation) domain

By developing metagenomic strategies that involved the construction of highly complex 3D-metagenomic libraries and subsequent rapid library screening [52, 53], Piel and colleagues isolated the main part of the onnamide biosynthetic genes from T. swinhoei chemotype yellow [22, 54], thereby providing the first genetic evidence for the bacterial origin of marine sponge-derived natural products. Further metagenomic sequencing of the enriched fraction of filamentous bacteria associated with this sponge, combined with single-cell analysis, revealed that the onnamide biosynthetic genes are located on a plasmid in “Candidatus Entotheonella factor,” which belongs to a newly proposed phylum, “Tectomicrobia” [27]. This approach enabled the identification of the missing downstream part of the onnamide genes, along with other gene clusters coding for the biosynthesis of almost all peptide natural products isolated from the sponge, all of which were attributed to this uncultivated symbiont [27]. The Piel laboratory developed a targeting strategy that correlates the substrate specificity of trans-AT KS domain sequences with certain structural moieties [54], enabling the isolation of the psymberin biosynthetic gene cluster from the metagenome of Psammocinia aff. bulbosa [51]. The isolated psymberin biosynthetic genes showed typical bacterial features, but the taxonomic identity of the bacterial producer remained unknown [51]. More recently, the biosynthetic pathway of mycalamide A was described in the microbiome of M. hentscheli and was specifically attributed to an uncultivated gammaproteobacterial symbiont, “Candidatus Entomycale ignis,” as the producer [55].

The BGC architecture and the encoded biosynthetic enzymes of pederin, onnamide A, mycalamide A, and psymberin share remarkable similarity (Fig. 1). The core structure bearing a tetrahydropyran ring is shared among the four compounds, and its formation is hypothetically catalyzed by the PedF/OnnI/McyF proteins and the corresponding upstream part of the PsyD protein. In vitro functional analysis provided convincing evidence that pyran ring formation is catalyzed by pyran synthase (PS), a unique domain resembling a DH domain with a deletion in the HxxxGxxxxP active-site motif [56]. Interestingly, cyclization to form the pyran ring proceeds stereoselectively at the ring-closing carbon, irrespective of the configuration at the carbon bearing the hydroxy group [56] (Fig. 2).
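The HxxxGxxxxP motif mentioned above lends itself to a simple pattern search. The sketch below is a toy illustration: the function name and both peptide fragments are invented for this example and are not taken from any real DH or PS domain. In practice, domain annotation relies on sequence alignments or profile-based searches rather than on a single regular expression.

```python
import re

# H, any 3 residues, G, any 4 residues, P (the HxxxGxxxxP motif from the text)
DH_MOTIF = re.compile(r"H.{3}G.{4}P")


def has_canonical_dh_motif(protein_seq):
    """Return True if the sequence contains an intact HxxxGxxxxP motif."""
    return DH_MOTIF.search(protein_seq.upper()) is not None


# Invented fragments, for illustration only
dh_like = "AAHPLLGVAILPEE"  # motif intact
ps_like = "AAHPLLVAILPEE"   # glycine deleted, motif disrupted

print(has_canonical_dh_motif(dh_like))
print(has_canonical_dh_motif(ps_like))
```

A negative result only flags a candidate for closer inspection; it does not by itself establish pyran synthase activity.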

Fig. 2

The biosynthetic steps in the formation of the core structure bearing a tetrahydropyran ring (indicated by blue color) shared among four onnamide-type compounds

The PedI/OnnB/MyC proteins, which contain a GCN5-related N-acetyltransferase (GNAT) domain, are predicted to incorporate acetate as the starter unit for chain initiation. In addition, these proteins contain two crotonase (CR) superfamily domains that might be involved in forming the exomethylene group of the “Western region” [35, 49]. The hydroxymethylglutaryl-CoA synthase (HMGS)-like protein encoded by the onnA/pedP genes was also proposed to be involved in exomethylene group formation [19, 49]. The corresponding PsyA in the psymberin assembly line lacks four domains (KR-MT-ACP-KS), which correlates with a missing hydroxylated and methylated building block in the psymberin Western region (Fig. 3). It was proposed that acquisition of a PKS module by PsyA generated the extended versions PedI/OnnB/MyC in the pederin/onnamide/mycalamide assembly lines [51]. The downstream part of PsyD harboring the N-terminal five modules was predicted to contribute to the formation of a unique dihydroisocoumarin moiety as the “Eastern region.” In the onnamide “Eastern” part, by comparison, the core structure is extended with three shifted double bonds and an arginine moiety, which are predicted to be formed by the catalytic domains of the OnnJ protein. Exchange of downstream modules was proposed as the factor that might convert the isocoumarin-like moiety into the pederin/onnamide/mycalamide termini [35, 51].
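The kind of comparison that reveals a missing domain block, such as the KR-MT-ACP-KS set absent from PsyA, can be sketched as a prefix and suffix match between two domain lists. The function name and both domain orders below are hypothetical placeholders chosen for illustration; they are not the published architectures of PsyA or OnnB, and the sketch assumes the two lists differ by exactly one contiguous block.

```python
def inserted_block(short, long):
    """Return the contiguous block present in `long` but absent from `short`.

    Assumption: the lists are identical apart from one contiguous insertion.
    """
    if len(long) < len(short):
        raise ValueError("`long` must be at least as long as `short`")
    # Length of the shared prefix
    i = 0
    while i < len(short) and short[i] == long[i]:
        i += 1
    # Length of the shared suffix, not overlapping the prefix
    j = 0
    while j < len(short) - i and short[-1 - j] == long[-1 - j]:
        j += 1
    if i + j != len(short):
        raise ValueError("lists differ by more than one contiguous block")
    return long[i:len(long) - j]


# Hypothetical domain orders, for illustration only
shorter = ["GNAT", "ACP", "KS", "CR", "CR", "ACP"]
longer = ["GNAT", "ACP", "KS", "KR", "MT", "ACP", "KS", "CR", "CR", "ACP"]

print(inserted_block(shorter, longer))
```

Because domain names repeat along an assembly line, the position of the reported block can be ambiguous by a rotation; the prefix-first matching used here simply picks the leftmost placement.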

Fig. 3

Comparison between onnamide and psymberin assembly lines in the formation of the “Western” exomethylene group and the “Eastern” isocoumarin-like moiety

4 Misakinolide-Type Compounds

Misakinolide A (5) is a symmetric 40-membered dimeric polyketide found in the marine sponge T. swinhoei chemotype white [57, 58]. It is structurally related to the 44-membered dilactone macrolide swinholide A (6) from T. swinhoei [59,60,61]. Swinholide A and two of its glycosylated variants were also reported from free-living cyanobacteria [62]. Swinholide A differs from misakinolide A only by the presence of one additional double bond in each monomer. Although both compounds possess antifungal and cytotoxic activities through binding to the actin cytoskeleton [58, 59, 63], they exhibit different mechanisms of action: swinholide A severs actin filaments, while misakinolide A caps the growing barbed end of filaments [63, 64]. Another close analog is the asymmetric 42-membered dilactone macrolide hurghadolide A, isolated from Red Sea T. swinhoei [65], which was 10 times more potent than swinholide A at disrupting microfilaments [65]. Interestingly, these three misakinolide-type compounds share common scaffolds with cytotoxic compounds isolated from other organisms, such as tolytoxin (7) and scytophycins from the terrestrial blue-green algae Scytonema [66, 67], lobophorolide from the seaweed Lobophora variegata [68], and luminaolide (8) from the crustose coralline alga Hydrolithon reinboldii [69]. The occurrence of these close structural analogs in other organisms suggests that misakinolide-type compounds are produced by sponge-associated bacterial symbionts [24, 26].

Using a metagenomics-based approach, a misakinolide biosynthetic gene cluster (~90 kb) was successfully isolated from the metagenome of the Japanese T. swinhoei containing misakinolide A [24]. The isolated cluster exhibited typical bacterial characteristics: its genes are tightly packed, free of introns and polyadenylation sites, and preceded by Shine-Dalgarno motifs, suggesting a bacterial origin of misakinolide-type compounds [24]. To identify the bacterial symbiont responsible for misakinolide biosynthesis, the complex microbial cells associated with the sponge sample were initially separated and sorted into several fractions by differential centrifugation [17]. Subsequent pyrosequencing of 16S rDNA amplicons suggested the accumulation of the filamentous bacterium “Candidatus Entotheonella” [24]. Single-cell analysis combining fluorescence-activated cell sorting (FACS), multiple displacement amplification (MDA), and diagnostic PCR techniques [70,71,72] proved to be an efficient approach for connecting the misakinolide biosynthetic genes to the Entotheonella 16S rRNA gene in the amplified DNA of some single filaments [24].
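One of the bacterial hallmarks listed above, a Shine-Dalgarno motif upstream of the start codon, can be screened for with a short string search. The sketch below is a simplified illustration: the motif variants derived from the AGGAGG consensus, the spacing window of 4 to 12 nucleotides, the function name, and the example sequences are all assumptions made for this example, not parameters from the cited work.

```python
# Variants derived from the Shine-Dalgarno consensus AGGAGG (assumed list)
SD_VARIANTS = ("AGGAGG", "GGAGG", "AGGAG", "GGAG", "AGGA")


def has_sd_motif(upstream, min_gap=4, max_gap=12):
    """Check for an SD-like motif at a plausible distance from the start codon.

    `upstream` is the DNA immediately 5' of the start codon, written 5'->3'.
    The gap is the number of nucleotides between motif end and start codon.
    """
    upstream = upstream.upper()
    for motif in SD_VARIANTS:
        start = upstream.find(motif)
        while start != -1:
            gap = len(upstream) - (start + len(motif))
            if min_gap <= gap <= max_gap:
                return True
            start = upstream.find(motif, start + 1)
    return False


# Invented upstream regions, for illustration only
print(has_sd_motif("ATTCAGGAGGTAACAAT"))  # motif 7 nt from the start codon
print(has_sd_motif("ATTCATTATCTAACAAT"))  # no SD-like motif
```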

Bioinformatic analysis of the misakinolide gene cluster showed that it encodes four large polypeptides (MisC, MisD, MisE, and MisF) along with a stand-alone AT enzyme (MisG) (Fig. 4). As shown in Fig. 4a, the module architecture and domain organization show typical trans-AT PKS features, such as the absence of integrated AT domains, modules split between two distinct proteins (modules 5, 10, and 14), non-elongating KSs (modules 3, 11, 14, and 19), multiple MTs and O-MTs that introduce methyl groups (modules 3, 5, 6, 7, 9, 11, and 17), and a PS domain generating a pyran ring (module 15). The intermediate specificity and stereochemical configuration of the assembly line, as predicted by KS and KR sequence analyses, indicated that the identified pathway matches the structure and stereochemistry of misakinolide A [24, 26]. Interestingly, the MisF protein of the misakinolide assembly line contains a penultimate module (module 18) that matches the extended congeners swinholide A and hurghadolide A. Since misakinolide A, rather than swinholide A or hurghadolide A, was detected in the sponge extract, it was proposed that module skipping may occur during biosynthesis, probably because module 18 of MisF is nonfunctional [24], leading to the direct transfer of the misakinolide monomer from ACP17 to ACP18 without elongation. Subsequent TE-catalyzed ligation of two misakinolide monomers would result in macrocyclization to produce the dimeric polyketide misakinolide A [26].
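The effect of the proposed module skipping can be illustrated by counting the modules that actually extend the chain. This sketch uses the non-elongating module numbers given above and assumes 19 modules in total, inferred from module 18 being described as penultimate; it counts elongation steps only and makes no claim about the carbon count or structure of the resulting monomer.

```python
NON_ELONGATING = {3, 11, 14, 19}  # modules with non-elongating KSs (from the text)
TOTAL_MODULES = 19                # assumption: module 18 is penultimate


def elongation_steps(n_modules, non_elongating, skipped=()):
    """Count modules that perform a chain extension."""
    inactive = set(non_elongating) | set(skipped)
    return sum(1 for m in range(1, n_modules + 1) if m not in inactive)


# All remaining modules active versus module 18 skipped
print(elongation_steps(TOTAL_MODULES, NON_ELONGATING))
print(elongation_steps(TOTAL_MODULES, NON_ELONGATING, skipped={18}))
```

Under these assumptions, skipping module 18 removes exactly one extension step per monomer, which is consistent with the relationship between misakinolide A and swinholide A described in the text.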

Fig. 4

Comparison of the module architecture and domain organization between the assembly lines of misakinolide A, swinholide A, and the two closely related structures tolytoxin and luminaolide. Biosynthetic gene clusters of swinholide (swi), misakinolide (mis), tolytoxin (tto), and luminaolide (lum) are shown to scale. Note: AT acyltransferase, DH dehydratase, ER enoyl reductase, KR ketoreductase, KS ketosynthase, KSo non-elongating KS, MT methyltransferase, OMT O-methyltransferase, PS pyran synthase, TE thioesterase, and (●) acyl carrier protein

More recently, a swinholide biosynthetic gene cluster was reported from the terrestrial cyanobacterium Nostoc sp. UHCC 0450 [73]. The module architecture and domain organization of the biosynthetic enzymes encoded by this swinholide gene cluster are very similar to those encoded by the misakinolide gene cluster [24]. The only small difference is the presence of a single ACP in module 3 of the MisC protein, whereas the corresponding module of the SwiC protein has two ACPs [24, 73] (Fig. 4a, b). Interestingly, the biosynthetic enzymes responsible for forming the cyclic core skeleton of misakinolide A (5) and swinholide A (6) share a similar domain organization with those of other structurally related macrolides, such as tolytoxin (7) and luminaolide B (8), suggesting a common ancestor of their biosynthetic genes [24].

Based on the bioinformatic analyses, it was proposed that formation of the cyclic core skeleton starts from an intermediate bearing an α-methyl moiety, which undergoes a series of chain elongations and modifications catalyzed by MisD/SwiD to MisF/SwiF in the pathways of misakinolide-type compounds (Fig. 5). The pyran rings that form part of the cyclic core skeleton are generated by the first two modules of MisF/SwiF/TtoF/LumE, which contain DH and PS domains [24]. This was supported by an in vitro enzymatic assay demonstrating that this PS generates a dihydropyran ring [24, 56]. It was therefore proposed that pyran ring formation starts with the generation of an α,β-E-olefinic moiety (C6–C7) through DH-catalyzed dehydration of the C7 β-OH group. Subsequent conjugate addition of the C11 hydroxy group to the olefinic moiety, catalyzed by the PS domain, would give an intermediate bearing the C7–C11 pyran ring [26] (Fig. 5).

Fig. 5

Biosynthetic origin of the cyclic core skeleton of misakinolide-type compounds, which is shared with that of other structurally related macrolides tolytoxin and luminaolide B

Five domains in the middle of MisF/SwiF/TtoF are missing in the corresponding protein (LumE) of the luminaolide assembly line [24] (Fig. 6). These domains correspond to the double bonds and methyl moieties next to an ester carbonyl group of the misakinolide/swinholide cyclic core skeleton (Fig. 6a). The lack of these domains in LumE corresponds to the presence of two hydroxy groups near the ester carbonyl of the luminaolide pathway. One of the OH groups would subsequently undergo O-methylation to form the methoxy group that might occur after polyketide assembly (Fig. 6b). The presence of this OH group instead of double bond(s) in the cyclic core skeleton may explain the inability of luminaolide B to bind to the actin-binding site surface [74, 75].

Fig. 6

Comparative biosynthetic origin of the core structure part and the “tail” portion between misakinolide A and luminaolide B

The domain organization of the misakinolide assembly line that corresponds to part of the “tail” portion appeared different from that of the tolytoxin and luminaolide B assembly lines. The “tail” portion of each misakinolide monomer harbors a pyran ring. It was hypothesized that the formation of this pyran ring is catalyzed by a DH domain in MisC/SwiC, following a series of chain elongations and modifications that start from acetate as the starter unit [24] (Fig. 6a). Although no PS is present in MisC/SwiC, the absence of glycine in the HxxxGxxxxP motif of the DH domains from MisC/SwiC may indicate an important role of this domain in pyran ring formation [73]. The “tail” portion of the luminaolide monomer was proposed to be biosynthetically derived from an N-methyl-N-vinylformamide group through three steps of chain extension, keto-reduction into a double bond, C-methylation, and O-methylation [24] (Fig. 6b). This suggests that the biosynthetic pathways of misakinolide, tolytoxin, and luminaolide emerged from a common ancestor and were diversified by the deletion or acquisition of some domains corresponding to the “tail” portion [24].

5 Calyculin A

The Japanese marine sponge Discodermia calyx was found to contain the highly cytotoxic compound calyculin A (9) [76]. Calyculin A shows potent and specific inhibition of protein phosphatases 1 and 2A, as observed for microcystin and okadaic acid [77,78,79]. Structural analogs were reported from other marine sponges, such as calyculinamide A (10) in the New Zealand deep-water sponge Lamellomorpha strongylata [80], geometricin A (11) in the Australian sponge Luffariella geometrica [81], swinhoeiamide A (12) in the Papua New Guinean sponge T. swinhoei [82], and clavosine A (13) in the Palauan sponge Myriastra clavosa [83] (Fig. 7), suggesting a microbial origin for the biosynthesis of these calyculin-type compounds. Calyculin-type compounds are mainly composed of polyketide and peptide segments connected by an oxazole ring and are characterized by various functional groups. The peptide part contains two γ-amino acids, while the polyketide part harbors a 5,6-spiroacetal, a phosphate, a tetraene, and a terminal nitrile, suggesting that a hybrid PKS and NRPS assembly line is responsible for calyculin biosynthesis [84]. The western terminus is a nitrile group whose biosynthetic origin is also of interest. The odd carbon chain number of this compound is apparently at odds with the two-carbon extension cycle of a polyketide assembly line, suggesting that a modification step is required to generate this 25-carbon polyketide portion. It was proposed that the tetraene moiety is photoisomerized to afford the geometrical isomers calyculins B, E, and F [84, 85]. The presence of two unusual β-branched methyl groups in this moiety suggests the involvement of a trans-AT PKS system in calyculin biosynthesis [26].

Fig. 7

Calyculin and its structurally related analogs. Two β-branched methyl groups are indicated by red color. Calyculin A differs from close analogs for the presence of additional moieties as marked with dashed circle lines

To isolate the calyculin biosynthetic genes, Wakimoto and colleagues initially identified several KS sequences belonging to trans-AT PKSs in the sponge metagenome; these subsequently served as the basis for designing probes to screen a metagenomic library of 250,000 fosmid clones. Screening of the fosmid library yielded 12 clones harboring the calyculin biosynthetic gene cluster, which spans more than 150 kb. Single-cell analysis employing laser microdissection revealed that the gene cluster originated from the sponge-associated symbiont “Candidatus Entotheonella sp.” [23]. The isolated gene cluster encodes multifunctional PKS-NRPS enzymes, designated CalA to CalI (Fig. 8a). The region upstream of the gene cluster encodes an AT (CalY) that might introduce malonate for chain initiation and elongation. Three NRPS modules are present at the beginning of the assembly line. The first two NRPS modules are proposed to be responsible for two rounds of γ-amino acid formation, generating a trimethylseryl unit and a glycine residue corresponding to the distal end of calyculin A. The hydroxylated trimethylseryl unit is biosynthetically derived from serine, which is activated on an NRPS module by an A domain and subsequently methylated on both the hydroxyl and amino groups [23, 26]. The third NRPS module may then play an important role in oxazole ring formation through the recruitment of a serine residue as the precursor [23, 84]. Oxazole ring formation is initiated by serine activation and incorporation catalyzed by the A domain. Subsequent cyclodehydration generates an oxazoline ring, which is further oxidized to an oxazole ring through a mechanism similar to that recently reported for the C2-symmetrical macrodiolide conglobatin [86] (Fig. 8b).

Fig. 8

The domain organization of the calyculin A assembly line (a) and the biosynthetic origin of important moieties in calyculin A (b). Note: HMGS 3-hydroxy-3-methylglutaryl synthase, ECH enoyl-CoA hydratase, HC heterocyclization domain, Ox oxidation domain, A adenylation, C condensation, TE thioesterase

As expected from the odd carbon chain number, a flavin-dependent oxygenase (CalD) is encoded downstream of the two PKS modules that follow the oxazole ring formation step. The exact enzymatic function of CalD is still unknown, but a CalD-catalyzed reaction could shorten the carbon chain, in accordance with the contiguous carbon chain number (C25) of the polyketide portion. This enzyme shows homology to the flavin-dependent oxygenases encoded by pedG and oocK in the pederin [48] and oocydin [87] biosynthetic gene clusters, respectively, suggesting that they share a similar function. Since PedG is encoded only in the pederin pathway and not in the onnamide assembly line [22], it was suggested that PedG is responsible for terminating the carbon chain of pederin [48]. Piel and coworkers conducted a functional analysis of these oxygenases. Functional recombinant PedG was difficult to obtain, but OocK was obtained as a soluble protein and characterized as a Baeyer-Villiger monooxygenase that inserts oxygen into the growing polyketide backbone [88]. Since both oocydin and calyculin A have an odd carbon chain number, CalD is proposed to have a function similar to that of OocK and might catalyze Baeyer-Villiger oxidation of an α-keto thioester substrate, followed by one-carbon elongation accompanying decarboxylation (Fig. 8). However, the details of the mechanism await experimental validation.
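The parity argument behind the proposed chain shortening can be written out as simple arithmetic. The sketch below is purely illustrative: the function names are hypothetical, and the choice of 13 two-carbon units is just one combination that reproduces the C25 count mentioned above under the assumption of a one-carbon excision; it is not a statement about the actual number of extension modules in the calyculin assembly line.

```python
def c2_units_and_remainder(n_carbons):
    """Split a carbon count into whole two-carbon units and a remainder."""
    return divmod(n_carbons, 2)


def chain_after_excision(c2_units, carbons_removed=1):
    """Carbon count of a chain built from C2 units after removing carbons."""
    return 2 * c2_units - carbons_removed


# A 25-carbon chain cannot be assembled from two-carbon units alone
print(c2_units_and_remainder(25))  # remainder of 1 marks the odd count

# Illustrative assumption: 13 C2 units followed by a one-carbon excision
print(chain_after_excision(13))
```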

After the chain shortening process proposed above, the 5,6-spiroacetal ring is formed. This raises the question of whether ring formation occurs through cyclization of epoxide/ketone intermediates, as described for monensin A [89], through a hydroxy ketone, as reported for reveromycin A [90], or by a nonenzymatic route. The involvement of enzymes in the formation of spiroacetal moieties in biologically active natural products is best exemplified by RevJ, a spiroacetal synthase responsible for 6,6-spiroacetal ring formation in reveromycin biosynthesis [90]. However, no homolog of RevJ is encoded in the calyculin biosynthetic gene cluster. Considering that the 5,6-spiroacetal ring of calyculin A is a thermodynamically stable stereoisomer, it could be constructed nonenzymatically from a keto-diol substrate. The next stage of the calyculin assembly line is the formation of the tetraene moiety containing two β-branched methyl groups at C3 and C7. The domain set required for β-branching is encoded in the corresponding region and consists of a hydroxymethylglutaryl synthase (HMGS)-like enzyme and two enoyl-CoA hydratases (ECHs) [91]. Surprisingly, several modules are encoded downstream of the domains corresponding to the tetraene part. The function of these extra modules in nitrile formation is still unknown.

Further functional studies are required not only to verify whether the isolated gene cluster is specific to the calyculin biosynthetic pathway but also to understand how calyculin is made inside the producer cell. Although reconstitution of the entire pathway would be the best approach, it is still highly challenging to transfer and activate a large gene cluster of more than 150 kb in microbial hosts with different genetic backgrounds. Therefore, the function of key modification enzymes encoded in the upstream region of the cluster, such as the phosphotransferase CalQ, was investigated. Recombinant CalQ was initially produced by heterologous expression in E. coli. Functional analysis in vitro with several calyculin derivatives as putative substrates revealed that calyculin A was phosphorylated, generating a new derivative, phosphocalyculin A, whose cytotoxicity was significantly lower than that of calyculin A [23]. This enzymatic phosphorylation seems to be essential for the self-resistance of the sponge, since the accumulated potent phosphatase inhibitor is potentially harmful to the host sponge itself. It is therefore reasonable that the less cytotoxic protoxin is released as the end product of the calyculin biosynthetic pathway. Interestingly, a phosphatase that converts the protoxin to a much more toxic form was observed in the crude enzyme fraction prepared from the sponge. The phosphatase activity is turned on in response to disruption of the sponge tissue [23, 84]. The detailed mechanism of this wound-activated bioconversion process is currently under investigation.

6 Discodermolide

Discodermolide (14) is a polyketide with a 24-member carbon skeleton, initially isolated in 1990 from the deep-water Caribbean sponge Discodermia dissoluta [92]. The gross structure of discodermolide was elucidated by extensive NMR studies, and the relative stereochemistry was defined by single-crystal X-ray crystallography [93]. The first total synthesis of this polyhydroxylated polyketide, along with its absolute stereochemistry, was reported in 1993 by Schreiber and coworkers [94, 95]. Although (+)-discodermolide was initially found to exhibit immunosuppressive properties both in vitro [96] and in vivo [96], further bioactivity studies revealed potent antiproliferative/antimitotic properties [97,98,99] that target the microtubule system through a mechanism of action similar to that of the clinically important anticancer drug taxol [98, 99]. Discodermolide is a prominent member of the microtubule-stabilizing natural products and mitotic spindle poisons, in the same group as taxol (15), epothilones A (16) and B (17) [100], and eleutherobin (18) [101] (Fig. 9). Compared with taxol, (+)-discodermolide was more potent but less cytotoxic [102] and was able to inhibit taxol-resistant cell lines. More recently, it was found that (+)-discodermolide combined with taxol at low concentrations exhibited a 20-fold synergistic toxicity towards human carcinoma cell lines [103, 104], a feature that was not found with either the epothilones or eleutherobin [105].

Fig. 9 Prominent members of microtubule-stabilizing natural products

The supply problem for discodermolide is chronic because of the extremely low quantity of this compound in wild sponges. Material for preclinical and clinical trials has been obtained initially by harvesting sponges from natural sources and subsequently by chemical synthesis of up to 60 g [93, 105], but the cost of drug production is very high. Cultivation of the producer microorganisms or heterologous expression of the biosynthetic genes is considered a promising approach to provide discodermolide sustainably for drug production. In attempts to find discodermolide biosynthetic genes, researchers at Kosan Biosciences conducted metagenomic analysis of the microbiome of D. dissoluta collected from the coast of Curacao, the Netherlands Antilles [106]. The symbiotic bacterial cells were initially separated from sponge cells as previously described by Bewley et al. (1996) [17]. The separation yielded (1) a fraction containing highly enriched filamentous bacteria, (2) a fraction containing a small number of unicellular bacteria and sponge cells, and (3) a supernatant containing a mixture of sponge cells and filamentous and unicellular bacteria [106]. Highly abundant PKS and NRPS genes were found in the enriched filamentous bacterial fraction containing Entotheonella. Although discodermolide biosynthetic gene clusters have not yet been identified, this comprehensive study revealed the presence of highly diverse symbionts in the sponge Discodermia [106], as has been described for Theonella [27]. TEM observation and fluorescence in situ hybridization (FISH) analysis using targeting probes clearly indicated highly abundant filamentous Entotheonella cells [106, 107]. The symbiont “Entotheonella sp.” was also observed in D. dissoluta from the Caribbean Sea using FISH [107].

Discodermolide has 13 stereogenic centers, a tetrasubstituted δ-lactone, one di- and one trisubstituted (Z)-alkene, a carbamate moiety, and a terminal diene [105]. The structure of discodermolide suggests that its biosynthesis is catalyzed by a bacterial type I modular polyketide synthase (PKS) [31]. The biosynthetic origin of some structural moieties can be predicted, as described in Fig. 10, based on previous biosynthetic studies reported for polyketides. For example, a carbamate/carbamoyl group is present in some natural products whose biosynthetic pathways have been identified, exemplified by corallopyronin A [108] and kalimantacin/batumin [109]. It is proposed that the carbamoyl moiety in the discodermolide structure is derived from bicarbonate [108] and is transferred onto the polyketide chain, probably by a carbamoyltransferase [109] (Fig. 10). The tetrasubstituted δ-lactone in discodermolide likely results from polyketide chain termination/release catalyzed by a TE domain [31] (Fig. 10).

Fig. 10 Predictive model for the biosynthetic origin of discodermolide

7 Conclusion

Theonellidae is considered a gifted sponge family because of its highly diverse bioactive secondary metabolites, including polyketides and peptides. Because the compound scaffolds from this sponge family are structurally similar to typical bacterial compounds, it has long been proposed that sponge-derived natural products are actually produced by microorganisms that live symbiotically or are obtained from seawater through filter-feeding. This bacterial origin theory has been well discussed for this sponge family, particularly the genus Theonella. Experimental support for the theory has been pursued through various cultivation-independent approaches that involve (1) bacterial cell separation from the sponge hosts, (2) microscopy observation, (3) chemical localization, (4) metagenome mining, (5) single-cell analysis, and (6) metagenomic sequencing. Mechanical cell separation and subsequent microscopy observation show a high population of filamentous bacterial cells identified as “Candidatus Entotheonella,” which belongs to a newly proposed phylum, “Tectomicrobia.” Metagenome mining has proven to be an effective strategy for isolating biosynthetic gene clusters of interest from complex sponge-bacteria associations. With recent next-generation sequencing technologies, metagenomic sequencing of the filamentous cells has led to the rapid identification of many biosynthetic gene clusters corresponding to polyketides and peptides known from the sponge. Single-cell analysis has made it possible to connect the isolated/identified gene clusters to Entotheonella. Of the two Entotheonella variants identified in the yellow chemotype of the sponge T. swinhoei, “Ca. Entotheonella factor” was identified as the producer of almost all secondary metabolites isolated from the sponge. The white chemotype of T. swinhoei harbors another variant called “Ca. Entotheonella serta,” which was shown to be the producer of misakinolide and theonellamide. Calyculins were also shown to be produced by “Ca. Entotheonella sp.” Interestingly, the presence of Entotheonella in the filamentous cell fraction associated with the genus Discodermia raises the question of whether this as-yet uncultivated symbiont is responsible for the production of Discodermia-derived compounds such as discodermolide. Compared with other sponge families, Theonellidae sponges have been investigated more intensively, not only for the diverse structures of their polyketides and peptides, but also for the biosynthetic pathways and the actual producers. For other sponges known to produce bioactive secondary metabolites, little progress has been made in identifying the biosynthetic pathways or the bacteria responsible for production. Since many important bioactive substances have been reported from non-Theonellidae sponges, it is expected that similar cultivation-independent studies will be carried out for other sponges as well.