Introduction

Actinobacteria are Gram-positive bacteria with a high G+C content, usually above 60 %, and are well known for producing antibiotics and other secondary metabolites. A large body of evidence suggests that actinobacteria account for almost 75 % of all known products and that each strain has the genetic potential to produce more than 20 secondary metabolites. Within actinobacteria, members of the genus Streptomyces contribute almost 70–80 % of secondary metabolites, while a small percentage is contributed by Amycolatopsis, Actinoplanes, Micromonospora and Saccharopolyspora [1]. Among these genera, Amycolatopsis is of special importance for its capacity to produce several commercially and medicinally important antibiotics such as balhimycin, vancomycin and rifamycin [2–4], other secondary metabolites such as immunosuppressants and anti-cancer agents [5], and for other applications [6–8]. These secondary metabolites often have diverse, unusual and complex structures and are not essential for the growth of the producing organisms. Interestingly, secondary metabolites are produced when the producer strain enters a dormant or reproductive stage [9]. Although the reason is not known, the presence of a large number of secondary metabolite gene clusters has been proposed to give these organisms a selective advantage in combating stress [10].

With the revolution in sequencing technology, 56,168 bacterial genomes have been sequenced, of which 6997 belong to the class Actinobacteria. Of these, only 486 actinobacterial genomes have been completely sequenced and annotated to date, the majority representing organisms that are a source of commercially important drugs or are contagious and infectious to humans and animals (http://wwws.ncbi.nlm.nih.gov/genomes/lproks.cgi). While the genomes of Streptomyces spp. have been extensively studied, Amycolatopsis genomes have been sequenced only recently [11–13]. Although information on the secondary metabolome based on genomic analysis of species belonging to the genera Streptomyces, Saccharopolyspora, Salinispora, Frankia and Rhodococcus has already been reviewed [14], the recently sequenced genomes of Amycolatopsis have not yet been analysed in this way. Here we analyze the genomes of members of the genus Amycolatopsis and their implications for the production of secondary metabolites.

General Features of the Genus Amycolatopsis

Many bacteria of the genus Amycolatopsis were initially classified as Streptomyces and then moved to Nocardia; finally a new genus, Amycolatopsis, was created to include those species that lack mycolic acid in their cell wall [15]. The number of validly published species in this genus increased from 10 in 2000 to 68 in 2015 (http://www.bacterio.net/amycolatopsis.html). These organisms possess several biosynthetic gene clusters (BGCs) that can be of biological importance [16]; however, genome-based analysis has been restricted to the BGCs of balhimycin (bal), chloroeremomycin (cep), vancomycin (vcm) and rifamycin (rif) [3, 4]. Genomic studies have revealed that Amycolatopsis species have comparatively large genomes, ranging from ~5 Mb (A. halophila YIM93223-10) to 10.86 Mb (A. balhimycina FH 1894), have circular chromosomes and contain over 20 secondary metabolite gene clusters. As these bacteria are mostly soil dwellers, they experience a diverse and changing habitat and may benefit from a larger repertoire of genes that allows them to acclimatize and adapt to changing conditions [17]. Indigenous plasmids are uncommon in this genus: until now, only six plasmids have been found in Amycolatopsis [18] (Table 1). To date, 30 Amycolatopsis genome projects have been completed, of which six genomes have been completely sequenced, annotated and used extensively for research purposes (Table 2).

Table 1 General features of plasmids isolated from different Amycolatopsis strains.
Table 2 General characteristic features of Amycolatopsis genomes

Amycolatopsis mediterranei

The original strain, A. mediterranei ATCC 13685/DSM 43304/ME 83/973, was isolated from a soil sample at St. Raphael in France and was classified first as Streptomyces mediterranei, later as Nocardia mediterranei and finally as Amycolatopsis mediterranei [12]. Its ability to produce rifamycin was recognized in the same year as its isolation [12]. The original strain synthesized a mixture of rifamycins (the rifamycin complex); however, the addition of sodium diethylbarbiturate resulted in the production of a single fermentation product, rifamycin B [11]. Subsequently, two mutant strains, A. mediterranei ATCC 21789 and S699, were isolated that produce rifamycin B alone without the addition of sodium diethylbarbiturate to the medium [11]. The origin of these strains is unclear, as they have moved from one industry to another over the past 50 years. As of today, there are twelve strains of A. mediterranei (ATCC 13685/DSM 43304, ATCC 21271, ATCC 21789, ATCC 31064, ATCC 31065, ATCC 31066, S699, U32, RB, DSM 46096/S955, DSM 40773, W2800 and HP-130) that have been isolated independently or generated through mutagenic treatment and that produce different rifamycins (rifamycin B, SV, P, Q, R, U, W). Recently, the genealogy of some of these strains has been reconstructed with the help of the available literature (Fig. 1) [5]. To analyze these strains further, the genomes of several of them have been sequenced: U32 [13], S699 [11, 12], RB (unpublished), DSM 46096 [20], DSM 40773 [21] and HP-130 [5].

Fig. 1

Data from Nigam et al. [4]; Peano et al. [5]

Evolution of different rifamycin-producing strains from the original strain A. mediterranei ATCC 13685, either by mutagenesis or by combinatorial biosynthesis. Name of strain (bold), antibiotic produced (in parentheses), NTG N-methyl-N′-nitro-N-nitrosoguanidine, UV ultraviolet.

Amycolatopsis mediterranei U32

Amycolatopsis mediterranei U32 was the first Amycolatopsis strain whose genome was sequenced. It was obtained through mutagenesis and is an important industrial strain for the production of rifamycin SV. The complete genome was sequenced and annotated in 2010 [13] and formed the basis for comparative studies of the phylogenetic and taxonomic relationships between genera of the order Actinomycetales, such as Streptomyces and Amycolatopsis. The initial genomic study of U32 revealed that, unlike the linear chromosomes of streptomycetes, it harbors a circular chromosome similar to those of Saccharopolyspora erythraea and Nocardia farcinica, reflecting their close taxonomic and phylogenetic relationship. The chromosome of strain U32, comprising 10,236,715 bp, was one of the largest prokaryotic genomes sequenced at that time. Two plasmids highly similar to pMEA100, which is present in several species of Amycolatopsis, were found integrated in the chromosome. As in streptomycetes, the genome was divided into an ancestral core and a non-core region, and a novel quasi-core region containing more essential genes than the rest of the non-core was recognized within the non-core region. Transposable element-induced genomic rearrangement was assumed to be responsible for the transfer of this quasi-core from the core into the non-core, forming an integration hotspot. The ancestral core contained most of the essential genes and extended unequally on both sides of the replication origin (oriC) [13].

By the time the genome sequence was released, the biosynthetic gene cluster for rifamycin had already been characterized [31]. Genomic analysis of the A. mediterranei U32 chromosome predicted 25 other gene clusters for the biosynthesis of uncharacterized products, including polyketide synthase (PKS), nonribosomal peptide synthetase (NRPS), hybrid PKS and terpenoid clusters. The majority of these gene clusters reside outside the core: only four clusters (rif, nrps11, tps1 and lyc) were found in the core region, 21 were scattered in the non-core and one (nrps10) was present in the quasi-core region. Along with the rif cluster, four other type I and two type II PKS clusters were found in the genome of U32. The genome of U32 was recently reannotated [32] on the basis of RNA-seq data, and a new valyl-tRNA synthetase-encoding gene that was missing from the previous annotation was identified. In addition, a large number of noncoding RNAs (ncRNAs), accounting for approximately 11.29 % of total transcripts, were identified in the genome [32].

In actinomycetes, the stimulatory effect of nitrate has been studied extensively using U32 as a model organism [32, 33]. A complete nasACKBDEF gene cluster was found in U32, first by in situ hybridization screening and later by whole-genome sequencing. These genes were found to be co-transcribed as an operon that is activated by the addition of nitrate or nitrite and repressed by ammonium [33]. A molecular mechanism has been proposed in which the addition of nitrate to the medium activates, at the transcriptional level, the genes responsible for precursor production and rifamycin SV biosynthesis [32]. Since optimization of this strain has not been successful, most commercial fermentations produce rifamycin B, which is then converted into rifamycin SV.

Amycolatopsis mediterranei S699

Amycolatopsis mediterranei S699 was derived from the original strain A. mediterranei ATCC 13685, isolated from a soil sample at St. Raphael, France in 1957 [11]. Unlike the original strain, S699 was capable of producing rifamycin B alone without the addition of sodium barbiturate [11]. A. mediterranei S699 has been studied thoroughly to explore the genetics of rifamycin biosynthesis [31, 34, 35]. Because it produces rifamycin B, it gained significant importance and has since been widely used in laboratory research [4, 5]. Rifamycin is an ansamycin polyketide (an antibiotic characterized by an aliphatic bridge linking two non-adjacent positions of an aromatic nucleus) assembled by chain extension of the aromatic starter unit 3-amino-5-hydroxybenzoic acid (AHBA) with 2 acetate and 8 propionate units, giving the intermediate proansamycin X. Tailoring enzymes then form the early central intermediate rifamycin W, which is converted to rifamycin B by a major rearrangement of the polyketide backbone [35]. RifPKS is a hybrid NRPS/PKS: it comprises five open reading frames (ORFs), rifA–rifE, that code for a multi-modular type I polyketide synthase (PKS) complex whose loading module resembles a non-ribosomal peptide synthetase (NRPS) adenylation/thiolation domain [34, 36]. The rif gene cluster is followed by a large number of tailoring genes that are involved in post-PKS modifications such as hydroxylation, methylation and acetylation and that convert proansamycin X to rifamycin S, rifamycin SV and finally rifamycin B [37]. This strain has also undergone a classical strain improvement program and is being used for the commercial production of rifamycin B [38]. To gain insight into the genetic content of this organism, the genome was completely sequenced independently by two groups [11, 12].
In one study, a hybrid approach combining Sanger sequencing and pyrosequencing was used, and the genome was assembled with the Phrap assembler into 386 contigs. The remaining gaps were then closed by primer walking, transposon mutagenesis and complete sequencing of linker clones using Roche 454, followed by reference-based assembly with the MIRA3 assembler, mapping to the reference genome of A. mediterranei U32. The assembly was validated and then annotated to give a single circular chromosome of 10,236,779 bp. Besides the rifPKS, five other PKS, twelve NRPS and three hybrid NRPS/PKS clusters were identified. The genome sequence of A. mediterranei S699 was found to be very similar to that of A. mediterranei U32 at the nucleotide level (>99 %) [12].

De novo assembly with a combinatorial sequencing strategy was used to resequence and assemble the complete genome of A. mediterranei S699 [11]. The Roche 454 GS FLX platform was used to generate the reads, which were then assembled into 67 contigs. Sanger sequencing was employed to fill the gaps, amend low-quality regions and verify the variation between the draft sequence and the genome regions of other strains. The genome was found to differ from the previously sequenced genome of S699 by 218 single nucleotide polymorphisms (SNPs) and 51 indels. The 12 indels of more than 40 bp and all repeated sequences were found to be insertions when compared with both the earlier S699 sequence and U32. Except for three, the remaining nine insertions were also present in the genomes of the original strain ATCC 13685 and of ATCC 21789. Thus the major indel variations between the two sequenced S699 genomes can be attributed to differing assembly strategies [11].

Amycolatopsis mediterranei S699 produces rifamycin B, which is not a very effective antibiotic in its natural form. Its semi-synthetic derivatives (rifamycin S, rifamycin SV, rifabutin, rifapentine, rifaximin, rifampicin) (Fig. 2) have much more potent activity and are widely used in clinics for the treatment of mycobacterial infections, including tuberculosis and leprosy [39]. However, a combination of poor medical supervision, poor compliance and long periods of use has resulted in rifampicin-resistant strains of Mycobacterium tuberculosis. The situation has been aggravated by the emergence of multi-drug resistant (MDR), extensively drug-resistant (XDR) and totally drug-resistant (TDR) strains of M. tuberculosis [39]. This problem has steered antimicrobial research towards the discovery of novel antibiotics that are effective against drug-resistant strains. This could be achieved by screening large numbers of bacteria, by generating novel analogues through chemical synthesis or by using a combinatorial approach. In the case of rifamycin B, no further chemical modifications were possible for generating new analogues because of the structural complexity of the molecule. Some researchers therefore turned to combinatorial biosynthesis for the generation of novel analogs [4].

Fig. 2

Chemical structures of rifamycin B and its semisynthetic derivatives (Rifamycin S, Rifamycin SV, Rifampicin, Rifabutin, Rifapentine, Rifaximin)

Generation of Rifamycin B Analog: 24-Desmethylrifamycin B by Genetic Manipulation of Rifamycin Biosynthetic Gene Cluster

Amycolatopsis mediterranei has been less amenable to genetic manipulation because cloning vectors and a standardized transformation protocol were unavailable. This problem was overcome by the development of a series of cloning vectors and a transformation protocol [4], and genetic manipulation was shown to be possible in this strain [35, 40]. As the genome sequences of A. mediterranei S699 [12] and Streptomyces hygroscopicus [41] were available, Nigam et al. [4] employed a combinatorial biosynthetic approach to manipulate the rifamycin biosynthetic gene cluster. In A. mediterranei S699, the acyltransferase (AT) domain of module 6 (AT6) of the rifamycin polyketide synthase, which incorporates propionate, was swapped with the AT domain of module 2 (AT2) of the rapamycin polyketide synthase, which incorporates acetate. The three mutant strains (rifAT6::rapAT2 #3, #34 and #36), generated through two homologous recombination events, produced the rifamycin derivative 24-desmethylrifamycin B. The analog lacked a pendant methyl group at C-33 within the rifamycin skeletal structure. Its chemical characterization and structure were confirmed using LC–MS, NMR and X-ray crystallographic studies. The novel analog was further converted to its semi-synthetic derivatives 24-desmethylrifamycin S and 24-desmethylrifampicin (Fig. 3). The antibacterial activity of these derivatives was tested against Staphylococcus aureus, Mycobacterium smegmatis, Bacillus subtilis, Pseudomonas aeruginosa and Escherichia coli, and they showed much better antibacterial activity than rifamycin B. These findings led to the testing of 24-desmethylrifamycin S and 24-desmethylrifampicin against rifampicin-resistant strains of M. tuberculosis with mutations in their rpoB gene, OSDD 321, OSDD 206 (S531L) and OSDD 55 (H526T); both derivatives were found to be more effective than the commercially used rifampicin [4].
The probable explanation is that the loss of one methyl group led to conformational changes in the ansa chain (from the Latin word ansa, meaning “handle”), giving the compound more flexibility to bind mutated RNA polymerases (RNAPs) [4]. This study may form a basis for further genetic manipulation and the production of large numbers of rifamycin analogs for biological and pharmaceutical applications.
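
The AT-domain swap can be pictured as changing a single extender unit in the polyketide assembly line. The Python sketch below is illustrative only. The totals (2 acetate and 8 propionate units) and the identity of module 6 come from the text; the placement of the two acetate-incorporating modules at positions 2 and 9, the one-methyl-per-propionate simplification and all function and variable names are assumptions made for the example.

```python
from collections import Counter

# Extender unit selected by the AT domain of each rifamycin PKS module.
# Totals (2 acetate, 8 propionate) follow the text; placing the two acetate
# modules at positions 2 and 9 is an assumption made for illustration.
ACETATE_MODULES = {2, 9}
WILD_TYPE = {
    module: ("malonyl-CoA" if module in ACETATE_MODULES else "methylmalonyl-CoA")
    for module in range(1, 11)
}


def swap_at_domain(modules, module, new_extender):
    """Return a copy of the module map with one AT specificity replaced."""
    if module not in modules:
        raise KeyError(f"no module {module}")
    swapped = dict(modules)
    swapped[module] = new_extender
    return swapped


def pendant_methyl_count(modules):
    """Simplified model: one pendant methyl per methylmalonyl-CoA extender."""
    return Counter(modules.values())["methylmalonyl-CoA"]


# rifAT6::rapAT2: module 6 now accepts malonyl-CoA (acetate) instead of
# methylmalonyl-CoA (propionate).
mutant = swap_at_domain(WILD_TYPE, 6, "malonyl-CoA")
print("wild type pendant methyls:", pendant_methyl_count(WILD_TYPE))
print("AT6 swap pendant methyls:", pendant_methyl_count(mutant))
```

In this simplified model the swap removes exactly one pendant methyl group, which matches the single missing methyl described for 24-desmethylrifamycin B; tailoring reactions are deliberately ignored.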

Fig. 3

Chemical structures of rifamycin B; analog 24-desmethylrifamycin B and its semisynthetic derivatives 24-desmethylrifamycin S and 24-desmethylrifampicin. Positions marked with circle denote the absence of one methyl group from C-24

Conversion of A. mediterranei S699 into an Overproducer Using Comparative Genomics and Transcriptomics Approach

As discussed above, rifamycin B itself has very little antibacterial activity, whereas its semisynthetic derivatives are widely used in clinics for the treatment of mycobacterial infections. Accumulated knowledge of the genetic control of the biosynthetic pathway has led to several improved strains derived from the wild-type strain [38]. Recently, Peano et al. [5] investigated the mutational pattern in the genome of the rifamycin overproducer A. mediterranei HP-130 using comparative genomics and transcriptomics. HP-130 overproduces rifamycin B and was generated from the wild type ATCC 13685 through twelve successive rounds of mutagenesis with UV light and N-methyl-N′-nitro-N-nitrosoguanidine (NTG). Since the sequence of the wild-type strain was unavailable, the HP-130 genome was compared with the sequences of two other strains, S699 and U32, derived separately from the wild type. Comparative analysis revealed 250 variations that affected 227 coding sequences and 337 variations that affected 62 intergenic regions. Relative to the ancestral strain, 109 CDS variations were specific to HP-130, 20 to S699 and 112 to U32, suggesting that HP-130 is closer to S699 than to U32. Most of the mutated genes were involved in fatty acid and lipid metabolism and may affect the flux of precursors (malonyl-CoA and methylmalonyl-CoA) into rifamycin biosynthesis. Two mutations were of particular interest: a nonsense mutation in mutB2, one of the two mutB paralogs, which codes for the large subunit of methylmalonyl-CoA mutase, and a missense mutation in argS2, which encodes an arginyl-tRNA synthetase homolog. Methylmalonyl-CoA mutase catalyzes the isomerization of methylmalonyl-CoA to succinyl-CoA. It is already known that 2 molecules of malonyl-CoA and 8 molecules of methylmalonyl-CoA are consumed during the biosynthesis of rifamycin.
The effect of the mutB2 mutation was validated by measuring the levels of methylmalonyl-CoA and succinyl-CoA in S699 and HP-130 and correlating them with rifamycin titres in bioreactor experiments. The intracellular level of methylmalonyl-CoA increased in both HP-130 and S699 during growth but eventually doubled in HP-130, whereas S699 accumulated higher levels of succinyl-CoA at later growth stages. Moreover, S699 produced less rifamycin B than HP-130 in bioreactors (0.8 vs. 1.4 g/l), and it was suggested that metabolic redirection led to the overproduction of rifamycin B in HP-130. Disruption of the mutB2 gene in S699 led to increased titres of rifamycin B as a consequence of higher production of methylmalonyl-CoA and lower production of succinyl-CoA [5].

The mutation in argS2 led to reduced arginyl-tRNA synthetase activity in HP-130 extracts compared with S699, which might have affected the cellular levels of guanosine 3′,5′-bispyrophosphate (ppGpp). Arginyl-tRNA synthetase activity, ppGpp levels and rifamycin concentration were measured in the same culture, and HP-130 was found to produce twice as much ppGpp as S699 during the early and middle phases of fermentation, with no remarkable difference in the late stage. In S699, disruption of argS2 resulted in low arginyl-tRNA synthetase activity, high levels of ppGpp and increased production of rifamycin B, although not to the levels seen in HP-130 [5].

They also found a missense mutation in HP-130 in ppk, which encodes polyphosphate kinase, the enzyme that catalyzes the reversible polymerization of the γ-phosphate of ATP into polyphosphate (polyP). The two-component system PhoR/PhoP positively controls the expression of ppk, which is induced when inorganic phosphate becomes a limiting factor during growth and the intracellular level of ATP drops, so that polyP may be used to regenerate ATP. ppk mutants lack this ATP-regenerating system and have a lower energetic charge than the wild type, which may trigger antibiotic biosynthesis. Disruption of ppk in S699 led to premature sporulation, a growth defect, acidification of the fermentation medium and slightly reduced titres of rifamycin B. Two missense mutations were also found in genes of the rif cluster. One affects ORF9, which codes for a class III aminotransferase and is not involved in rifamycin biosynthesis. The other, a leucine-to-phenylalanine substitution at position 114, affects rifN, which encodes kanosamine kinase, an essential enzyme in the AHBA biosynthesis pathway. Insertional inactivation of rifN blocked the rifamycin biosynthetic pathway, and the production of rifamycin and diffusible pigments was completely abolished. This confirmed that rifN plays an essential role in the rifamycin biosynthetic pathway [5].

They also identified differential transcriptional profiles in HP-130 and S699 across the growth phases (early phase a, 24 h; middle phase b, 36–48 h; late phase c, 60 h). A total of 899 differentially expressed genes in HP-130 and 952 in S699 were modulated with a phase shift when compared with rifamycin production. Thus the information obtained through comparative genomics, coupled with transcriptome and metabolome data and insertional inactivation, allowed strain S699 to be improved by genetically manipulating key molecular targets identified through genomic analysis [5]. This approach takes advantage of low-cost genome sequencing and has been applied successfully, resulting in the identification of new molecular targets that can accelerate strain improvement by genetic engineering.

Amycolatopsis mediterranei RB

The complete genome of A. mediterranei RB was sequenced in 2012 by W. Zhao of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China. The genome is unpublished and no further studies have been done with this organism (http://www.ncbi.nlm.nih.gov/nuccore/NC_022116.1). However, average nucleotide identity (ANI) analysis revealed that A. mediterranei RB is 99.9 % similar to A. mediterranei S699 and A. mediterranei U32 [20].

Amycolatopsis japonica MG417-CF17

The complete genome sequence of A. japonica was released in 2014. This strain was discovered while screening large numbers of bacteria for specific inhibitors of phospholipase C [19]. On the basis of cultural and morphological studies, it was initially classified as A. orientalis. However, 16S rRNA sequence-based analysis [42] and other phenotypic properties led to its reclassification as A. japonicum, a name that was validated and later corrected to A. japonica. It has proved to be of great interest because it produces [S,S]-ethylenediaminedisuccinic acid (EDDS), a hexadentate chelating agent that is a biodegradable isomer of ethylenediaminetetraacetic acid (EDTA) [19]. Apart from this, no bioactive compound could be isolated from A. japonicum under laboratory conditions. It produced cell wall precursors that are resistant to glycopeptides [43], and its genome contained the oxyB gene, which is important in glycopeptide production [44]; both observations supported the assumption that A. japonicum has the potential for glycopeptide production. The genome was sequenced in order to identify genes for glycopeptide biosynthesis, regulation and self-resistance, and to assess the ability of the strain to synthesize other secondary metabolites. The genome of A. japonicum MG417-CF17 had two replicons: a chromosome of approximately 8.96 Mb and the 92.5 kb plasmid pAmyja1 [19]. Twenty-nine other secondary metabolite gene clusters were found in the genome, one of which encoded the synthesis of the polyketide ECO-0501. As this cluster and its product had already been identified and reported in the literature [45], the focus shifted to the identification of other biosynthetic gene clusters. A type III PKS/NRPS hybrid gene cluster that showed high similarity to other glycopeptide gene clusters, such as those of balhimycin and teicoplanin, was identified. This cluster was studied extensively, and 39 distinct ORFs with a total size of 69 kb were identified.
All these ORFs were predicted to code for enzymes involved in the biosynthesis, assembly and export of the glycopeptide, in gene regulation and in self-resistance. Glycopeptide clusters are controlled by a pathway-specific StrR-like regulator, and this cluster was found to be under the control of the pathway-specific regulator AjrR. Since the gene cluster was cryptic and no glycopeptide was produced under standard conditions, AjrR was predicted to be non-functional. A cluster activation strategy was therefore employed in which bbrAba, the pathway-specific transcriptional regulator of the balhimycin gene cluster, was overexpressed in A. japonicum [46]. For this, bbrAba was cloned into pRM4, an integrative vector with the constitutive promoter ermEp, which was then transformed into A. japonicum. The recombinant strain A. japonicum/pRM4-bbrAba was tested against the indicator species B. subtilis and showed antibacterial activity. In silico analysis indicated that the product was fully cross-bridged, sixfold glycosylated and twice methylated, and that the gene cluster contained no acyltransferase or halogenase, which suggested a type III glycopeptide. This was the second type III glycopeptide identified, as until then only the structure of ristomycin A, a type III glycopeptide produced by A. lurida, was known [46]. The stereochemistry and amino acid sequence of the glycopeptide produced were similar to those of ristomycin A. The chemical structure of the glycopeptide was analyzed using HPLC, HPLC–ESI–MS and ESI–MS/MS after the fractionated crude extract showed the presence of a major compound with m/z = 1034.8 and a minor compound with m/z = 887.7. These masses were similar to those of ristomycin A and B. HR-MS confirmed that the exact mass of the isolated glycopeptide matched that of ristomycin A, with the same molecular formula, C95H110N8O44 [46].
This resulted in the activation of the cryptic gene cluster for the production of ristomycin A, which has antimicrobial activity.
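
As a rough consistency check on the mass spectrometry values, the monoisotopic mass of the reported formula C95H110N8O44 can be computed directly. The Python sketch below is not part of the cited work. The isotope masses are standard values, while the doubly protonated charge state used for the comparison with the major ion at m/z = 1034.8 is an assumption made here, since the text does not state the charge; the function names are illustrative.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of a proton (u)


def parse_formula(formula):
    """Parse a simple formula such as 'C95H110N8O44' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum of monoisotopic masses for all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def protonated_mz(formula, charge):
    """m/z of the [M + zH]z+ ion (charge state is an assumption of the caller)."""
    return (monoisotopic_mass(formula) + charge * PROTON) / charge


ristomycin_a = "C95H110N8O44"
print("monoisotopic mass:", round(monoisotopic_mass(ristomycin_a), 3))
print("[M+2H]2+ m/z:", round(protonated_mz(ristomycin_a, 2), 3))
```

Under the assumed 2+ charge state, the monoisotopic value falls within about half an m/z unit of the reported 1034.8. A difference of that size would be plausible if the reported peak corresponds to a heavier isotopologue, but the text does not provide enough detail to confirm this.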

Amycolatopsis methanolica 239

A rare example of a facultative methylotroph is A. methanolica, which harbors the RuMP pathway and is thus capable of growing on methanol. It is a versatile facultative methylotrophic organism that was isolated from a soil sample from New Guinea. It was initially classified as a species of the genus Nocardia, then of Streptomyces, and was finally transferred to the genus Amycolatopsis. Its growth has been examined under diverse conditions, and it has been studied extensively with respect to the regulation of aromatic amino acid biosynthesis and the systematic deregulation of control systems [47]. The complete genome of A. methanolica 239 was sequenced by Tang et al. (unpublished) in 2012. Although the genome has not been studied extensively, genomic analysis revealed fewer secondary metabolite biosynthetic gene clusters than in other Amycolatopsis species (Table 1). A biosynthetic gene cluster (Amys) was identified for amychelin, an NRPS-dependent siderophore (iron-chelating agent) that was first isolated from Amycolatopsis sp. AA4 and was found to inhibit the growth of neighbouring Streptomyces coelicolor M145 [48]. In the Amys cluster, a novel salicylate synthase gene (Ams) was identified, which was linked to high Fe3+ chelation ability because it is involved in the synthesis of the hydroxybenzoyl-oxazoline group from salicylate [49]. Salicylate synthases belong to the family of chorismate-utilizing enzymes; they convert chorismate to salicylate, which serves as the initial building block and iron-coordinating moiety in a few biosynthetic pathways for NRPS-derived siderophores. The databases contain a large number of sequences for putative salicylate synthases, but only two of them, MbtI (Mycobacterium tuberculosis) and Irp9 (Yersinia enterocolitica), have been investigated [50, 51].
Comparative analysis of Ams with other known salicylate synthases, and with 250 homologs identified by BLAST, showed that Ams is a novel salicylate synthase gene responsible for the conversion of chorismate to salicylate. It was thus shown that Ams competes with other MSTs and shunts the flow of chorismate, a primary metabolite, into the biosynthetic pathway of amychelin and its analogs, which could support the life of A. methanolica within its niche (Fig. 4) [49]. Organisms of this rare nature therefore demand intensive study in view of their potential for secondary metabolite production, and such study is strengthened by the availability of the genome sequence.

Fig. 4

Proposed role of AmS in amychelin biosynthetic pathway

Amycolatopsis orientalis

Amycolatopsis orientalis is widely known for producing a large number of secondary metabolites, including the potent glycopeptide antibiotic vancomycin [4] and related antibiotics such as chloroeremomycin, eremomycin [52] and the orienticins [53] (Fig. 5). Vancomycin has been used for decades as the last resort for the treatment of serious methicillin-resistant Staphylococcus aureus (MRSA) infections [3]. Four strains have been sequenced, DSM 40040/ATCC 9412 [27], DSM 43388 and DSM 46075 [28], and HCCB10007 [3]; only the HCCB10007 genome is complete, and the rest are draft assemblies. Genomic analysis of these strains revealed that DSM 46075 and DSM 43388 do not have the vancomycin biosynthetic gene cluster; however, the vancomycin resistance genes vanHAX were present in DSM 46075, located next to a gene encoding a VanS/VanR two-component system. The average nucleotide identities of DSM 43388 and DSM 46075 with the type strain DSM 40040 were very low (76.6 and 76.5 %, respectively), suggesting that they should be reclassified into other species [28].
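
The reclassification argument rests on comparing ANI values against a species boundary. The minimal Python sketch below makes that comparison explicit. The ANI values are those quoted in the text, whereas the 95 % cut-off is a commonly used species boundary supplied here as an assumption, not a value taken from the cited studies; the function and variable names are illustrative.

```python
# Commonly used ANI species boundary (roughly 95-96 %). The exact cut-off is
# an assumption for this sketch, not a value given in the text.
SPECIES_ANI_THRESHOLD = 95.0


def same_species(ani_percent, threshold=SPECIES_ANI_THRESHOLD):
    """Return True if an ANI value meets the assumed species boundary."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    return ani_percent >= threshold


# ANI of each strain against the type strain DSM 40040, as quoted in the text.
ani_to_type_strain = {"DSM 43388": 76.6, "DSM 46075": 76.5}

for strain, ani in ani_to_type_strain.items():
    verdict = "same species" if same_species(ani) else "candidate for reclassification"
    print(f"{strain}: ANI {ani} % -> {verdict}")
```

Both strains fall far below the assumed boundary, whereas a value such as the 99.9 % reported among A. mediterranei strains would lie well above it.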

Fig. 5

Chemical structures of glycopeptide antibiotics: a vancomycin (A. orientalis HCCB10007), b eremomycin (A. orientalis), c orienticin A (A. orientalis PA-42867), d A82846B (A. orientalis A82846), e balhimycin (A. balhimycina DSM 5908). All these glycopeptide antibiotics possess an identical heptapeptide backbone. Both balhimycin and vancomycin contain two sugars whereas eremomycin, A82846B and orienticin contain three sugars. Eremomycin is similar to orienticin A but differs from it only in the position of a chlorine

Amycolatopsis orientalis HCCB10007

Amycolatopsis orientalis HCCB10007 is an industrial strain derived from ATCC 43491 through a series of physical and chemical mutagenesis steps. Draft genomes of three A. orientalis subsp. orientalis strains had been released [27, 28], but they were neither annotated nor analysed. The complete genome of A. orientalis was therefore sequenced and annotated, and comparisons at the inter- and intra-generic level with different phylogenetic relatives within the actinomycetes led to the characterization of species-specific and genus-common features of the genome [3]. The genome of A. orientalis HCCB10007 comprises two replicons, a large circular chromosome (8,948,591 bp) and a small dissociated circular plasmid, pXL100 (33,499 bp). With the availability of complete genome sequences of other Amycolatopsis species, an intra-generic analysis was done using more rigorous statistical methods. As in A. mediterranei, four rRNA operons and fifty tRNA genes were annotated in the genome of A. orientalis, but unlike A. mediterranei, the genome of A. orientalis lacked a selenocysteine tRNA (tRNAsec). Accordingly, the genes encoding selenocysteine synthase (selA), elongation factor (selB) and selenophosphate synthetase (selD), as well as a formate dehydrogenase gene containing a selenocysteine-encoding UGA codon, were also absent from the genome. The two pMEA100-like integrated plasmids were absent from the genome of A. orientalis; instead, a single dissociated plasmid was present that has not yet been reported in other Amycolatopsis genomes. Three regions, core, quasi-core and non-core, were recognized in the genome, a configuration similar to that of A. mediterranei except that two quasi-core regions were found in the genome of A. orientalis. Genome comparison of the two species revealed a well-conserved order of orthologs, together with a large inversion that produces the pattern usually known as an “X pattern”.
The line of this X pattern was interspersed in the non-core regions with break points, which mainly encoded secondary metabolite gene clusters and could be due to horizontal gene transfer events. Species-specific gene clusters (rifamycin in A. mediterranei and vancomycin in A. orientalis) were recognized in the core region through rare break points. In A. orientalis, this region consisted of the 64 kb vcm cluster together with transcriptional regulators, two transposase/integrase gene pairs and many hypothetical proteins. The two gene pairs were found to be copies of each other transcribed in opposite directions, indicating an insertion event in the ancestral genome that resulted in the acquisition of the vcm cluster by A. orientalis. In contrast, the two flanking regions of the rif cluster were highly conserved among Amycolatopsis species, and the cluster was found to be inserted between a gene pair encoding a unique DNA-directed RNA polymerase β subunit and a conserved hypothetical protein, indicating that the ancestors of A. mediterranei acquired the rif cluster very recently. This intra-generic comparison led to the identification of hot spots of genomic plasticity in this genus. Core, quasi-core and non-core regions are relative terms that describe features of the genomes of the particular species or genera being compared. Nevertheless, they help in inferring the process of evolution of a genus or the microevolution of a species [3].

Genomic analysis revealed the presence of twenty-seven biosynthetic gene clusters (BGCs), including nine type I PKS, one type II PKS, ten NRPS, three hybrid PKS-NRPS, two terpenoid, one lycopene and one β-carotene cluster, covering over 6.2 % of the whole genome (~552 kb). The biosynthetic gene clusters were compared against the NCBI database to determine their phylogenetic relationships, and most of the genes showed sequence similarity with A. mediterranei, denoting a common phylogenetic origin [3].
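The cluster counts and genome coverage quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the variable names are illustrative, the replicon sizes are those given earlier for the chromosome and pXL100, and "~552 kb" is taken as 552,000 bp, which is an assumption about rounding. Because the text does not state whether the percentage refers to the chromosome alone or to both replicons, both are computed.

```python
# Cluster counts as listed in the text for A. orientalis HCCB10007.
bgc_counts = {
    "type I PKS": 9,
    "type II PKS": 1,
    "NRPS": 10,
    "hybrid PKS-NRPS": 3,
    "terpenoid": 2,
    "lycopene": 1,
    "beta-carotene": 1,
}

CHROMOSOME_BP = 8_948_591  # circular chromosome, from the text
PLASMID_BP = 33_499        # plasmid pXL100, from the text
BGC_SPAN_BP = 552_000      # "~552 kb"; exact value is an assumption

# Total number of clusters should match the stated twenty-seven.
total_clusters = sum(bgc_counts.values())

# Coverage relative to both replicons and to the chromosome alone.
genome_bp = CHROMOSOME_BP + PLASMID_BP
pct_of_genome = 100 * BGC_SPAN_BP / genome_bp
pct_of_chromosome = 100 * BGC_SPAN_BP / CHROMOSOME_BP

print(f"total clusters: {total_clusters}")
print(f"coverage of chromosome + plasmid: {pct_of_genome:.2f} %")
print(f"coverage of chromosome only: {pct_of_chromosome:.2f} %")
```

The listed clusters do sum to twenty-seven. With the rounded span, either denominator gives a coverage slightly below 6.2 %, so the quoted figure presumably reflects the unrounded cluster lengths.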

Reverse-transcription PCR was employed to check the transcription profile of some of the BGCs, but no novel secondary metabolite was identified apart from the product of the van cluster, which had already been cloned and sequenced in 2010 (http://www.ncbi.nlm.nih.gov/nuccore/HQ679900.1). The entire vcm cluster was then annotated. The cluster encoded 35 enzymes, which included vancomycin resistance proteins (VanH, VanA, VanX), three NRPSs and tailoring enzymes that are recruited post-assembly, along with a series of biosynthetic proteins. Biosynthesis of vancomycin starts with the synthesis of seven amino acid precursors, followed by assembly of the precursors into a heptapeptide backbone and then post-assembly modifications, namely cyclization, halogenation, methylation and glycosylation of the backbone, to form the final product. The function of the genes involved in post-assembly modifications was characterized in vivo by generating in-frame monogenic mutants using homologous recombination. The results were in agreement with naturally isolated vancomycin derivatives. Thus, it was shown that the tailoring enzymes, except the glycosyltransferase GtfD, were not highly specific and had broad substrate specificity in vivo. The authors also analysed the common characteristics of the genus Amycolatopsis through intra- and inter-generic comparison with the genomes of other actinomycetes. This led to the development of sequence-based molecular chemotaxonomic characteristics (MCCs) representing the phospholipid and menaquinone phenotypes of the genus Amycolatopsis [3]. Thus, this study has extended the genetic knowledge of the genus Amycolatopsis.

Future Prospects

In the near future, the availability of genome information will provide useful insights into the molecular structures as well as the number of secondary metabolites in potential producers. It will also help in ascertaining the mechanisms involved in the regulation of secondary metabolite biosynthesis and aid the search for novel secondary metabolites through genetic engineering. The numerous available databases and in silico computational approaches will help in the discovery of cryptic gene clusters, which are present in the organism but are not known to produce any secondary metabolite. These orphan secondary metabolite gene clusters represent a huge untapped source of new chemical compounds that may provide new resources for drug discovery. However, one of the major challenges in this field will be the development of methods to awaken these silent gene clusters and to predict their chemical biology as well as their terminal pathways. Comparative genomic studies can also be carried out to understand the variations in polyketide synthase gene clusters and related genomic characteristics. Since the phylogenetic status of some of the strains is ambiguous, re-evaluation of the phylogeny of the members of this genus on the basis of genome sequences will provide a better classification. The classical mutate-and-screen method for strain improvement has reached saturation, and genomic information will therefore help in the identification of key molecular targets for industrial strain improvement.