Introduction

In prokaryotic cells, transcription factors (TFs) are proteins that play an important role in controlling the rate of transcription of a gene. TFs bind to specific DNA sequences, enabling RNA polymerases to carry out transcription. There is a wide range of transcription factors. Some harbour DNA-binding regions that bind directly to promoters. Others bind to the enhancer region of a gene. Proteins binding directly to the promoter region initiate the transcription process, whereas TFs binding to enhancer regions stimulate or repress transcription indirectly [1]. Regulation through transcription factors is known to be the most important means of controlling gene expression. TFs identified in actinobacteria, for example in the genus Streptomyces, have been reported to control gene expression and to be regulated themselves within an organized network [1, 2].

Amycolatopsis mediterranei S699 is a well-known actinobacterium that produces rifamycin B. It has been developed into industrial strains for the production of rifamycin B, primarily by classical strain improvement methods [3]. Semisynthetic derivatives of rifamycin B are used against Mycobacterium tuberculosis, the causative agent of tuberculosis (TB) [4,5,6,7,8]. With the emergence of multidrug-resistant (MDR) strains of Mycobacterium tuberculosis, there is an urgent need to develop new drugs against MDR-TB [9,10,11,12]. Characterization of the rifamycin biosynthetic gene cluster and a better understanding of the genome of Amycolatopsis mediterranei [13, 14] opened up the possibility of manipulating the rifamycin cluster to produce analogs of rifamycin B. One such analog, 24-desmethylrifamycin B, was produced by manipulation of the rifPKS gene cluster, and semisynthetic derivatives of this analog were found to be effective against rifampicin-resistant strains of Mycobacterium tuberculosis [15, 16]. To exploit the mutant strain that produces 24-desmethylrifamycin B commercially, it is important to understand the regulatory mechanisms of rifamycin B production in the wild type and of 24-desmethylrifamycin B production in the mutant strain. Many studies on rifamycin B production have been performed in classical strain improvement programs and in programs aimed at understanding the biosynthetic network leading to rifamycin B production. In addition, there are a few reports on the regulation of antibiotic production in species of Streptomyces, which produce a wide array of antibiotics [17,18,19]. However, little has been done to understand the role of the transcription factors (TFs) of Amycolatopsis mediterranei S699. Once the genome sequence of Amycolatopsis mediterranei S699 became available, we analysed it to explore the transcriptional regulators that play a role in controlling antibiotic production.
Cross-regulation between the cluster and the regulatory proteins results in a highly complex regulatory network. This study also revealed that one gene involved in 3-amino-5-hydroxybenzoic acid (ABHA) production, rifN [20], acts as the major regulatory hub controlling the rifamycin biosynthetic gene cluster (RifBGC). Here we report the crucial transcription factors and important genes whose regulation can be exploited to upscale rifamycin production.

Methods

Complete Genome Sequencing of Amycolatopsis mediterranei S699

Complete whole-genome sequencing was performed by Verma and group (2011) [21] using the Roche 454 system (GS20 version) and Sanger shotgun sequencing. We reassembled, validated, and re-annotated the genome (accession number CP002896). The sequence was then reanalysed for its genomic features.

Rifamycin BGC Gene Extraction

To identify the transcription factors involved in rifamycin production, the genes linked to antibiotic production must first be extracted. The rifamycin biosynthetic gene cluster harbours 42 genes arranged in four regions. The list of genes and their respective functions is given in Supplementary Table 1 [13, 14]. The 42 genes were identified in the complete genome of S699, and their protein sequences and identifiers were extracted from the UniProt database [22]. Their interaction map was created using the STRING database v10.0 [23].

Table 1 List of transcription factors found to be involved in rifamycin B production

Transcription Factor Profiling

The genome harbours genes that code for transcription factors, which regulate gene expression by controlling the transcription of DNA into RNA. All TFs were annotated from the whole genome of A. mediterranei S699 using the BLASTn program against the P2TF database v2.9 [24] with default parameters. P2TF is an integrated and comprehensive database that compiles TF genes along with their annotation, sequence features, and functional domains. The IDs obtained for S699 were converted to the corresponding IDs for strain U32, because interaction data are available for U32 and the two genomes are 99.9% similar (Fig. 1). These proteins were then submitted to the STRING database to predict the interaction map of the TFs with the rifamycin biosynthetic gene cluster, in order to understand the mechanism and role of the TFs in regulating rifamycin B production. The interactome was then visualized using Cytoscape v3.7.1 [25].
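
The annotation step described above can be sketched as a filter over tabular BLAST output. This is a minimal sketch, assuming the search was exported in the 12-column tab-separated format (`-outfmt 6`) and that one best hit per query is kept below an e-value cutoff. The source states only that default parameters were used, so the function name `best_tf_hits`, the cutoff of 1e-5, and the sample rows with their identifiers are all hypothetical.

```python
def best_tf_hits(blast_lines, max_evalue=1e-5):
    """Keep the best-scoring database hit per query gene.

    Assumes 12-column tabular BLAST output: query, subject, %identity,
    length, mismatches, gap opens, qstart, qend, sstart, send, evalue,
    bitscore. The e-value cutoff is an illustrative assumption.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit too weak to count as a TF annotation
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows: geneA has two hits, geneB has only a weak hit.
sample = [
    "geneA\tTF_family_X_001\t98.5\t600\t9\t0\t1\t600\t1\t600\t1e-150\t540.0",
    "geneA\tTF_family_Y_002\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-40\t160.0",
    "geneB\tTF_family_Z_003\t35.0\t90\t58\t1\t1\t90\t5\t94\t0.5\t30.0",
]
hits = best_tf_hits(sample)
```

In this toy input only `geneA` is retained, with the higher-scoring of its two hits; `geneB` is dropped because its only hit exceeds the cutoff.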

Fig. 1

Circular genome map of Amycolatopsis mediterranei S699 (red), with a genome size of 10.23 Mb, generated using CGView. The following features are shown (moving from the outermost track inward, origin of replication at 0 Mbp): (1) the blue track represents the genome of U32, its phylogenetically similar neighbour, with an ANI similarity of 99.9%; (2) ORFs on the forward strand; (3) the rifamycin biosynthetic gene cluster, mapped onto the genome as orange arrows; (4–6) CDS (purple) together with genomic features such as tRNA, rRNA, ncRNA, and regulatory genes on the forward strand; (7) backbone ring of the S699 genome (red); (8–10) CDS (purple) together with genomic features such as tRNA, rRNA, ncRNA, and regulatory genes on the reverse strand; (11) ORFs on the reverse strand (genome sequence by Verma et al. [21])

Network Analysis

The network was analyzed for the first neighbors of each gene from the rifamycin BGC. The interactome map was created for the selected proteins along with the 42 rifamycin BGC genes. The GeneMANIA force-directed layout was applied to make the interactions easier to interpret. The network was analyzed using NetworkAnalyzer, a plug-in of Cytoscape v3.7.1, and Perl version 5.18.2.2. The network was visualized based on node degree values to identify key proteins in the cluster [26].
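
The first-neighbor selection and degree calculation can be illustrated on an undirected edge list. This is a minimal sketch in plain Python rather than the NetworkAnalyzer and Perl workflow used in the study; the function names and the toy identifiers (`geneA`, `TF1`, and so on) are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def first_neighbour_tfs(adj, cluster_genes, tf_ids):
    """Return the TFs that share at least one direct edge with a cluster gene."""
    cluster = set(cluster_genes)
    return {tf for tf in tf_ids if adj.get(tf, set()) & cluster}


def tf_degree_per_cluster_gene(adj, cluster_genes, tf_ids):
    """For each cluster gene, count the TFs it interacts with directly."""
    tfs = set(tf_ids)
    return {g: len(adj.get(g, set()) & tfs) for g in cluster_genes}


# Toy network: TF3 and TF4 reach the cluster only through other TFs.
edges = [("geneA", "TF1"), ("geneA", "TF2"), ("geneB", "TF2"),
         ("TF2", "TF3"), ("TF3", "TF4")]
adj = build_adjacency(edges)
direct = first_neighbour_tfs(adj, ["geneA", "geneB"], ["TF1", "TF2", "TF3", "TF4"])
degrees = tf_degree_per_cluster_gene(adj, ["geneA", "geneB"], ["TF1", "TF2", "TF3", "TF4"])
```

Here only `TF1` and `TF2` count as first neighbors of the cluster, and `geneA` has the higher TF degree, which is the kind of ranking used to pick out hub genes.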

Functional Profiling

The functional profile of the TFs was analyzed using the stringApp plugin, which retrieves functional enrichment for Gene Ontology terms. To do this, the network was first converted to a STRING network using STRINGify and then subjected to STRING enrichment.
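
Functional enrichment of this kind is commonly framed as an over-representation test. The sketch below computes a one-sided hypergeometric p-value for a single Gene Ontology term, under the assumption that this is the type of test applied; the source does not describe the statistics behind the enrichment step, and no multiple-testing correction is shown. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """One-sided over-representation p-value, P(X >= observed).

    population: number of genes in the background
    annotated:  background genes carrying the term
    drawn:      genes in the network being tested
    observed:   tested genes carrying the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Toy case: 5 of 10 background genes carry the term, 5 genes are tested,
# and all 5 tested genes carry it.
p_all_hits = hypergeom_enrichment_p(10, 5, 5, 5)
```

A small p-value means the term appears among the tested genes more often than expected by chance from the background.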

Results and Discussion

Amycolatopsis mediterranei S699 Genome

The genome of S699 was reanalysed and a circular map was generated using the CGView Server [27]. The complete genome size of S699 was found to be 10.23 Mb, with 9575 predicted coding sequences, as reported earlier [21]. A 90 kb DNA fragment containing the rifamycin biosynthetic gene cluster, which harbours 42 genes, was also mapped (Fig. 1). The previously reported coding density of 90%, with an average CDS length of 954 bp, was confirmed. A total of 52 tRNA genes and 4 rRNA operons were present. About 10% of the genes in the genome were found to code for transcription factors involved in the regulation of gene expression. A high number of TFs is characteristic of the genomes of bacteria that synthesize secondary metabolites or live in harsh environments [1].
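
The reported figures can be cross-checked with simple arithmetic. This is a minimal sketch, assuming that coding density is approximated as the number of CDS times the mean CDS length divided by the genome size, and that the TF share is taken relative to the predicted CDS count, using the 1102 TFs reported in this study. Both definitions are assumptions made for illustration, not the authors' stated method.

```python
# Values taken from the text; variable names are illustrative.
genome_bp = 10.23e6   # genome size, 10.23 Mb
n_cds = 9575          # predicted coding sequences
mean_cds_bp = 954     # average CDS length in bp
n_tf = 1102           # TFs identified against P2TF

coding_density = n_cds * mean_cds_bp / genome_bp
tf_fraction = n_tf / n_cds

print(f"coding density ~ {coding_density:.1%}")  # about 89.3%
print(f"TF fraction    ~ {tf_fraction:.1%}")     # about 11.5%
```

Under these assumptions, both values are consistent with the rounded figures of 90% coding density and roughly 10% of genes encoding TFs.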

Identification of Transcription Factors

To explore the role of transcription factors in controlling the rifamycin biosynthetic genes, 1102 TFs were extracted from the reference genome of Amycolatopsis mediterranei S699 using the P2TF database. The TFs, along with the 42 cluster genes, were then mapped onto the STRING database to analyse the interactions among these genes and the proteins they encode. Cytoscape v3.7.1 [25] was used to visualize the final interactome (Fig. 2). Of the 1102 TFs, 921 were found to interact with each other and with RifBGC genes. We then extracted the TFs that interact directly with RifBGC, that is, the first-neighbour interacting proteins. A total of 30 TFs were found to be directly involved in the regulation of RifBGC (Table 1).

Fig. 2

The complete interaction map of all transcription factors (green) and the rifamycin biosynthetic gene cluster (red) [Nodes: 963; Edges: 3202]. The spherical nodes represent proteins and the grey lines represent their interactions with each other. The transcription factors interacting directly with the rifamycin cluster are marked in blue

Diversity of TF Regulators

The 1102 TFs identified in the genome of A. mediterranei S699 belonged primarily to the MarR-, TetR-, Rok-, PadR-, GntR-, HrcA-, IclR-, LuxR-, LysR-, MerR-, AraC-, ArgR-, ArsR-, AsnC-, CsoR-, DeoR-, and WhiB-families. Proteins belonging to the TetR- (232), MarR- (74) and Xre-family (74) were the most abundant. The 30 TFs interacting with RifBGC were found to belong to the Rok-, TetR-, MarR-, LuxR-, Xre- and AsnC-family. Two of these TFs are themselves proteins of the RifBGC, namely RifQ (STRING ID: AMED_0634) and RifZ (STRING ID: AMED_0655) (Fig. 2; marked blue with a red boundary).

RifQ is a TetR-type repressor that plays a crucial role in conferring antibiotic resistance on the bacterium [28]. Regulators belonging to this family are associated with the regulation of genes involved in the export of small molecules. RifQ is reported to negatively regulate the expression of rifP, which encodes a membrane protein known to export rifamycin B out of the cell, and thus balances extracellular transport [28]. RifZ, a regulator belonging to the LuxR family, regulates the transcription of all six operons of the RifBGC and acts as an on/off switch for the synthesis of rifamycin B [29]. Of the remaining 28 TFs, the largest group (n = 10) belonged to the ROK family, whose members act as multiple antibiotic resistance regulators and cAMP regulatory proteins (AMED_7826, AMED_6925, AMED_2585, AMED_3749, AMED_2985, AMED_1153, AMED_6775, AMED_3418, AMED_4310, AMED_5065).

Many regulators (Table 1) belonging to other families (e.g. the ROK family) are often involved in primary metabolism, such as glycolysis or fatty acid metabolism, but may also play a role in antibiotic production in bacteria such as Streptomyces [30]. The association and interaction of these TFs with RifBGC correlate with rifamycin biosynthesis. Five other regulatory proteins belong to the MarR family. Proteins of this family are known to regulate the activity of genes involved in stress response, virulence, and the degradation or export of harmful chemicals such as phenolic compounds and antibiotics [31]. MarR-related TFs were also found to interact with RifO. This protein plays an important role in the export of rifamycin by interacting with RifP. Further, eight TFs belong to the TetR family (AMED_1709, AMED_1248, AMED_2021, AMED_6797, AMED_5356, AMED_5377, AMED_5358, AMED_1347), two to the LuxR family (AMED_8119, AMED_5173), two to the IclR family (AMED_3418, AMED_4412), one to the Xre family and one to the SARP family (AMED_8118, AMED_4442, AMED_4014) (Fig. 3a). Members of the Xre family are also involved in stress-related regulation processes in the cell. Another local regulator, AsnC (AMED_2205), was found to interact specifically with RifQ.

Network Analysis

To understand the role of the TFs in rifamycin production, an interaction map was created for 70 proteins, comprising the rifamycin BGC proteins and their interacting TFs, and visualized using Cytoscape v3.7.1. The resulting network obeyed a power law, \(P(k)\sim k^{-\gamma}\), indicating the hierarchical nature of the network (Fig. 3b). Degree analysis was carried out to find the RifBGC nodes with the most TF interactions. The predicted network had a tight topology, with 70 nodes and 215 edges. Network analysis ranked all TFs (n = 30) by degree (interaction capacity) with respect to their direct interactions with RifBGC proteins (n = 42). Two types of hubs were identified, namely Rif-level and TF-level hubs. Of the 42 RifBGC proteins, RifN and Rif-ORF3 showed the highest degree at the Rif level, interacting with 10 and 4 TFs, respectively (Fig. 3c). RifN is known to be involved in converting the amino analogues to the aromatic starter unit, 3-amino-5-hydroxybenzoic acid (ABHA). Thus, it acts as a hub controlling the regulation of other genes in the cluster. In the TF-level degree analysis, the TF with the highest degree was AMED_4179, a DNA-binding protein found to interact with RifR, RifC and RifD. Hub proteins are reported to be important for cell growth as well as regulation [26, 32,33,34]. Loss of expression of these proteins may disrupt the cluster completely. Thus, these proteins may have a crucial role in regulating the rifamycin cluster as a whole and could also be explored to improve the production of rifamycin B analogs.
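
The power-law check on the degree distribution can be sketched as a straight-line fit on log-log axes. This is a minimal sketch, assuming an ordinary least-squares fit of log P(k) against log k; the source does not state how the fit or the r² value was obtained. The function names and the toy degree list are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)} for the positive degrees in the network."""
    counts = Counter(d for d in degrees if d > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def fit_power_law(pk):
    """Fit log P(k) = c - gamma * log k by least squares; return (gamma, r2)."""
    if len(pk) < 2:
        raise ValueError("need at least two distinct degree values")
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r2


# Toy degrees constructed so that the node counts scale exactly as k^-2.
toy_degrees = [1] * 16 + [2] * 4 + [4] * 1
gamma, r2 = fit_power_law(degree_distribution(toy_degrees))
```

For the toy input the fit recovers an exponent of 2 with r² of 1. On real networks with only 70 nodes the distribution is sparse, so a fit of this kind is indicative rather than conclusive.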

Fig. 3

a The interaction map of the rifamycin BGC (brown) with its interacting transcription factors (green) [Nodes: 70; Edges: 215]. The ring around each node represents the functional enrichment of that protein, as described in the legend. b Degree distribution P(k) of the network, shown with the correlation coefficient (r²). The network follows a power-law distribution and shows the nature of a scale-free network, suggesting a hierarchical organization. c Bar graph of the degree values of rifamycin genes interacting with transcription factors; the X-axis denotes the rifamycin BGC genes and the Y-axis the number of TFs with which each gene interacts

Conclusions

About 10% of the genes in the genome of the rifamycin producer Amycolatopsis mediterranei S699 encode TFs. Thirty unique transcription factors are directly involved in regulating rifamycin production and its extracellular export. These transcription factors may be involved in feedback mechanisms or act as complex allosteric regulators, but this needs further investigation. RifN is the hub protein regulating the RifBGC-TF network. These findings open the possibility of improving the production of rifamycin B and of its analog, 24-desmethylrifamycin B, by Amycolatopsis mediterranei S699 through manipulation of the expression levels of the genes encoding these regulatory proteins.