Introduction

Steroids are biological molecules that fulfill essential physiological functions in higher organisms. These compounds are widespread in nature, being synthesized by plants, insects, vertebrates and lower eukaryotes. Steroid drugs are widely used in different fields of medicine as anti-inflammatory, diuretic, contraceptive, antiandrogenic, progestational and anticancer agents, as male and female sex hormones, etc. The ability to modify steroids is known for a wide range of microorganisms, and actinobacteria are considered among the most effective biocatalysts.

The phylum Actinobacteria comprises Gram-positive bacteria with an overall high G+C content. These bacteria are known to assimilate hydrophobic compounds effectively and possess diverse enzyme systems with high activity towards different organic compounds, including steroids. Steroid catabolism has been best characterized in representatives of the suborder Corynebacterineae, such as Mycobacterium, Rhodococcus and Gordonia, which are characterized by a high content of mycolic acids in the cell envelope, the so-called mycolic acid-rich organisms (e.g. van der Geize et al. 2007; Uhía et al. 2012; Wilbrink 2011). The known steroid catabolism pathways in these bacteria are presented in Fig. 1 (see “Results and discussion” for a detailed explanation). Modification of sterol metabolic pathways in actinobacteria is considered an effective approach to producing valuable compounds for the steroid pharmaceutical industry (Donova and Egorova 2012).

Fig. 1

Biochemical scheme of sterol catabolism. Genes coding for the respective proteins are indicated. a Modification of the 3β-ol-5-ene to the 3-keto-4-ene moiety in ring A of the steroid core; b degradation of the sterol side chain: b.1 degradation of the branched side chain to C24-stenoic acid via aldolic cleavage; b.2 degradation of the unbranched side chain to C24-stenoic acid via the β-oxidation route; b.3 degradation of C24-stenoic acid to C19-steroids; c steroid core modifications; d steroid core degradation via the 9(10)-seco pathway. I sterols, II stenones. R1 = H, cholesterol, cholestenone; R1 = CH3, campesterol, campestenone; R1 = C2H5, sitosterol, sitostenone; R = H, campesterol; R = CH3, sitosterol; III androst-4-ene-3,17-dione (AD), IV androsta-1,4-diene-3,17-dione (ADD), V 9α-hydroxy-AD, VI unstable 9α-hydroxy-ADD, VII 3‐hydroxy‐9,10‐secoandrosta‐1,3,5(10)‐triene‐9,17‐dione (3‐HSA), VIII 3,4‐dihydroxy‐9,10‐secoandrosta‐1,3,5(10)‐triene‐9,17‐dione (3,4‐DHSA), IX 4,5-9,10-diseco-3-hydroxy-5,9,17-trioxoandrosta-1(10),2-diene-4-oic acid (4,9-DSHA), X 2-hydroxyhexa-2,4-dienoic acid (2-HHD), XI 9,17‐dioxo‐1,2,3,4,10,19-hexanorandrostan‐5‐oic acid (DOHNAA) or 3aα-H-4α-(3′-propanoic acid)-7aβ-methylhexahydro-1,5-indanedione (HIP), XII 4-hydroxy-2-oxohexanoic acid, XIII pyruvate, XIV 9,17‐dioxo‐1,2,3,4,10,19-hexanorandrostan‐5‐oyl-CoA, XV 9-hydroxy-17-oxo‐1,2,3,4,10,19-hexanorandrostan‐5‐oic acid or 5-OH-HIP, XVI 9-hydroxy-17-oxo‐1,2,3,4,10,19-hexanorandrostan‐5‐oyl-CoA, XVII 9-hydroxy-17-oxo‐1,2,3,4,10,19-hexanorandrost-6-ene‐5‐oyl-CoA, XVIII 7,9-dihydroxy-17-oxo‐1,2,3,4,10,19-hexanorandrostan‐5‐oyl-CoA, XIX 9-hydroxy-7,17-dioxo‐1,2,3,4,10,19-hexanorandrostan‐5‐oyl-CoA, XX 9-hydroxy-17-oxo‐1,2,3,4,5,6,10,19-octanorandrostan‐7‐oic acid. Adapted from: Fujimoto et al. (1982), van der Geize et al. (2007), Nesbitt et al. (2010), Horinouchi et al. (2012), Wilbrink et al. (2012, 2013), Carere et al. (2013) and Yang et al. (2014, 2015)

However, information on the genes involved in steroid catabolism in other steroid-transforming actinobacteria, such as representatives of the genus Nocardioides (suborder Propionibacterineae), is scarce. Meanwhile, there is a growing body of knowledge on the bioconversion of different types of steroids by these bacteria, evidencing their broad catalytic potential towards steroids (e.g. Yu et al. 2007; Suzuki et al. 2007; Luthra et al. 2015).

Nocardioides simplex VKM Ac-2033D is an important biotechnological strain capable of performing one of the key reactions in the synthesis of valuable steroid hormones: the introduction of a ∆1-double bond into the steroid core of 3-oxosteroids. Other biotechnologically relevant activities of the strain include hydrolysis of acetylated steroids, reduction of the carbonyl groups at C-17 of androstanes and at C-20 of pregnanes, as well as degradation of sterols and androstanes (Fokina et al. 2003a, b, 2010; Fokina and Donova 2003; Sukhodolskaya et al. 2010). Despite the long history of the strain's application for steroid bioconversions, little is known about the genes encoding the key steps of steroid catabolism.

The strain was originally isolated from soil and in early works was referred to as “Mycobacterium sp. 193” (Krassilnikov et al. 1959), “Mycobacterium globiforme” (Lestrovaya et al. 1965) and “Arthrobacter globiformis” (e.g. Koshcheyenko et al. 1983; Arinbasarova et al. 1996). It was later re-identified (Fokina et al. 2003a) as Nocardioides simplex (O’Donnell et al. 1982; Validation list 1983) on the basis of 16S rRNA gene sequence analysis and comparative studies of chemotaxonomic, morphological and physiological characteristics. The strain has been deposited in the All-Russian Collection of Microorganisms at the Institute of Biochemistry and Physiology of Microorganisms, Russian Academy of Sciences (VKM IBPM RAS) under No. VKM Ac-2033D.

Recently, we announced the complete genome sequence of N. simplex VKM Ac-2033D (Shtratnikova et al. 2015). The genome, over 5.6 Mb in size with a high G+C content (72.66 %), comprises over 5420 protein-coding sequences located on a circular chromosome. In this study, we analyzed the genome sequence bioinformatically to predict the set of genes, operon structures and gene clusters involved in sterol/steroid metabolism, as well as their regulation.
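G+C content is the fraction of G and C among the sequenced bases. The Python sketch below shows the calculation; the function name `gc_content` is hypothetical, and excluding ambiguous symbols (e.g. N) from the denominator is an assumption, since the text does not state how the 72.66 % figure was computed.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Usage with a toy sequence (not real genome data): 6 G/C out of 10 unambiguous bases.
print(gc_content("ATGCGCGCATNN"))  # 60.0
```

For a real assembly, the same function would be applied to the concatenated chromosome sequence read from the FASTA file.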

Cholesterol utilization in mycobacteria is controlled by two TetR-type transcriptional repressors, KstR and KstR2 (Kendall et al. 2007, 2010). No other regulators of steroid catabolic pathways in actinobacteria have been reported to date. We derived the binding motifs of the KstR regulators in N. simplex VKM Ac-2033D de novo and searched the whole genome for binding sites (KstR binding sites) to reveal putative steroid catabolism-associated genes other than the known ones.

Knowledge of the genome organization of this strain will contribute to the understanding of its steroid bioconversion features and may also extend its biocatalytic applications. Genome investigations, including comparative analysis of steroid catabolism-associated genes, have not been reported hitherto for other steroid-transforming representatives of the family Nocardioidaceae or any closely related actinomycetes.

Materials and methods

Strain, its cultivation and genome sequencing

The strain Nocardioides simplex VKM Ac-2033D was cultured as described by Fokina et al. (2003a). The mineral medium was composed of (g/l): KH2PO4, 0.5; K2HPO4, 0.5; (NH4)2HPO4, 1.5; MgSO4, 0.2; FeSO4, 0.005; ZnSO4, 0.002. β-Sitosterol (S&D Chemicals, England) and cholesterol (Serva, Germany) were added as fine powders (1 g/l). In some experiments, the medium was supplemented with methyl-β-cyclodextrin (MCD; CAVASOL® W7 M, Wacker-Chemie, Germany) at 17 g/l. Cultivation was carried out aerobically on a rotary shaker at 30 °C for 120 h. Steroids were assayed as described by Ivashina et al. (2012).

For RNA extraction, the mineral medium was supplemented with (g/l): yeast extract (Difco, USA), 1.0; glycerol, 5.0; MCD, 3.46. Cholesterol was added to a final concentration of 0.2 g/l. Cells were harvested when the culture reached OD600 = 0.81.

The sequencing and assembly procedures have been described previously (Shtratnikova et al. 2015).

Genome analysis, annotation and operon identification

The genome was annotated using the online service XBASE (http://www.xbase.ac.uk/annotation/), the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) and the online service RAST (http://rast.nmpdr.org/). The whole genome sequences of Nocardioides sp. JS614, Mycobacterium tuberculosis H37Rv and Mycobacterium smegmatis mc²155 were used as references for XBASE. The complete genome sequence is available in GenBank under accession number CP009896. Operons containing the identified genes were predicted with the online service FgenesB (http://linux1.softberry.com/berry.phtml?topic=fgenesb&group=programs&subgroup=gfindb).
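FgenesB uses its own prediction model, which is not described here. To illustrate the kind of signal operon predictors rely on, the Python sketch below groups neighbouring genes into operons when they lie on the same strand and are separated by at most `max_gap` bp. This is a simple distance-based baseline swapped in for illustration, not the FgenesB algorithm; the `Gene` record, the 50 bp default and the locus tags in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    locus_tag: str
    start: int   # 1-based leftmost coordinate
    end: int     # 1-based rightmost coordinate (inclusive)
    strand: str  # "+" or "-"


def group_into_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Distance-based baseline: neighbouring genes on the same strand whose
    intergenic gap is <= max_gap bp are placed in the same operon."""
    operons: List[List[str]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1  # negative if the genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)
                continue
            operons.append([g.locus_tag for g in current])
        current = [gene]
    if current:
        operons.append([g.locus_tag for g in current])
    return operons


# Hypothetical genes: geneA/geneB are close on "+", geneC is 199 bp away, geneD is on "-".
genes = [
    Gene("geneA", 1, 900, "+"),
    Gene("geneB", 920, 1500, "+"),
    Gene("geneC", 1700, 2500, "+"),
    Gene("geneD", 2520, 3000, "-"),
]
print(group_into_operons(genes))  # [['geneA', 'geneB'], ['geneC'], ['geneD']]
```

Dedicated predictors combine such distance information with other evidence (e.g. promoter and terminator signals), so this baseline is only a rough approximation.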

Analysis of homologous relations and comparison with other Nocardioides strains

Orthologous and paralogous relations between genes of N. simplex VKM Ac-2033D and other actinobacteria, as well as different Nocardioides strains, were analyzed using OrthoMCL 2.09 (Li et al. 2003). For comparison, 19 Nocardioides genomes of good assembly quality, i.e. assembled into fewer than 100 contigs, were chosen. Proteins of the selected strains (Table 1 for actinobacteria, Table S1 for Nocardioides) were clustered into orthologous groups (orthogroups) with an OrthoMCL inflation parameter of 1.5. The list of all orthogroups for actinobacteria is presented in Table S2, and the list of orthogroups of steroid catabolism genes for the genus Nocardioides in Table S3. To find putative genes encoding steroid catabolism enzymes that were not assigned to orthogroups, a BLASTx search (Altschul et al. 1990) was run against the actinobacteria (Table 1) and the betaproteobacterium Comamonas testosteroni. In several cases, reciprocal BLAST was used to search for genes corresponding one-to-one to known genes.
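The reciprocal BLAST criterion can be sketched as a reciprocal best hit (RBH) search: a pair of genes is accepted only if each is the other's highest-scoring hit. The Python sketch below assumes BLAST tabular output (`-outfmt 6`, in which the 12th column is the bit score) and also shows the assembly filter of fewer than 100 contigs. Function names and identifiers are hypothetical, and the e-value and coverage cutoffs that real pipelines usually add are omitted.

```python
from typing import Dict, Iterable, List, Set, Tuple


def select_assemblies(contig_counts: Dict[str, int], max_contigs: int = 100) -> List[str]:
    """Keep genomes assembled into fewer than max_contigs contigs."""
    return sorted(name for name, n in contig_counts.items() if n < max_contigs)


def best_hits(blast_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query to its highest-bit-score subject from BLAST -outfmt 6 lines."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def row(query: str, subject: str, bitscore: float) -> str:
    """Build a toy -outfmt 6 line; only columns 1, 2 and 12 matter here."""
    return "\t".join([query, subject, "90.0", "300", "30", "0",
                      "1", "300", "1", "300", "1e-50", str(bitscore)])


# Hypothetical identifiers: nsA/nsB in one genome, refX/refY in the other.
a_vs_b = [row("nsA", "refX", 250), row("nsA", "refY", 120), row("nsB", "refY", 200)]
b_vs_a = [row("refX", "nsA", 245), row("refY", "nsA", 130), row("refY", "nsB", 100)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # {('nsA', 'refX')}
print(select_assemblies({"strain1": 12, "strain2": 140, "strain3": 99}))  # ['strain1', 'strain3']
```

In the toy data, nsB's best hit refY prefers nsA, so the nsB-refY pair is rejected as not one-to-one.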

Table 1 Actinobacterial strains for orthogroups construction

Analysis of transcription factor binding sites

Regions spanning from 500 bp upstream to 50 bp downstream of the start codons of genes putatively involved in steroid metabolism were analyzed. Over-represented motifs in these regions were searched for using MEME 4.10 (Bailey and Elkan 1994), with motif length allowed to range from 8 to 50 bp. The 20 top-scoring motifs, in the form of position-weight matrices (PWMs), were compared with the respective motifs of the mycobacterial steroid metabolism regulators KstR (Kendall et al. 2007) and KstR2 (Kendall et al. 2010). The two motifs that were clearly similar to the mycobacterial KstR and KstR2 motifs are further referred to as the KstR and KstR2 motifs of N. simplex VKM Ac-2033D.

Then, to find all KstR and KstR2 sites in the N. simplex VKM Ac-2033D genome, the regions from 500 bp upstream to 50 bp downstream of the start codons of all genes were scanned with the KstR and KstR2 motifs found in the previous step. The scan was performed with the FIMO tool from the MEME suite 4.10 (Grant et al. 2011). Matches with a false discovery rate below 0.01 (q value, estimated by FIMO using the Benjamini-Hochberg procedure) were considered putative binding sites. For comparison, the same regions were also scanned with UGENE 1.13.1 (Okonechnikov et al. 2012) using the respective KstR and KstR2 sites from mycobacteria, retaining matches with a score of at least 85 % of the maximum possible score. Only binding sites detected by both tools were used for the analysis. The predicted KstR binding sites were also used for operon prediction.
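The two filters and their intersection can be sketched in Python as follows. `bh_qvalues` implements the Benjamini-Hochberg adjustment; `relative_score` uses one common normalization, (s − s_min)/(s_max − s_min), which is an assumption, as the exact UGENE formula is not given in the text. FIMO computes q values over all scanned positions, whereas this sketch adjusts only the candidate list it receives; the candidate tuple format, the toy matrix and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def relative_score(pwm: List[Dict[str, float]], site: str) -> float:
    """(score - min) / (max - min) for a log-odds PWM; assumed normalization."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    score = sum(column[base] for column, base in zip(pwm, site))
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    return (score - worst) / (best - worst)


def consensus_sites(candidates: List[Tuple[str, float, str]],
                    reference_pwm: List[Dict[str, float]],
                    q_cutoff: float = 0.01, rel_cutoff: float = 0.85) -> Set[str]:
    """Keep sites passing both the FDR filter and the relative-score filter.

    candidates: (site_id, p_value, site_sequence) tuples, a hypothetical format.
    """
    qvalues = bh_qvalues([p for _, p, _ in candidates])
    by_fdr = {site_id for (site_id, _, _), q in zip(candidates, qvalues) if q < q_cutoff}
    by_score = {site_id for site_id, _, seq in candidates
                if relative_score(reference_pwm, seq) >= rel_cutoff}
    return by_fdr & by_score


# Toy two-column log-odds matrix favouring "AG".
toy_pwm = [{"A": 2, "C": -1, "G": -1, "T": -1}, {"A": -1, "C": -1, "G": 2, "T": -1}]
candidates = [("s1", 1e-6, "AG"), ("s2", 1e-6, "AC"), ("s3", 0.5, "AG")]
print(consensus_sites(candidates, toy_pwm))  # {'s1'}
```

Here s2 passes the FDR filter but scores only 50 % of the maximum, and s3 scores 100 % but fails the FDR filter, so only s1 is reported by both.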

RNA extraction

Total RNA was extracted from 4 ml of bacterial suspension (6.8 × 10⁸ CFU ml−1) using the RNeasy Mini Kit (Qiagen, USA) according to the manufacturer’s protocol, followed by treatment with RNase-free DNase I (Thermo Scientific, Lithuania). Total RNA was quantified spectrophotometrically at 260/280 nm using the NanoPhotometer™ P-Class (Implen, Germany) and stored at −70 °C.
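The arithmetic behind spectrophotometric RNA quantification can be sketched as follows, using the conventional factor of 40 μg/ml per A260 unit for RNA at a 1 cm path length. The sketch also computes the A260/A280 purity ratio and the volume holding the 2 μg of RNA used for cDNA synthesis. The instrument may apply its own factors (e.g. for path length), the readings in the usage example are invented for illustration, and the function names are hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # conventional factor for RNA, 1 cm path length


def rna_concentration(a260: float, dilution: float = 1.0, path_cm: float = 1.0) -> float:
    """RNA concentration in μg/ml from absorbance at 260 nm."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution / path_cm


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values around 2.0 are usually taken to indicate pure RNA."""
    return a260 / a280


def volume_for_mass_ul(mass_ug: float, conc_ug_per_ml: float) -> float:
    """Volume (μl) of sample containing mass_ug of RNA."""
    return mass_ug * 1000.0 / conc_ug_per_ml


# Invented readings for illustration: A260 = 0.5 and A280 = 0.25 on a 1:10 dilution.
conc = rna_concentration(0.5, dilution=10)
print(conc, purity_ratio(0.5, 0.25), volume_for_mass_ul(2.0, conc))  # 200.0 2.0 10.0
```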

Real-time quantitative polymerase chain reaction

First-strand cDNA was synthesized from 2 μg of total RNA using the RevertAid H Minus First Strand cDNA Synthesis Kit (Thermo Scientific, Lithuania) following the manufacturer’s instructions. Real-time PCR was performed on a DNA-Technology system (Russia) with qPCRmix-HS SYBR (Evrogen, Russia). The nucleotide sequences of the primers used for the target genes and the reference gene (16S rRNA) are listed in Table S4. Each sample was run in duplicate, and the experiment was performed twice. The 25 μl reaction mixture contained 2 μl of cDNA, 0.4 μM of each primer and 5 μl of qPCRmix-HS SYBR. The amplification program was as follows: initial denaturation at 94 °C for 4 min 30 s, followed by 40 cycles of denaturation at 95 °C for 20 s, annealing at 53 °C for 20 s and extension at 72 °C for 20 s. To confirm the absence of genomic DNA in the RNA samples, cDNA was replaced with 2 μl of a reaction mixture prepared without M-MuLV reverse transcriptase. The 16S rRNA gene was used as the internal control. Relative gene expression is represented by average ∆Cp values (differences in crossing point relative to the 16S rRNA reference). The specificity of the PCR products was verified by electrophoresis on a 1.2 % agarose gel.
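A minimal Python sketch of the ∆Cp calculation, assuming the convention ∆Cp = mean Cp(target) − mean Cp(16S rRNA); some authors use the opposite sign, and the text does not specify which was used. The optional conversion to a relative quantity, efficiency^(−∆Cp), assumes a known amplification efficiency (2.0 means perfect doubling per cycle) and is not part of the reported analysis. Function names and Cp values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_cp(target_cps: Sequence[float], reference_cps: Sequence[float]) -> float:
    """Mean Cp of the target gene minus mean Cp of the reference (16S rRNA) gene."""
    return mean(target_cps) - mean(reference_cps)


def relative_quantity(dcp: float, efficiency: float = 2.0) -> float:
    """Expression relative to the reference, assuming per-cycle amplification `efficiency`."""
    return efficiency ** (-dcp)


# Invented duplicate Cp readings for one target and the 16S rRNA reference.
dcp = delta_cp([25.0, 25.5], [12.25, 12.25])
print(dcp, relative_quantity(dcp))  # 13.0 0.0001220703125
```

A lower ∆Cp means the target crosses the threshold earlier relative to 16S rRNA, i.e. higher relative expression.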

Results and discussion

Determination of binding motifs of KstR and KstR2

The binding motifs of KstR and KstR2 predicted de novo in N. simplex VKM Ac-2033D nearly coincide with those of mycobacteria. The binding preferences of KstRs in N. simplex VKM Ac-2033D and in mycobacteria are displayed in Fig. 2. Other over-represented short sequences near the steroid catabolism-related genes were of low complexity; it is therefore unlikely that they correspond to transcription factor binding sites.
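The text does not state how motif similarity was assessed. One widely used approach, applied for example by the Tomtom tool of the MEME suite, compares aligned motif columns with the Pearson correlation coefficient (PCC). The Python sketch below, under that assumption, scores every ungapped offset with a minimum overlap and returns the best mean column correlation. Reverse-complement alignment, which matters for palindromic operators, is omitted for brevity, and the function names and toy motifs are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple

BASES = "ACGT"
Column = Dict[str, float]  # base -> frequency


def column_pcc(a: Column, b: Column) -> float:
    """Pearson correlation between two motif columns (0.0 if a column is uniform)."""
    xa = [a[base] for base in BASES]
    xb = [b[base] for base in BASES]
    ma, mb = sum(xa) / 4, sum(xb) / 4
    cov = sum((u - ma) * (v - mb) for u, v in zip(xa, xb))
    va = sum((u - ma) ** 2 for u in xa)
    vb = sum((v - mb) ** 2 for v in xb)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def best_offset(m1: List[Column], m2: List[Column], min_overlap: int = 4) -> Tuple[float, int]:
    """Best mean column PCC over ungapped offsets; m2[j] is aligned to m1[j + offset]."""
    best = (float("-inf"), 0)
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [(m1[i], m2[i - offset]) for i in range(len(m1)) if 0 <= i - offset < len(m2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_pcc(a, b) for a, b in pairs) / len(pairs)
        if score > best[0]:
            best = (score, offset)
    return best


def one_hot(seq: str) -> List[Column]:
    return [{base: 1.0 if base == s else 0.0 for base in BASES} for s in seq]


# Toy motifs: "GTAC" matches columns 2-5 of "ACGTAC", so the best offset is 2.
print(best_offset(one_hot("ACGTAC"), one_hot("GTAC")))  # (1.0, 2)
```

Real comparisons would use frequency matrices derived from the predicted and reported sites and would assess significance, as Tomtom does, rather than relying on the raw score.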

Fig. 2

Comparison of KstR and KstR2 binding motifs of N. simplex VKM Ac-2033D and mycobacteria. Motifs for mycobacteria were calculated from the sites reported for M. tuberculosis and M. smegmatis (Kendall et al. 2007, 2010). More conserved positions are indicated by taller letters. Diagrams were generated with WebLogo (Crooks et al. 2004; weblogo.berkeley.edu)
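Letter heights in such logos follow from the information content of each motif column: IC = log2(4) − H, where H = −Σ p log2 p is the Shannon entropy of the column, and each letter's height is its frequency multiplied by IC. The Python sketch below builds a frequency matrix from aligned sites and computes these quantities. WebLogo can additionally apply a small-sample correction, which is omitted here; the function names and example sites are hypothetical.

```python
from math import log2
from typing import Dict, List

BASES = "ACGT"


def frequency_matrix(sites: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies from equal-length aligned (uppercase) sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    return [{b: sum(s[i] == b for s in sites) / len(sites) for b in BASES}
            for i in range(length)]


def information_content(column: Dict[str, float]) -> float:
    """IC in bits: log2(4) minus the Shannon entropy of the column."""
    entropy = -sum(p * log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def letter_heights(column: Dict[str, float]) -> Dict[str, float]:
    """Logo letter height = base frequency x column information content."""
    ic = information_content(column)
    return {b: p * ic for b, p in column.items()}


# Hypothetical aligned sites (not real KstR operators).
pfm = frequency_matrix(["AT", "AT", "AA", "AT"])
print([round(information_content(col), 3) for col in pfm])  # [2.0, 1.189]
```

A fully conserved column reaches the 2-bit maximum, while a column with uniform base usage carries no information and is drawn with zero height.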

General clustering of steroid catabolic gene homologs

In this study, we observed that predicted operons containing orthologs of known steroid catabolism genes were grouped into clusters. Three large clusters have putative KstR/KstR2 binding sites: cluster A (KR76_14160-KR76_14505), cluster B (KR76_12190-KR76_12345), and cluster C (KR76_25015-KR76_25200) (Figs. 3, 4).

Fig. 3

General scheme of the genome with orthologs of steroid catabolism genes. Stripes, separate predicted operons. Black stripes, predicted operons with no known binding sites. Blue stripes, predicted operons with putative KstR binding sites. Red stripes, predicted operons with putative KstR2 binding sites. Green diamonds, predicted mce loci. A, B, C, D, E, clusters of operons with genes encoding proteins related to steroid catabolism. Circles, predicted operons with putative binding sites of either KstR or KstR2 and with genes not known to be related to steroid catabolism

Fig. 4

Clusters of genes coding for steroid metabolism-related proteins. Gene order reads from top left to bottom right. Vertical lines designate regions where several genes are omitted. Names of genes belonging to the same operon are written close to each other; neighboring operons are separated by a larger space. Filled rectangles denote putative KstR binding sites; empty rectangles denote putative KstR2 binding sites

Cluster A (Figs. 3, 4) contained candidate genes related to the degradation of steroid core rings A/B and, presumably, to the oxidation of ring C and the 17-acyl side chain. In both gene content and gene order, it was very similar to the clusters described for other actinobacteria: cluster ro04482-ro04706 in R. jostii RHA1 (van der Geize et al. 2007); MSMEG_5903-MSMEG_5943 in M. smegmatis (Uhía et al. 2012); and Rv3574-Rv3492c, revealed in M. tuberculosis H37Rv and M. bovis Bacillus Calmette‐Guérin (BCG) (Wilbrink 2011); as well as to the corresponding cluster of the Gram-negative bacterium C. testosteroni TA441 (Horinouchi et al. 2004). Cluster A matches part of the R. jostii cluster ro04531-ro04694. Notably, the kshA gene is absent from the N. simplex VKM Ac-2033D genome between the operons bphE3hsaGF and hsaADCB, whereas it is present between similar operons in R. jostii RHA1 (Wilbrink 2011). Nonetheless, a kshA ortholog with putative KstR/KstR2 binding sites is situated in cluster A within an operon similar to MSMEG_5925-MSMEG_5927.

In the N. simplex VKM Ac-2033D genome, cluster B (Figs. 3, 4) contains candidate genes related to the side chain degradation pathway. These genes match the group ro04482-ro04488 of the main RHA1 cluster (Wilbrink 2011), as well as cluster 1 of M. smegmatis (Uhía et al. 2012).

Most genes in clusters A and B have putative KstR binding sites, while the genes of cluster C (Figs. 3, 4) seem to be under the control of KstR2. The KstR2 regulon has been reported to encode genes involved in the degradation of rings C and D (Casabon et al. 2013b). In N. simplex VKM Ac-2033D, part of the predicted KstR2 regulon is situated in cluster A and comprises several genes matching those of this regulon in M. smegmatis and M. tuberculosis (Kendall et al. 2010). The predicted regulon of cluster C has no counterparts among the clusters in the annotated genomes of other actinobacteria. It includes candidate genes for cytochrome P450, oxidoreductases, a short-chain dehydrogenase/reductase (SDR), 3α(20β)-hydroxysteroid dehydrogenase, enoyl-CoA hydratases, acetyl-CoA acetyltransferase fadA3 and acyl-CoA synthetase fadD3. It is possible that the regulon encoding the degradation of rings C/D is split in N. simplex VKM Ac-2033D, or that the strain has several degradative pathways.

Several clusters lack putative KstR/KstR2 binding sites but include orthologs of known steroid catabolic genes. Among them, clusters D (KR76_27035-KR76_27130) and E (KR76_17995-KR76_18070) may be of interest (Figs. 3, 4). Their genes could be controlled by non-KstR regulators. A group of predicted operons within cluster D includes orthologs of ksh subunits, two kstDs, a steroid delta-isomerase gene and several predicted transcriptional regulators. Cluster E includes orthologs of genes of the cholate degradation cluster of R. jostii RHA1 (Mohn et al. 2012).

Sterol utilization

Nocardioides simplex VKM Ac-2033D utilized cholesterol and sitosterol as carbon and energy sources (Fig. 5a). No or poor growth was observed in the controls (mineral medium only, or mineral medium supplemented with MCD only). The rate of sterol consumption was higher in the presence of MCD, which is known to enhance bioconversion processes (Jadoun and Bar 1993). Androstenedione and androstadienedione were detected as intermediates in low amounts (Fig. 5b). The results confirmed the ability of the strain to degrade the sterol molecule.

Fig. 5

Time-courses of cholesterol, sitosterol (a) and cholest-4-en-3-one and sitost-4-en-3-one (b) during N. simplex VKM Ac-2033D growth in sterol-mineral medium

Genes encoding sterol uptake system

Diverse actinobacteria contain mce transporter operons involved in steroid uptake (Mohn et al. 2008; García et al. 2012). The Mce4 system of R. jostii RHA1, which may recognize only side chains of at least eight carbons (Mohn et al. 2008; García et al. 2012), includes an ATPase, two permease subunits (YrbE or Sup) and proteins of unknown function (Mce4ABCDEFHI); the corresponding genes are closely spaced and transcribed as one operon negatively regulated by KstR (Mohn et al. 2008). Expression of the mce4 locus was induced on cholesterol, but only weakly, probably because of its high basal expression (van der Geize et al. 2007; Mohn et al. 2008; Uhía et al. 2012).

In this study, six predicted mce loci were revealed in N. simplex VKM Ac-2033D by BLAST. For comparison, two and four mce loci have been found in R. jostii RHA1 (Mohn et al. 2008) and M. tuberculosis H37Rv (Wilbrink 2011), respectively. Many genes of the predicted mce loci have no orthologs in the calculated orthogroups, possibly because of recent horizontal transfer or fast evolution of these genes. Only one predicted operon (KR76_12195-KR76_12240) in cluster B (Fig. 4) possesses a putative KstR box (Table S5). Although the predicted operon KR76_01200-KR76_01230 does not possess KstR/KstR2 binding sites, it may be of interest since it contains the gene choD encoding cholesterol oxidase. A similar operon was recently found in Gordonia cholesterolivorans, with a constitutively transcribed gene encoding ChoOx-2, indicating its possible contribution to steroid transport (Drzyzga et al. 2011).

Cholesterol oxidase and 3-hydroxysteroid dehydrogenase/Δ5-isomerase

Accumulation of cholest-4-en-3-one or sitost-4-en-3-one was observed during incubation of N. simplex VKM Ac-2033D with cholesterol or sitosterol, respectively (Fig. 5), evidencing the modification of the 3β-ol-5-ene to the 3-keto-4-ene moiety in ring A of the steroid core. This reaction is considered the initial step of sterol oxidation by actinobacteria (Fig. 1a). In many organisms, the enzymes accounting for 3β-hydroxy dehydrogenation and Δ5→4 isomerization are cholesterol oxidases (ChOs) and/or 3β-hydroxysteroid dehydrogenases (3-HSDs) (Yam et al. 2009). Except for ChoM1 and ChoM2 (Yao et al. 2013), cholesterol oxidases are not critical for cholesterol utilization in mycobacteria and rhodococci (Griffin et al. 2011; Ivashina et al. 2012; Uhía et al. 2012). Here, only the cholesterol oxidase ortholog KR76_09550 possesses a putative KstR binding site (Table S1), while the ChoD paralogs KR76_01200, KR76_06800 and KR76_13200 do not. Surprisingly, no genes with high identity to known 3-HSDs were found in the genome of N. simplex VKM Ac-2033D. It should be noted that in some cases 3-HSDs may not be essential for steroid metabolism (Casabon et al. 2013b). Given the high efficiency of cholesterol oxidation by N. simplex VKM Ac-2033D, one may propose that KR76_09550 or other, as yet unidentified enzymes are responsible for this activity in this organism, or that this reaction is not controlled by KstR regulators.

Both ChOs and 3-HSDs have been shown to be bifunctional: they catalyze both the oxidation of the 3β-hydroxyl group and the Δ5→4 isomerization of 3β-ol-5-ene steroids. In some proteobacteria (Comamonas testosteroni, Pseudomonas putida), distinct enzymes were reported to be responsible for the isomerization reaction. In C. testosteroni, double bond isomerization is accomplished by ∆5-3-ketosteroid isomerase encoded by ksi (Horinouchi et al. 2012) (Fig. 1a). In the N. simplex VKM Ac-2033D genome, two candidate ksi (ksdI) genes belonging to one orthogroup (Table S2) are present; one of them, KR76_23530, has a putative KstR box (Table S5). The putative ksi gene KR76_27120, identical to that of A. simplex (Molnár et al. 1995), was detected in cluster D (Fig. 4) within the same predicted operon as kstD. Most probably, KR76_27120 is either not functional or regulated by transcription factors other than KstR. This assumption is consistent with the very low expression level of similar genes in A. simplex and S. lividans (Molnár et al. 1995). The separate ksi gene KR76_23530 might be functional.

Genes coding for sterol side chain degradation

In mycolic acid-rich actinobacteria, the pathway of cholesterol side chain degradation generally resembles fatty acid β-oxidation (Capyk et al. 2009b; Nesbitt et al. 2010; Thomas and Sampson 2013) (Fig. 1b.1, b.2). It is initiated by hydroxylation at C26(27), catalyzed by the cytochrome P450 monooxygenase Cyp125 (Rosłoniec et al. 2009; Capyk et al. 2009b; Wilbrink 2011). In M. tuberculosis and M. bovis, cytochrome Cyp142 was shown to be capable of replacing the activity of Cyp125 (Johnston et al. 2010). Cyp125 was also shown to be involved in the further oxidation of 26-hydroxycholest-4-en-3-one to cholest-4-en-3-one-26-oic acid (Ouellet et al. 2010). In M. smegmatis, expression of the cyp125 gene MSMEG_5995 was induced by cholesterol, and its promoter had a KstR box (Uhía et al. 2012). In this study, the predicted operon in cluster A includes the gene KR76_14335, which belongs to the cyp125 orthogroup (Tables S2, S5). Another predicted operon with a putative KstR binding site, located far from the major clusters, includes orthologs of cyp123 (KR76_04935) and cyp51 (KR76_04945) (Tables S2, S5). We did not find any genes highly identical to cyp124 or cyp142, which had been shown to provide functional redundancy in M. tuberculosis (Johnston et al. 2010).

One of the genes of the β-oxidation route, fadD19, codes for a steroid-CoA ligase which is essential in R. rhodochrous DSM43269 for degradation of C24-branched sterols (Fig. 1b.1), such as sitosterol, stigmasterol and others (Wilbrink et al. 2011). In M. smegmatis, fadD19 was shown to be induced by cholesterol (Uhía et al. 2012). We revealed an ortholog of fadD19, KR76_14210, with a putative KstR box within cluster A in the N. simplex VKM Ac-2033D genome (Tables S2, S5).

The acyl-CoA dehydrogenases encoded by fadE26 (renamed chsE4) and fadE27 (renamed chsE5) form a heterotetramer composed of two ChsE4-ChsE5 dimers, each with one active site (Yang et al. 2015). ChsE4-ChsE5 preferentially catalyzes the oxidation of 3-oxo-cholest-4-en-26-oyl CoA in the first cycle of cholesterol side chain β-oxidation, but the complex can also catalyze the oxidation of 3-oxo-chol-4-en-24-oyl CoA and 3-oxo-4-pregnene-20-carboxyl-CoA (Fig. 1b.3). These genes are situated within operons that are up-regulated on cholesterol (van der Geize et al. 2007; Uhía et al. 2012). In N. simplex VKM Ac-2033D, we found one predicted operon with orthologs of chsE4-chsE5 (KR76_14175-KR76_14180) with a putative KstR binding site in cluster A (Tables S2, S5).

Steroid-24-oyl-CoA synthetase encoded by fadD17 appears to catalyze CoA ligation during β-oxidation of steroids with a C5 side chain (Fig. 1b). FadD17 was not essential for growth of M. tuberculosis on cholesterol (Griffin et al. 2011), but its homolog was up-regulated on cholate in R. jostii RHA1 (Mohn et al. 2012). The enzyme was proposed to have additional functions, e.g. catalysis of CoA thioesterification of long-chain fatty acids (Casabon et al. 2014). In the genome of N. simplex VKM Ac-2033D, the fadD17 ortholog KR76_14185 was detected within cluster A and had a putative KstR box (Tables S2, S5). Gene KR76_18025, orthologous to the gene of the isofunctional acyl-CoA synthetase CasG of R. jostii RHA1 (Mohn et al. 2012), was found in cluster E without a KstR binding site (Tables S2, S5, S6). An ortholog of the gene encoding steroid-22-oyl-CoA synthetase CasI of R. jostii RHA1 (Mohn et al. 2012; Casabon et al. 2014), KR76_13775, lacking a KstR box, is situated separately from the main clusters in N. simplex VKM Ac-2033D (Tables S2, S5, S6). Probably, under certain conditions, the products of these genes may contribute to the degradation pathways of steroids with short side chains.

FadE34 (ro04483), renamed ChsE3, catalyzes only the oxidation of 3-oxo-chol-4-en-24-oyl CoA in the second cycle of β-oxidation (Yang et al. 2015; Ruprecht et al. 2015). In the genome of N. simplex VKM Ac-2033D, its ortholog KR76_12295 is located in cluster B with a putative KstR binding site (Tables S2, S5).

Degradation of C24-branched chain sterols was suggested to occur via aldolytic cleavage (Fig. 1b.1), requiring the aldol lyases encoded by ltp3 and ltp4 (Wilbrink et al. 2012). In N. simplex VKM Ac-2033D, one predicted operon with orthologs of ltp3 (KR76_14285) and ltp4 (KR76_14280) and a putative KstR box was identified in cluster A (Tables S2, S5).

A role in side chain degradation was proposed for the enoyl-CoA hydratases encoded by echA19 (van der Geize et al. 2007) and hsd4B (Uhía et al. 2012) because of their location near other genes coding for side chain degradation, their up-regulation on cholesterol and the presence of a KstR box in their promoters. In N. simplex VKM Ac-2033D, the echA19 ortholog KR76_14215 (as a single gene) and the hsd4B ortholog KR76_14505 (in a predicted operon with an ortholog of kstD3) are located in cluster A; both predicted operons possess putative KstR binding sites (Tables S2, S5).

Hsd4A might be involved in the degradation of the cholesterol side chain and may also act as a 17β-hydroxysteroid dehydrogenase and D-3-hydroxyacyl-CoA dehydrogenase in the presence of branched fatty acids (Uhía et al. 2012). In the genome of N. simplex VKM Ac-2033D, the hsd4A ortholog KR76_12260 was revealed in cluster B within a predicted operon with a putative KstR box (Tables S2, S5).

Several studies have been devoted to the igr locus of M. tuberculosis, which is essential for virulence. The operon consists of genes encoding a lipid transfer protein (ltp2/Rv3540c), two MaoC-like hydratases (chsH1/Rv3541c and chsH2/Rv3542c), two acyl-CoA dehydrogenases (fadE29/chsE2/Rv3543c and fadE28/chsE1/Rv3544c) and cytochrome P450 (cyp125/Rv3545c) (Thomas et al. 2011). FadE29 (renamed ChsE2) and FadE28 (ChsE1) form a heterotetramer with two active sites that catalyzes the oxidation of 3-oxo-4-pregnene-20-carboxyl-CoA (Thomas and Sampson 2013), and ChsH1-ChsH2 catalyzes the hydration of 3-oxo-4,17-pregnadiene-20-carboxyl-CoA (Yang et al. 2014) (Fig. 1b.3). N. simplex VKM Ac-2033D possesses a similar predicted operon, KR76_12325-KR76_12345, in cluster B with a putative KstR box (Fig. 4). Orthologs of chsE2, ltp2, chsH1 and chsH2 were detected within this operon (Tables S2, S5). In M. tuberculosis, the product of chsE2 was shown to function only in a complex with the product of chsE1, and the two genes are adjacent in the operon (Thomas and Sampson 2013). We could not find any chsE1 homologs in the genome of N. simplex VKM Ac-2033D. It is therefore unclear whether the chsE2 product has another function and operates differently, or is not functional in N. simplex VKM Ac-2033D.

Genes encoding 3-ketosteroid dehydrogenase and 9α-hydroxylase

At present, only one steroid core degradation pathway, the so-called 9(10)-seco pathway, is known for actinobacteria and some proteobacteria (e.g. Donova and Egorova 2012). It includes the formation of a chemically unstable 1,4-diene-9α-hydroxy structure followed by A-ring aromatization (Fig. 1c) and further multistep core degradation (Fig. 1d). The key enzymes responsible for opening ring B with formation of a 9(10)-secosteroid are 3-ketosteroid ∆1-dehydrogenase (KstD) and 3-ketosteroid 9α-hydroxylase (Ksh) (Petrusma et al. 2011; Capyk et al. 2011).

3-Ketosteroid ∆1-dehydrogenases (KstDs) are FAD-dependent enzymes that catalyze the trans-diaxial elimination of the C-1(α) and C-2(β) hydrogen atoms of steroid ring A (Itagaki et al. 1990). N. simplex VKM Ac-2033D and other actinobacteria are capable of introducing a Δ1-double bond into a wide range of 1-saturated 3-ketosteroids, but not 3-hydroxysteroids; different strains contain different numbers of kstD genes (McLeod et al. 2006; Knol et al. 2008; Fernández de las Heras et al. 2012; Bragin et al. 2013). In the genome of N. simplex VKM Ac-2033D, we revealed two orthologs of kstD1/ksdD of R. erythropolis, two of kstD3 and one of kstD2 (Tables S2, S5). The candidate kstD3 gene KR76_14500 is adjacent to hsd4B within the predicted operon with a putative KstR binding site in cluster A, while the kstD1 ortholog KR76_02655, with a putative KstR box, is located far from the main clusters. These findings may indicate enzymatic redundancy of KstDs in N. simplex VKM Ac-2033D, which is known as one of the most effective strains capable of introducing a 1(2)-double bond into 3-ketosteroids of both the androstane and pregnane series (Fokina et al. 2003a, b, 2010; Fokina and Donova 2003; Sukhodolskaya et al. 2010).

3-Ketosteroid 9α-hydroxylase (KSH) is a two-component non-heme monooxygenase composed of the Rieske-domain oxygenase KshA and the flavin-containing ferredoxin reductase KshB (Capyk et al. 2009a). The number of genes encoding KshA and KshB varies from 1 to 5 in different bacteria, and the number of kshA genes does not always correspond to the number of kshB genes (Bragin et al. 2013). In N. simplex VKM Ac-2033D, two genes were orthologous to kshA and two to kshB (Tables S2, S5); in addition, one kshA homolog not assigned to any orthogroup was detected in cluster E. In cluster A, kshA and kshB were within the same predicted operon, which has a putative KstR binding site. Most likely, these genes play a role in sterol catabolism, and their products participate in core degradation via the 9(10)-seco pathway.

Genes encoding further steroid core degradation pathway

The activities of KstDs and KSHs result in the formation of the intermediate 3‐hydroxy‐9,10‐secoandrosta‐1,3,5(10)‐triene‐9,17‐dione (3‐HSA) (Fig. 1d). As shown for some actinobacteria and proteobacteria, further degradation includes hydroxylation of 3-HSA by a flavin-dependent monooxygenase (HsaAB in R. jostii RHA1) (Dresen et al. 2010) and oxygenolytic cleavage of ring A by HsaC (or BphC) (van der Geize et al. 2007), followed by hydrolytic cleavage by HsaD (or BphD) into 2-hydroxyhexa-2,4-dienoic acid (HHD) and 3aα-H-4α(3′-propanoate)-7aβ-methylhexahydro-1,5-indanedione (HIP) (Lack et al. 2010) (Fig. 1d). Genes encoding this pathway have been studied in R. jostii RHA1 and M. tuberculosis (van der Geize et al. 2007; Dresen et al. 2010) and in C. testosteroni (Horinouchi et al. 2004). An hsaADCB-like catabolic operon in M. smegmatis was up-regulated during growth on cholesterol (Uhía et al. 2012). In N. simplex VKM Ac-2033D, the only predicted hsaADCB operon with a putative KstR box is located in cluster A (Tables S2, S5), whereas orthologs of the R. jostii RHA1 genes hsaD3B3C3, which lack a KstR box, are located in cluster D.

The hsaEGF operon, which encodes enzymes of further A-ring degradation, was shown to be up-regulated during growth of M. smegmatis on cholesterol (Uhía et al. 2012). The genome of N. simplex VKM Ac-2033D contains two orthologs of bphE3 of R. jostii RHA1 (Table S5), which may be involved in ring A catabolism (Carere et al. 2013). In our strain, the predicted hsaFG-bphE3 operon (KR76_14485-KR76_14495), which has a putative KstR binding site, is located in cluster A, while another predicted operon with the orthologs bphF3G3E3 (KR76_24170-KR76_24180), lacking KstR binding sites, is located far from the main clusters (Tables S2, S5).

It has been proposed that the propionate moiety of HIP might be removed via β‐oxidation (Fig. 1d) (van der Geize et al. 2007; Wilbrink 2011).

FadD3 is a HIP-CoA synthetase that initiates catabolism of steroid rings C and D in M. tuberculosis (Casabon et al. 2013a) (Fig. 1d). Orthologs are found in all sterol-catabolizing bacteria. The fadD3 ortholog KR76_25120, which has a putative KstR2 box, is located in cluster C of the N. simplex VKM Ac-2033D genome (Tables S2, S5).

The genes ipdA and ipdB are thought to encode a heterodimeric CoA transferase; an R. equi deletion mutant was impaired in growth on 5-OH-HIP (van der Geize et al. 2011) (Fig. 1d). In the R. jostii RHA1 genome, they are located in the gene group ro04649-ro04656, which is up-regulated during growth on both cholesterol and cholate (Mohn et al. 2012). In N. simplex VKM Ac-2033D, two copies of each ortholog were detected in two predicted operons, one in cluster A and one in cluster C, and both operons have putative KstR2 binding sites (Tables S2, S5).

The acyl-CoA dehydrogenase FadE30 was shown to be responsible for the subsequent dehydrogenation of the 3′-propanoate substituent at C4 of 5-OH-HIP-CoA (van der Geize et al. 2011) (Fig. 1d). In the N. simplex VKM Ac-2033D genome, a predicted operon that includes orthologs of acyl‐CoA dehydrogenase (fadE30) KR76_14430, a short-chain dehydrogenase (SDH) KR76_14435 and acetyl-CoA acetyltransferase (fadA6) KR76_14440 is located in cluster A. The acyl‐CoA dehydrogenases encoded by fadE31, fadE32 and fadE33 belong to a heterotetrameric ACAD subfamily, and their products were shown to be functional only in complexes with each other (Wipperman et al. 2013). Their predicted operon, KR76_14445-KR76_14455, is located close to the above operon on the opposite strand of the N. simplex VKM Ac-2033D genome (Fig. 4), with two putative KstR2 binding sites between the two operons. A similar group of genes in R. jostii RHA1, ro04591-ro04599, was up-regulated on cholate, expressed at a high constitutive level on cholesterol, and belongs to the KstR2 regulon (Mohn et al. 2012).

Genes encoding transcriptional repressors KstR and KstR2

Transcription of genes involved in steroid metabolism is tightly controlled by the TetR-type transcriptional repressors KstR and KstR2. KstR was shown to control a large regulon of several dozen genes (Kendall et al. 2007), whereas KstR2 regulates a smaller regulon of 15 genes (Kendall et al. 2010). Both repressors may also regulate their own expression.

In the genome of N. simplex VKM Ac-2033D, a single kstR ortholog, KR76_12270, is located in cluster B, although its putative KstR binding site was predicted with a false discovery rate above the threshold (q value of 0.0295). A single kstR2 ortholog, KR76_25115, is located in cluster C and has its own putative KstR2 binding site. Reciprocal BLAST searches confirmed that both are one-to-one orthologs of the corresponding M. tuberculosis regulators. Another TetR-family regulator with a putative KstR2 binding site, KR76_07630, is located separately in the genome (Tables S2, S5).
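The reciprocal best-hit criterion behind such one-to-one ortholog assignments can be sketched as follows. This is a minimal illustration, assuming BLAST results have already been parsed into (query, subject, bit score) tuples; the text does not state the BLAST settings or filters used. The M. tuberculosis locus tags Rv3574 (KstR) and Rv3557c (KstR2) are used for illustration, and all bit scores, as well as the hits listed for KR76_07630, are hypothetical.

```python
from typing import Dict, List, Tuple

# One parsed BLAST hit: (query id, subject id, bit score)
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit AND a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits and bit scores, for illustration only.
nsimplex_vs_mtb = [
    ("KR76_12270", "Rv3574", 210.0),
    ("KR76_12270", "Rv3557c", 60.0),
    ("KR76_25115", "Rv3557c", 180.0),
    ("KR76_07630", "Rv3557c", 95.0),  # a weaker TetR-family paralog
]
mtb_vs_nsimplex = [
    ("Rv3574", "KR76_12270", 208.0),
    ("Rv3557c", "KR76_25115", 182.0),
    ("Rv3557c", "KR76_07630", 97.0),
]

print(reciprocal_best_hits(nsimplex_vs_mtb, mtb_vs_nsimplex))
```

In this toy example, KR76_07630 has Rv3557c as its best hit, but Rv3557c scores higher against KR76_25115, so only KR76_12270/Rv3574 and KR76_25115/Rv3557c survive as one-to-one pairs.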

The orthologs casA KR76_18040, kstR3 KR76_18060 and KR76_27035 were detected in clusters E and D (Tables S2, S5). They may represent regulators distinct from KstR and KstR2 and could play a role in other steroid catabolic pathways in N. simplex VKM Ac-2033D.

Expression of KR76_14500, KR76_12295, and KR76_25120 genes in the presence of cholesterol

RT-qPCR experiments were performed to estimate the expression levels of sterol catabolic genes using mRNA from cells grown in the presence or absence of cholesterol. Three genes (KR76_14500, KR76_12295, and KR76_25120) were chosen because they are located in three different chromosomal clusters, participate in different stages of the sterol catabolic pathway, and are predicted to be controlled by the transcriptional regulators KstR or KstR2. All three genes were induced in the presence of cholesterol. The expression of fadD3 KR76_25120 was the least responsive to cholesterol (∆Cp = 3.6 ± 0.95), whereas stronger induction was observed for kstD3 KR76_14500 and fadE34/chsE3 KR76_12295 (∆Cp = 11.8 ± 0.65 and ∆Cp = 8.3 ± 0.6, respectively). For the 16S rRNA gene, used as the endogenous control, no significant differences in expression level were observed under any of the conditions tested.
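To give a sense of scale, the reported ∆Cp values can be converted into approximate fold changes as fold = E^∆Cp. The sketch below is a minimal illustration assuming that each ∆Cp is already the Cp difference between the two growth conditions (normalized to the 16S rRNA control) and that amplification efficiency is 100% (E = 2, one doubling per cycle); neither assumption is stated in the text, and real efficiencies should be determined for each primer pair.

```python
def fold_change(delta_cp: float, efficiency: float = 2.0) -> float:
    """Fold change implied by a Cp difference, assuming `efficiency` per cycle."""
    return efficiency ** delta_cp


# Mean and SD of ∆Cp as reported in the text.
reported = {
    "kstD3 KR76_14500": (11.8, 0.65),
    "fadE34/chsE3 KR76_12295": (8.3, 0.6),
    "fadD3 KR76_25120": (3.6, 0.95),
}

for gene, (mean, sd) in reported.items():
    # Evaluate the fold change at mean - SD, mean, and mean + SD
    # (the spread is symmetric in Cp, hence asymmetric in fold change).
    low, mid, high = (fold_change(mean + d) for d in (-sd, 0.0, sd))
    print(f"{gene}: ~{mid:.0f}-fold (range {low:.0f} to {high:.0f})")
```

Under these assumptions, the induction ranges from roughly 12-fold for fadD3 to roughly 3,600-fold for kstD3, which illustrates why a ∆Cp difference of a few cycles corresponds to a large difference in transcript abundance.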

Other genes of interest

N. simplex VKM Ac-2033D and related organisms are used for Δ1-dehydrogenation of 1-saturated pregnane steroids, such as cortisol, to produce therapeutically important 1-dehydro analogs or valuable intermediates. Δ1-Dehydrogenation by whole-cell catalysts is often accompanied by reduction of the 20-carbonyl group (Arinbasarova et al. 1985; Mahato et al. 1988; Fokina and Donova 2003). In M. tuberculosis, Rv2002 (FabG3), an NADH-dependent 3α,20β-hydroxysteroid dehydrogenase, is likely to be involved in steroid metabolism (Yang et al. 2003). In the genome of N. simplex VKM Ac-2033D, the fabG3 ortholog KR76_12265 has a putative KstR box and is located in cluster B (Supplementary material, Tables S1, S2), whereas the other paralogs, KR76_23000 and KR76_13560, lack a KstR box. The gene KR76_25085, an ortholog of the 3α-hydroxysteroid dehydrogenase of N. farcinica IFM 10152, was found in cluster C and has a putative KstR box.

A genome-wide screen for genes with putative binding sites of KstR-family regulators in N. simplex VKM Ac-2033D identified predicted operons with genes encoding acyl-CoA transferases, hydrolases and other transcriptional regulators (KR76_19215-KR76_19220) that might be involved in side chain degradation or steroid core oxidation (Table S2). Several genes are presumably involved in the metabolism of the main nitrogenous bases (KR76_11475-KR76_11490, KR76_19205-KR76_19210, KR76_22540-KR76_22555, KR76_23525, KR76_09920-KR76_09930, KR76_26165-KR76_26180) and in sulfate metabolism (KR76_03905-KR76_03920) (Table S2). We found no genes with putative KstR binding sites in N. simplex VKM Ac-2033D that correspond to the additional genes up-regulated on cholesterol in R. jostii RHA1 (Wilbrink 2011), e.g., the cycloalkanone catabolism genes ro06687-ro06698, which are suggested to be involved in ring D degradation of HIP. Only a few matches were found with genes of the KstR regulons in M. smegmatis (Uhía et al. 2012) (e.g., an AraC family transcriptional regulator, succinic semialdehyde dehydrogenase, helicase, peptidase). These additional gene sets may be species- or strain-specific.

The strain N. simplex VKM Ac-2033D is capable of utilizing lithocholic acid (data not shown). We also searched the genome of N. simplex VKM Ac-2033D for orthologs of the genes of the cholate catabolism cluster of R. jostii RHA1 (Wilbrink 2011; Mohn et al. 2012) (Table S6). The orthologs found belong to the main clusters, e.g., kshA, kstD and fadA5 in cluster A, hsaBCD in cluster D, and several single-copy orthologs in cluster E; several others are located separately from the main clusters (e.g., casI, casE). Thus, N. simplex VKM Ac-2033D lacks a cholate degradation-like cluster; the orthologs of its genes might be involved in bile acid degradation or in other degradative pathways, or be non-functional.

Comparison with other Nocardioides strains

Known for its ability to utilize ethene, vinyl chloride, propene, butene, fluoroethene and nicotine, Nocardioides sp. JS614 has been proposed as a biocatalyst for the production of chiral epoxides (Coleman et al. 2011). Nocardioides sp. CF8 can utilize butane, halogenated hydrocarbons and C2 to C16 n-alkanes (Kimbrel et al. 2013). Recently, in a study of the microbiome of the small flowering plant Arabidopsis thaliana (Bai et al. 2015), 17 Nocardioides genomes were sequenced. However, nothing is known about the ability of these strains to utilize cholesterol.

We analyzed differences between the genome of N. simplex VKM Ac-2033D and the genomes of other sequenced and annotated Nocardioides strains. Either no steroid catabolism genes or only a few of them, but not a complete set, were found in Leaf285, Leaf307, Root122, Soil774, Soil777 and Soil805. The set of genes putatively involved in steroid catabolism was found in Nocardioides spp. CF8, JS614, Root1257, Root140, Root151, Root190, Root224, Root240, Root614, Root682, Root79, Soil796 and Soil797, suggesting that these strains may be able to utilize cholesterol or related steroids.

Similar to mycolic acid rich actinobacteria, N. simplex VKM Ac-2033D harbors different steroid catabolic pathways encoded by distinct gene clusters under predicted KstR and KstR2 regulation. The strain does not possess a complete set of enzymes for cleavage of the propyl moiety of the C23-C21 side chain (no chsE1 (fadE28) orthologs were found). Nonetheless, the strain is able to degrade cholesterol. This suggests the existence of unknown enzymes or of additional mechanisms of sterol side chain degradation, which may involve genes of a new predicted regulon. This hypothesis requires confirmation by transcriptomic, proteomic and enzymological studies.

The genes described in this study could serve as targets for genetic modification aimed at improving the industrial properties of microorganisms used in steroid production.

Conclusions

In this work, the organization of steroid metabolism-associated genes in the genome of the biotechnologically important strain Nocardioides simplex VKM Ac-2033D was studied for the first time. Predicted genes related to the sterol uptake system, degradation of the aliphatic side chain at C17, and A/B- and C/D-ring degradation are distributed across the single circular chromosome and form several major clusters. Distinctive features of the strain's genome include a predicted KstR2 regulon with no analogs in other annotated actinobacterial genomes, the absence of genes encoding 3-hydroxysteroid dehydrogenase, and the absence of chsE1 (fadE28) orthologs. Future transcriptomic studies will be important for clarifying the role of the candidate genes in steroid modification by this organism.

This study advances our understanding of steroid metabolism in actinobacteria. The results suggest that steroid catabolic pathways are generally conserved in actinobacteria, but that additional systems for steroid oxidation may exist. The biocatalytic potential of Nocardioides and related actinobacteria toward steroids has clearly not been fully explored, and knowledge of steroid catabolic pathways in these organisms will open new perspectives for creating effective biocatalysts for the steroid pharmaceutical industry.