1 Introduction

Bacterial proteins belonging to the YczE family [1] are predicted to be membrane proteins of yet unknown function possessing five trans-membrane helices. The Conserved Domain Database (CDD) [2] assigns code COG2364 to the YczE family. These proteins share the presence of one or two so-called DUF161 domains with the YitT family, corresponding to InterPro databank [3] code IPR003740, Pfam [4] code PF02588, and CDD code COG1284. Indeed, YczE and YitT proteins are generically annotated as “membrane proteins containing the DUF161 domain”. Differences between YczE and YitT proteins are not obvious by the annotations reported in the databanks. Possibly, they may be distinguished by the absence of the C-terminal DUF2179 domain in YczE, corresponding to the Protein Data Bank (PDB) structure PDB:3HLU, present in the members of the YitT family. Moreover, in many bacterial species, the yczE gene coding for the YczE protein is divergently transcribed with respect to an adjacent gene coding for a transcriptional regulator of the MocR family [5]. The MocRs linked to the yczE genes, named YczRs, are predicted to constitute a subfamily within the MocR family [6]. Moreover, YczRs are supposed to regulate the expression of the yczE genes as, for example, in Regulog PdxR3—Mycobacteriaceae of the RegPrecise 4.0 database [5].

MocR regulators are a family of proteins belonging to the class of GntR regulators [7] characterized by the presence of two domains. The N-terminal domains, 60 residues long on average, display the winged-helix–turn–helix architecture (wHTH) and are responsible for DNA recognition and binding [7]. The C-terminal domains [8] are quite large (350 residues on average) and are characterized by a tertiary structure belonging to the fold type-I pyridoxal 5′-phosphate (PLP)-dependent enzymes [9], of which aspartate aminotransferase (AAT) is the archetypal enzyme. The two domains are linked to each other by a peptide bridge [10, 11]. The three-dimensional structure of GabR [12] from Bacillus subtilis, one of the best characterized MocRs [13,14,15,16], confirmed the presence of a C-terminal AAT-like domain and provided a foundation for further investigations. Only a few other MocRs have been experimentally characterized so far: for example, TauR, involved in the regulation of taurine utilization genes in Rhodobacter capsulatus [17]; PdxR, involved in the regulation of PLP synthesis in several bacteria such as Corynebacterium glutamicum [18], Streptococcus pneumoniae [19], Listeria monocytogenes [20], and Bacillus clausii [21]; and DdlR from Brevibacillus brevis, which activates the expression of the gene coding for the enzyme d-alanyl-d-alanine ligase [22]. The entire MocR population can be subdivided into groups, such as YczR, characterized by different structural and functional properties [6, 23].

Regarding the possible function of YczE proteins, it has been hypothesized that in Bacillus amyloliquefaciens they positively regulate the biosynthesis of bacillomycin D, although in a yet-unidentified manner [1]. Other authors found that in B. subtilis B3, the genomic region involved in the biosynthesis of surfactin contained the yczEB3 gene coding for a YczE-like protein [24]. Interestingly, the same genomic region hosted the aspB3 gene coding for a putative aminotransferase-like protein. More recently, it has been proposed that YczE may anchor to the membrane the mega-synthases responsible for the biosynthesis of polyketides or lipopeptides in B. amyloliquefaciens FZB42 and B. subtilis [25]. However, in these two species, yczE is not divergently oriented with respect to a yczR gene. Since, in many cases, yczE genes are presumed to be under the control of transcription factors able to bind PLP, it is reasonable to expect that YczE is involved in the transport and/or processing of metabolites related, directly or indirectly, to PLP chemistry.

To put forward hypotheses amenable to experimental testing about the possible function of the YczE proteins, a phylogenetic profile strategy has been applied. The strategy consists in searching for those genes that, within a set of genomes, co-occur exclusively with a certain gene of interest. The co-occurring genes can be considered functionally linked to the target gene [26]. Application of specific in silico methods may also indicate the possible membrane protein type and generic function [27].

Although YczE is widespread among eubacteria [6], we focused our analysis on a set of mycobacterial species. The genus Mycobacterium belongs to the phylum Actinobacteria and comprises many different species, several of which are pathogenic to vertebrates. Mycobacterial species are often classified as slow or fast growers [28]. Slow-growing Mycobacteria such as M. tuberculosis or M. leprae are generally highly pathogenic. Moreover, Mycobacteria possess a complex cell envelope consisting of a cytoplasmic membrane and a cell wall, which plays a crucial role in intrinsic drug resistance and in survival under harsh conditions [29]. Mycobacteria represented a very suitable and attractive data set for our in silico analyses because: (a) the species are evolutionarily close; (b) YczE and YczR are present only in a subset of the species; (c) in these species, yczE and yczR are divergently transcribed. In general, Mycobacteria are interesting for their increasing relevance to human health [30]. We collected a set of 30 mycobacterial genomes. Out of the 30 mycobacterial proteomes, only 16 contained YczE proteins. Two orthology clustering procedures were applied to find the proteins co-occurring exclusively with the YczE proteins. The results suggest that YczE may be involved in the membrane translocation and/or metabolism of sulfur-containing compounds such as taurine.

2 Materials and Methods

Mycobacterial complete proteomes have been retrieved from the RefSeq genome databank [31]. RegPrecise version 4.0 was the reference data bank for regulon prediction [5]. Synteny analysis utilized the SyntTax web server [32]. Clusters of orthologs were identified using the program Proteinortho V5.15 [33] and the stand-alone version of OMA [34].

Sequence searches were carried out using Blast, Rps-blast [35], and the HMMER suite [36] implementing the Hidden Markov Models (HMM) profile search. Multiple sequence alignments were obtained with Clustal Omega [37] and displayed with Jalview [38]. Halign Web server [39] was also tested as an alternative multiple sequence alignment procedure. Phylogenetic trees were calculated with the suite MEGA v.7.0 [40]. Fold recognition and structure prediction relied on the Phyre2 [41] and HHpred [42] servers. Phyre2 provides a fold recognition service based on profile alignments. The HHpred suite implements the HMM vs HMM profile comparison to assign a protein to a particular structural fold or, alternatively, to a protein family.

Secondary structure and trans-membrane helix prediction utilized MEMSAT of the PsiPred server [43], TMHMM [44], and CCTOP [45].

Perl, awk, and bash scripts were written to analyze data and parse output text files.

3 Results

3.1 Data Set

A set of 30 mycobacterial proteomes was retrieved from the RefSeq genome databank (Table 1). The sample was chosen so that only 16 proteomes contained YczE and YczR proteins (Fig. 1). The corresponding genomes were scanned through the SyntTax Web server [32] using the YczE protein from Mycobacterium smegmatis (RefSeq code YP_886671) as a query, to confirm synteny around the genes yczE and yczR. Moreover, scrutiny of the mycobacterial genome maps confirmed that the genes coding for YczE are always divergently transcribed with respect to the yczR genes coding for YczR [6] (Fig. 1). It should also be mentioned that the M. smegmatis genome has two identical copies of yczE, each divergently transcribed with respect to a cognate, identical yczR gene. Only one pair is reported in Fig. 1. Moreover, in the case of M. smegmatis, M. goodii and M. neoaurum, short genes coding for non-conserved hypothetical proteins about 50–60 residues long (WP_023985549, WP_053194597 and YP_886670, respectively) are predicted to occur between the yczR and the yczE genes (Fig. 1).

Table 1 Set of mycobacterial proteomes utilized in the work
Fig. 1

Scheme showing the gene layout around the genes yczE and yczR (labelled in the scheme) in the mycobacterial genomes considered. Genes coding for homologous proteins are depicted with the same arrow style and their products are listed in the box. Arrow length and distances are not proportional to sequence length except for the “hypothetical” proteins

The presence (or absence) of the YczE proteins in each of the 30 mycobacterial genomes considered was further confirmed by scanning their proteomes with the Hmmsearch program, in the HMMER suite, and Rps-blast. In the first case, an HMM profile calculated with the aligned YczE sequences was employed to scrutinize the mycobacterial proteomes. In the latter, Rps-blast searched for the occurrence of the CDD query profile (code COG2364, PSSM id 225239) representing the YczE domain. It should be reported that HMM searches retrieved also the sequence WP_003885078 from Mycobacterium sp. VKM Ac-1817D originally not included in the YczE set. However, this protein is a distant paralog of WP_003881203 within the same genome, sharing only 29% sequence identity, and it is not divergently transcribed with respect to any MocR regulator.
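Sequence identity figures such as the 29% quoted above depend on how the aligned region and the gaps are treated. The following is a minimal sketch of one common convention (identical residues divided by the columns in which both sequences carry a residue); it is not necessarily the convention used by the programs cited here, and the two toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical pair of aligned fragments ("-" marks a gap).
print(round(percent_identity("MKT-AYL", "MRTQAFL"), 1))
```

Other definitions divide by the full alignment length or by the length of the shorter sequence, which can change the reported percentage noticeably for short or gapped alignments.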

3.2 MocR Regulators

A census of the MocR regulators occurring in the 30 selected mycobacterial proteomes was obtained with Hmmsearch and Rps-blast. As before, an HMM profile was calculated from the aligned YczR sequences and then used to scan the mycobacterial proteomes. Rps-blast searched the proteomes with the CDD profiles representing the HTH and the AAT domains, respectively (Table 2). A protein was considered a bona fide MocR only if it contained both domains. The number of MocRs contained in a single proteome is variable (Table 2). Five proteomes, M. bovis, M. haemophilum, M. leprae, M. tuberculosis, and M. xenopi, apparently lack the regulators.
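The two-domain criterion can be expressed as a simple filter over per-protein domain hits. The sketch below is only illustrative: the protein identifiers and the labels `HTH` and `AAT` are hypothetical placeholders for the CDD profiles listed in Table 2, and the hits are assumed to have been parsed already from the Rps-blast output.

```python
from collections import defaultdict


def bona_fide_mocr(domain_hits, required=("HTH", "AAT")):
    """Return the sorted IDs of proteins that carry every required domain.

    domain_hits: iterable of (protein_id, domain_label) pairs.
    """
    domains_by_protein = defaultdict(set)
    for protein_id, domain in domain_hits:
        domains_by_protein[protein_id].add(domain)
    needed = set(required)
    # Keep a protein only when all required domains were detected in it.
    return sorted(p for p, found in domains_by_protein.items() if needed <= found)


# Hypothetical hits: protA has both domains, protB and protC only one each.
hits = [("protA", "HTH"), ("protA", "AAT"), ("protB", "AAT"), ("protC", "HTH")]
print(bona_fide_mocr(hits))
```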

Table 2 MocR regulators found in each mycobacterial proteome considered

A cladogram was generated from the multiple sequence alignment of all the MocRs found in the mycobacterial proteomes. The cladogram substantiates the notion that the regulators divergently transcribed with respect to the yczE genes constitute a subgroup [6] within the MocR family (Fig. 2). Indeed, all the YczRs segregate in the same subtree with the exception of the MocRs from the YczE-free species M. elephantis and M. thermoresistibile.

Fig. 2

Unrooted consensus tree showing the similarity relationships among the MocR sequences detected in the selected mycobacterial proteomes. The cladogram was calculated with the UPGMA method. Pairwise distances between sequences were calculated as the number of amino acid differences per sequence. The bootstrap consensus tree was inferred from 1000 replicates. Red branches and taxon names denote the YczR subpopulation. The asterisk denotes the node of the YczR subtree, supported by more than 90% bootstrap frequency
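For readers unfamiliar with the procedure named in the caption, the following minimal sketch shows UPGMA clustering on a number-of-differences distance. It is not the MEGA implementation: it returns the topology only, omits branch lengths and bootstrapping, and ignores gapped columns pairwise (an assumption). The four toy sequences are hypothetical.

```python
from itertools import combinations


def n_differences(a, b):
    """Count aligned columns where both sequences have a residue and differ."""
    return sum(1 for x, y in zip(a, b) if "-" not in (x, y) and x != y)


def upgma(aligned):
    """Cluster aligned sequences by UPGMA and return the topology as nested tuples.

    aligned: dict mapping sequence name -> aligned sequence string.
    """
    sizes = {name: 1 for name in aligned}  # cluster -> number of leaves
    dist = {
        frozenset(pair): n_differences(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                na * dist[frozenset((a, other))] + nb * dist[frozenset((b, other))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical aligned fragments.
toy = {"A": "MKTAY", "B": "MKTAF", "C": "MRSAF", "D": "MRSGF"}
print(upgma(toy))
```

UPGMA assumes a constant rate of change along all lineages, so the resulting tree is best read as a similarity dendrogram rather than as a reconstruction of evolutionary history.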

3.3 Mycobacterium YczE

Protein YczE is a conserved membrane protein annotated in RefSeq as an “uncharacterized 5×TM membrane BCR, YitT family”. However, while the TMHMM prediction suggests the presence of only five helices, MEMSAT and CCTOP predict six trans-membrane helices (Fig. 3). A multiple sequence alignment of the Mycobacterium proteins contained in the data set and the layout of the predicted trans-membrane helices are displayed in Fig. 3. The proteins do not share any similarity with other structurally characterized proteins. In fact, Phyre2 and HHpred searches were unable to associate YczEs with any known fold.

Fig. 3

Multiple sequence alignment of the YczE proteins found in the selected proteomes obtained with Clustal Omega. Sequence codes refer to Table 1. Coloring code follows the ClustalX scheme. Lines labelled with CCTOP, TMHMM and MEMSAT report the results of the trans-membrane helix predictions. Red bars indicate the predicted trans-membrane helices, while grey and blue bars denote “cytoplasmic” and “extracellular” peptide segments, respectively

3.4 Phylogenetic Profile Analysis

To propose hypotheses about possible functions of the conserved YczE protein, a comparative protein phylogenetic profile approach was applied [26]. The rationale of the strategy is to look for proteins uniquely associated with the presence of YczE across the proteomes of the considered set. As an example, if proteins A and B are found in a subset of proteomes only where YczE is present, and vice versa, a functional link between A, B, and YczE may be hypothesized [46].
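The co-occurrence criterion can be illustrated with a short sketch in which each ortholog cluster is represented by the set of proteomes where it has a member. The strict requirement of identical presence/absence profiles is an assumption made for illustration, and the cluster and proteome names are hypothetical.

```python
def exclusive_co_occurrence(profiles, target):
    """Return the clusters whose presence/absence profile equals the target's.

    profiles: dict mapping cluster name -> set of proteomes containing a member.
    """
    target_profile = profiles[target]
    return sorted(
        name
        for name, present in profiles.items()
        if name != target and present == target_profile
    )


# Hypothetical profiles over five proteomes g1..g5; YczE occurs in g1-g3 only.
profiles = {
    "YczE": {"g1", "g2", "g3"},
    "clusterA": {"g1", "g2", "g3"},        # same profile as YczE
    "clusterB": {"g1", "g2", "g3", "g4"},  # also present in a YczE-free proteome
    "clusterC": {"g1", "g2"},              # missing from one YczE-positive proteome
}
print(exclusive_co_occurrence(profiles, "YczE"))
```

A looser criterion, for example tolerating a single mismatching proteome, trades specificity for robustness against annotation gaps and misassigned orthologs.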

To identify co-occurring proteins, two ortholog-clustering techniques were chosen for their computational efficiency, ease of use, and straightforward result interpretation: Proteinortho V5.15 [33] and stand-alone OMA [34]. Both methods rely on an all-versus-all sequence comparison to quantify the similarity between every pair of protein sequences, followed by a clustering procedure. Differences in the methods used for sequence comparison, residue exchange scoring, and data clustering can yield partially different results on the same data set. For that reason, we applied a comparative strategy: the results delivered by the two methods were compared and combined. The clusters containing homologs occurring exclusively in YczE-positive mycobacterial proteomes were considered functionally associated with the yczE gene.
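The exact rule used to combine the two outputs is not detailed here, so the following is only a sketch of one way to compare them: clusters are treated as sets of protein identifiers and labelled according to whether both methods, or just one, recovered the same member set. The function name and the protein identifiers are hypothetical.

```python
def compare_clusterings(method_a, method_b):
    """Classify clusters by whether both methods recovered the same member set.

    method_a, method_b: iterables of clusters, each a collection of protein IDs.
    """
    a = {frozenset(cluster) for cluster in method_a}
    b = {frozenset(cluster) for cluster in method_b}
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical clusters from two programs.
clusters_a = [{"p1", "p2", "p3"}, {"p4", "p5"}]
clusters_b = [{"p1", "p2", "p3"}, {"p6", "p7"}]
result = compare_clusterings(clusters_a, clusters_b)
print(len(result["both"]), len(result["only_a"]), len(result["only_b"]))
```

Exact set equality is a deliberately strict choice; in practice a partial-overlap measure may be preferable, since two programs rarely agree on every member of a cluster.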

Results derived from the application of the two clustering methods are reported in Table 3. Clusters were denominated according to RefSeq annotation and InterPro assignment [47]. The definitions of the families to which the orthologous clusters are predicted to belong are: (a) S-adenosyl-l-methionine-dependent methyltransferase-like; (b) sulfurtransferase (rhodanese-like domain); (c) monooxygenase (luciferase-like domain); (d) 2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline decarboxylase (OHCU); (e) taurine transporter permease TauC; (f) taurine import ATP-binding protein TauB; (g) ABC-type taurine transport system, periplasmic component, TauA; (h) MarR-like transcriptional regulator. As expected, the two phylogenetic methods were able to correctly cluster the YczE proteins of the 16 mycobacterial proteomes (data not shown).

Table 3 Cluster of orthologs co-occurring with YczE protein

The composition of each cluster was further validated using HMM profile searches. The sequences of each cluster were aligned with Clustal Omega and an HMM profile was calculated with Hmmbuild from the HMMER suite. Hmmsearch was subsequently applied to compare each query profile against the 30 proteomes considered, to detect possible orthologs in YczE-free mycobacterial species missed by the clustering programs.
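Hits from such searches can be screened programmatically. The sketch below assumes the whitespace-separated per-target table written by Hmmsearch with the `--tblout` option, in which the first column is the target name and the fifth is the full-sequence E-value; the sample lines, the profile name and the 1e-5 cutoff are hypothetical.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Yield (target_name, full_sequence_evalue) for hits passing the cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        evalue = float(fields[4])  # fifth column: full-sequence E-value
        if evalue <= max_evalue:
            yield fields[0], evalue


# Hypothetical table rows (description column trimmed for brevity).
sample = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "protX - clusterProfile - 1.2e-150 500.1 0.1 1.5e-150 499.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
    "protY - clusterProfile - 3.0e-06 25.2 0.0 4.1e-06 24.8 0.0 1.1 1 0 0 1 1 1 1 methyltransferase",
    "protZ - clusterProfile - 0.5 8.3 0.0 0.9 7.5 0.0 1.2 1 0 0 1 1 1 1 hypothetical protein",
]
for name, evalue in parse_tblout(sample):
    print(name, evalue)
```

Hits that pass the cutoff but score many orders of magnitude worse than the cluster members are candidates for distant homologs rather than orthologs and deserve manual inspection.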

Searches using the profile derived from the “S-adenosyl-l-methionine-dependent methyltransferase-like” cluster showed that these proteins are found only in the YczE-containing proteomes. The only exception is a sequence annotated as type-11 methyltransferase in M. gordonae which, however, is assigned a far less significant E-value (about 10^-6) than the average E-value shown by the cluster proteins (about 10^-150).

The sulfurtransferase, “rhodanese–like domain” profile, captures the presence of homologous, distant domains also in the YczE-free mycobacterial proteomes. A careful inspection of such sequences shows that they are multi-domain proteins containing a C-terminal rhodanese-like domain and an N-terminal domain of different types. For example: the Ars transcriptional regulators (such as WP_023369982) that contain an N-terminal HTH domain, the molybdopterin biosynthesis-like protein MoeZ (WP_003874621) with an N-terminal domain part of the superfamily of E1-like enzymes, and the sulfurtransferases that possess an N-terminal domain belonging to the MBL-fold metallo-protease superfamily (example WP_003924161).

The search with the monooxygenase “luciferase-like domain” profile captured homologs in all the mycobacterial proteomes; however, those found in the YczE-free species are mostly annotated as “FMN-dependent oxidoreductase” or “urease subunit gamma”. The “luciferase-like” protein displays weak similarities only with the C-terminal portion of these proteins.

The profile corresponding to the putative “2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline decarboxylase” detected distantly related homologs in M. marinum and M. ulcerans, while none was present in the other YczE-free mycobacterial species. Interestingly, in most of the YczE-containing mycobacterial species, a paralog of OHCU can be observed along with a protein annotated as “uracil–xanthine permease” (for example, YP_001703664 from M. abscessus). This protein contains an N-terminal domain belonging to the family of the xanthine permeases and a C-terminal domain homologous to OHCU.

The profile calculated with the proteins annotated as “taurine transporter permease TauC” collected several, more distant, homologs in all the other mycobacterial species except M. leprae. This result reflects the presence, within a species, of a variety of homologous membrane transporters responsible for the translocation of different classes of metabolites. However, the membrane proteins from the YczE-positive species constitute a distinct subpopulation which suggests that they may have different function and/or possess a different substrate specificity.

A similar pattern can be observed for the protein cluster annotated as “taurine import ATP-binding protein TauB”, namely the ATP-binding subunit of the ABC transporter. In this case, only OMA detected the cluster (Table 3). For that reason, a more careful analysis of the distribution of homologs among the mycobacterial species was carried out. Several distant homologs could be observed in all species. However, the TauB protein from the YczE-positive species constitutes a distinct subgroup within the set as shown by the cladogram reported in Supplementary material Fig. 1.

ABC importers have cognate periplasmic-binding proteins that capture the solute and feed it to the periplasmic side of the transporter [48, 49]. Both orthology programs were able to detect a cluster containing a periplasmic-binding protein, namely “ABC-type taurine transport system, periplasmic component TauA” (Table 3). The results of the HMM search revealed the presence of several homologs within the same proteomes, whereas the TauA-like proteins were missing in the M. bovis, haemophilum, leprae, marinum, parascrofulaceum, tuberculosis and ulcerans species. Sequence comparison confirms that the TauA-like periplasmic proteins represent a conserved subgroup within the considered proteomes. All these considerations substantiate the notion that a complete ABC system is specifically co-occurring with the gene yczE.

MarR (multiple antibiotic resistance) transcriptional regulators are a family of small proteins (about 140 residue-long) containing a winged-helix DNA-binding domain [50]. Proteins of the MarR family are involved in a variety of biological functions, such as resistance to multiple antibiotics, organic solvents, and oxidative stress agents. These proteins also regulate the synthesis of pathogenic factors in bacteria able to infect humans and plants [51]. The HMM profile of the 16 MarR orthologs found by the clustering programs retrieved distant homologs in all the considered mycobacterial species. Once more, the MarRs from the YczE-containing mycobacterial species represent a subfamily clearly denoted within the set of homologous regulators (Supplementary material Fig. 2). Interestingly, in all the YczE-positive species, MarR genes are adjacent to a Major Facilitator Superfamily (MFS) transporter gene [52]. On the contrary, MarR is often adjacent to a peptide ABC transporter ATP-binding protein in the other mycobacterial species.

3.5 Putative Structural and Functional Features of the Cluster Proteins

3.5.1 S-Adenosylmethionine-Dependent Methyltransferase-Like Proteins

Phyre2 fold recognition using one of the proteins of the S-adenosylmethionine-dependent methyltransferase-like cluster as a query (WP_005085951 from Mycobacterium abscessus) suggests that these proteins are structurally compatible with the N-terminal portion of several proteins annotated as methyltransferases. Among the most compatible are: tfu_2867 from Thermobifida fusca (PDB:2QE6) that shares with the query about 29% sequence identity over a 130 residue-long alignment; the S-adenosylmethionine-dependent methyltransferase from C. glutamicum (PDB:3CGG), 23% sequence identity; the trans-aconitate 2-methyltransferase from Agrobacterium tumefaciens (PDB:2P35), 20% identity; the methyltransferase from antibiotic biosynthesis pathway from Anabaena variabilis (PDB:3EGE), 19% identity. A corresponding multiple sequence alignment is reported in Supplementary material Fig. 3. Moreover, HHpred assigns the portion encompassed by the positions 55–173 of the multiple sequence alignment of the mycobacterial methyltransferase-like proteins, to several CDD profiles pertaining to different methylases such as COG2230, representing the “Cyclopropane fatty-acyl-phospholipid synthase and related methyltransferases” (Table 3).

3.5.2 Sulfurtransferase Rhodanese-Like Domains

The sulfurtransferase “rhodanese-like domains” are structurally similar to the rhodanese domains contained in several multi-domain proteins. For example, the sequence from M. smegmatis (WP_003883953) shares 36% sequence identity with the homologous domain of the Alicyclobacillus acidocaldarius protein that also contains an N-terminal β-lactamase domain (PDB:3TP9). Likewise, the sequence is structurally compatible with many other sulfurtransferase enzymes; among the most similar are the uncharacterized rhodanese-related protein from Thermoplasma volcanium (PDB:3GK5), which shares 26% sequence identity with the query over about 100 aligned residues, and the thiosulfate sulfurtransferase GLPE from E. coli (PDB:1GMX), 26% identity. A corresponding alignment is reported in Supplementary material Fig. 4. Accordingly, HHpred assigns the sequences to CDD profiles related to sulfurtransferases such as CD01524 (pyridine nucleotide-disulphide oxidoreductase) or CD01534, representing “Rhodanese-related sulfurtransferase” (Table 3).

3.5.3 Monooxygenase (Luciferase-Like) Domains

Fold recognition for the “luciferase-like domains” detects structural compatibility with several monooxygenases. For example, the protein from M. smegmatis (WP_003882562) shares significant structural compatibility with the C-terminal portion (sequence interval 225–406) of alkane monooxygenase from Geobacillus thermodenitrificans (PDB:3B9N), although with only 16% sequence identity. Likewise, HHpred indicates structural compatibility with portions of several monooxygenases. The most similar appear to be: nitrilotriacetate monooxygenase from Burkholderia pseudomallei (PDB:3SDO; sequence interval 226–424, 17% identity); alkane monooxygenase from G. thermodenitrificans (PDB:3B90; sequence interval 234–405, 16% identity); alkanesulfonate monooxygenase from E. coli (PDB:1NQK; sequence interval 184–357, 12% identity). A sequence alignment is reported in Supplementary material Fig. 5. In line with these findings, HHpred detects the presence of CDD domains pertaining mainly to FMN-dependent monooxygenases: for example, TIGR03612, corresponding to “pyrimidine utilization protein A”, which is an FMN-dependent monooxygenase; TIGR03565 (alkanesulfonate monooxygenase, FMNH2-dependent); cd01095 (nitrilotriacetate monooxygenase); COG2141 (flavin-dependent oxidoreductase, luciferase family, including alkanesulfonate monooxygenase SsuD and methylene tetrahydromethanopterin reductase) (Table 3). Moreover, HHpred also identifies a signature corresponding to the TIGR03854 profile, “probable F420-dependent oxidoreductase”. Interestingly, it has been found that coenzyme F420 is particularly abundant in mycobacterial species [53].

3.5.4 2-Oxo-4-Hydroxy-4-Carboxy-5-Ureidoimidazoline Decarboxylases (OHCU)

Phyre2 searches confirm the identity of the protein cluster denominated “2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline decarboxylase”. The protein from M. smegmatis (WP_005064897) shares 33% sequence identity with PDB ID:3O7K, the 2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline decarboxylase from Klebsiella pneumoniae.

3.5.5 Taurine Transporter Permease TauC

Proteins within this cluster are predicted to possess six trans-membrane helices. Blast searches through the UniProt/Swissprot databank retrieve several proteins of the Taurine permease transporter of type-I such as UniProt:Q47539 from E. coli (37% identity to the query sequence from Mycobacterium abscessus). Fold recognition confirms structural compatibility with several membrane transporters such as: molybdate/tungstate ABC transporter from Archaeoglobus fulgidus (PDB:2ONK), 14% sequence identity, and sulfate/molybdate ABC transporter from Methanosarcina acetivorans (PDB:3D31), 11% identity (Supplementary material Fig. 6). HHpred search of the CDD databank detects many domain signatures representing membrane transporters. Among the most significant are: COG0600 corresponding to “ABC-type nitrate/sulfonate/bicarbonate transport system, permease component”, TIGR01183 (“nitrate ABC transporter, permease protein”), and COG1174 (“ABC-type proline/glycine betaine transport system, permease component”) (Table 3).

3.5.6 Taurine Import ATP-Binding Protein TauB

Although this cluster is not detected by ProteinOrtho, its proteins should be considered integral components of the taurine-like ABC transporter in the YczE-positive mycobacterial genomes. A Blast search in the UniProt/Swissprot databank retrieves many ATP-binding components of ABC membrane transporters such as, for example, the “uncharacterized ABC transporter ATP-binding protein MJ0412” from Methanocaldococcus jannaschii (UniProt:Q57855) at 40% sequence identity and the “Taurine import ATP-binding protein TauB” from Paracoccus pantotrophus (UniProt:Q6RH47) at 43% identity. Fold recognition confirms that the cluster proteins are compatible with the structure of the ATP-binding components of membrane transporters, for example the “ABC transporter ATP-binding protein” from Thermotoga maritima (PDB:4YER), with which they share 28% sequence identity, “CYSA, putative ABC transporter ATP-binding protein” from Alicyclobacillus acidocaldarius (PDB:1Z47), 42% identity, or the “maltose/maltodextrin transport ATP-binding protein” from E. coli (PDB:1Q1B), 40% identity (Supplementary material Fig. 7). Likewise, the HHpred CDD search assigns the cluster proteins to families of ATP-binding components of ABC transporters, such as COG1116 (“ABC-type nitrate/sulfonate/bicarbonate transport system, ATPase component”) or COG1126 (“ABC-type polar amino acid transport system, ATPase component”).

3.5.7 TauA Periplasmic-Binding Protein

Fold recognition procedures assign this cluster to the structures of periplasmic proteins such as “alkanesulfonate binding protein2” from Xanthomonas axonopodis pv. citri (PDB:3E4R) with 25% sequence identity, and “periplasmic aliphatic sulphonates-binding protein” from E. coli (PDB:2X26), 18% identity (Supplementary material Fig. 8). Accordingly, HHpred assigns the binding protein to the family TauA (CDD code COG4521), denominated “ABC-type taurine transport system, periplasmic component”, or to “ABC-type nitrate/sulfonate/bicarbonate transport system, periplasmic component” (COG0715) (Table 3).

3.5.8 MarR-Like Transcriptional Regulator

Fold recognition links the proteins of this cluster to structures of MarR family transcriptional regulators such as the regulator from Bacillus stearothermophilus (PDB:2RDP) at about 17% sequence identity; from Silicibacter pomeroyi (PDB:3E6M) at 22% identity; from Streptomyces coelicolor (PDB:3ZPL), 24% identity; and the regulator BldR from Sulfolobus solfataricus (PDB:3F3X) with 25% sequence identity. As expected, HHpred relates the ortholog cluster to the domain COG1846 corresponding to the MarR family (Supplementary material Fig. 9).

4 Discussion

In an attempt to delineate possible functions of the putative membrane YczE protein, phylogenetic profile analysis has been applied to a set of 30 mycobacterial species to look for co-occurring genes. Since phylogenetic profiling can be affected by serious artifacts mainly caused by the difficulty of discrimination between orthologs and paralogs [54], we applied two clustering programs the results of which have been combined by a consensus approach to minimize inaccuracies.

Our results suggest that YczE is consistently associated with the presence of at least eight other proteins (Table 3) in the YczE-positive mycobacterial species. Although these proteins are uncharacterized, in silico analyses have provided useful hypotheses about their potential role. The common trait of most of the co-occurring proteins is the apparent structural similarity to enzymes or transporters involved in the metabolism of sulfur-containing compounds. Indeed, among pathogenic bacteria, only Mycobacteria have been reported to produce sulfated metabolites [55]. In this respect, the most interesting result is the co-occurrence of YczE with an ABC importer (the trans-membrane component, the ATP-binding component, and the periplasmic-binding protein) that shares some similarity with other bacterial transport systems such as the E. coli TauABC complex, involved in the uptake and subsequent processing of taurine in conditions of sulfate or cysteine starvation [56]. It is, therefore, conceivable that the homologous mycobacterial complex is functionally linked to membrane translocation of taurine and/or related sulfur compounds.

All these considerations point to a potential role of YczE in the context of metabolic pathways involving transport and processing of sulfur-containing compounds, at least in the Mycobacterium genus. YczE is a membrane protein: it can be speculated that it interacts with the ABC transporter, possibly triggering transport or coupling the transport to metabolite processing. Expression of yczE is under the putative control of YczR: therefore, it can be hypothesized that the sulfur compound taken up by the membrane transporter, or one of its metabolites, may be the effector of YczR. The binding of this molecule to the regulator would then influence the expression of the yczE gene.

The distribution of YczE in the mycobacterial proteomes overlaps very well with the rate of growth: rapidly growing bacteria possess YczE, whereas the slow-growing ones, mostly highly pathogenic, do not. The only exceptions are Mycobacterium elephantis and M. thermoresistibile, which tolerate higher temperatures than other mycobacterial species [57]. It is indeed well known that slow-growing mycobacterial species possess fewer membrane transporters [58, 59], which may limit the uptake of external nutrients. This fact strengthens the notion that YczE may be involved in uptake processes that help sustain the activity of the metabolic machinery of fast-growing Mycobacteria. As a consequence, YczE cannot be considered per se a target for therapies against the most severe infections caused by the slow-growing pathogenic mycobacterial species, but it can be a potential target for the emerging opportunistic infections caused by the widespread environmental Mycobacteria. Moreover, this comparative work contributes to the delineation of genomic and physiological differences between the pathogenic and non-pathogenic mycobacterial species.

In conclusion, this report proposes a putative functional role for the mycobacterial protein YczE and provides a conceptual framework for the design of rational experiments aimed at validating the theoretical observations. It may also contribute to a deeper understanding of the genomic differences between slow- and fast-growing mycobacterial species relevant to human health.