Introduction

Combustion of petroleum-derived high-sulfur fuels releases vast amounts of sulfur dioxide into the atmosphere and is a major cause of environmental pollution and serious health problems (Feng et al. 2016; Mohebali and Ball 2008). According to legislation, the sulfur content in diesel oil should be < 10 ppm (Sadare et al. 2017). The hydrodesulfurization (HDS) process is used in oil refineries to remove sulfur from non-volatile organic sulfur compounds in petroleum (Soleimani et al. 2007). However, it demands specific operating procedures and is also ineffective at removing sulfur from recalcitrant thiophenic compounds like dibenzothiophene (DBT) and its homologous alkylated derivatives (Mohebali and Ball 2008). A potential complement and/or alternative to HDS is biodesulfurization, in which microorganisms remove sulfur from the hydrocarbon chains in petroleum by cleaving the C-S bond in organic sulfur compounds through a specific metabolic pathway called the “4S” pathway (Bhanjadeo et al. 2018; Boniek et al. 2015). In this sulfur-specific 4S pathway, DBT is converted to a sulfur-free compound, 2-hydroxybiphenyl (2-HBP), by the products of an operon comprising three genes (dszA, dszB, dszC) and a chromosome-borne dszD gene (Aggarwal et al. 2013). In the majority of desulfurizing bacterial species (like Rhodococcus) the dszABC operon is located on a plasmid; however, the dsz operon in some Gordonia species is located on the bacterial chromosome (Shavandi et al. 2010; Aggarwal et al. 2013).

At present, biodesulfurization is a recommended approach to remove sulfur from complex compounds like dibenzothiophene (DBT), benzothiophene (BT), or their alkylated forms (Cx-DBTs, Cx-BTs) under mild conditions (Mohebali and Ball 2008; Akhtar et al. 2018). However, due to limited knowledge of the desulfurizing traits of the bacterial species, a large-scale process has not been developed (Akhtar et al. 2019; Parveen et al. 2020). Genomics is one of the fastest evolving disciplines of science, and next-generation sequencing (NGS) has made it possible to obtain whole-genome sequences of various organisms in limited time and at minimal cost (Land et al. 2015). To date (NCBI January 2021), more than 0.3 million bacterial genomes have been sequenced (https://www.ncbi.nlm.nih.gov/genome/browse/), and this genomic information can provide an insight into the bacterial gene pool. Moreover, genomic analysis can expand the scale of microbial screening processes without labor-intensive growth experiments.

To date, no study has comprehensively described the genome-based characterization, orthologous gene cluster identification and comparison of the sulfur metabolism-related genes of a Gordonia species that desulfurizes thiophenic compounds and diesel oil. The current study provides an insight into the metabolic potential of the isolate W3S5 to use various alkylated derivatives of DBT, BT and thiophene as a sulfur source as well as to biodesulfurize hydrodesulfurized diesel oil. Moreover, comparative genomic analysis of the obtained draft genome sequence has enabled us to clearly establish the taxonomic position of the bacterium and to explore its orthologous gene clusters as well as the genomic background associated with sulfur metabolism.

Materials and methods

Bacterial culture and the medium used

The Gordonia sp. W3S5 used in this study was isolated from oil-contaminated soil samples collected from a local oil drilling company (OGDCL-Rajian Oil Field, Chakwal, Pakistan). The chemically defined medium (MG medium) used for growth and other experiments contained 2.0 g KH2PO4, 4.0 g K2HPO4, 1.0 g NH4Cl, 0.2 g MgCl2·6H2O, 5.0 g glucose, 10.0 ml metal solution, and 1.0 ml vitamin mixture in 1000 ml distilled water (pH 7.0) (Akhtar et al. 2018). The enriched culture was evaluated for its desulfurization activity by the Gibbs test as described by Akhtar et al. (2009).

Biodesulfurization of thiophenic compounds and diesel oil

The potential of the isolated bacterium to use different thiophenic sulfur-containing compounds as a sole source of sulfur was studied. The inoculum was prepared by growing the bacterial cells in LB medium for 48 h. Then 250 mL flasks containing 50 mL MG medium were supplemented with 0.2 mM of the different thiophenic compounds and inoculated (1% v/v, OD660nm ~ 0.45) in duplicate. These flasks were incubated at 30 °C in a rotary shaker at 180 rpm. The bacterial growth (OD660nm) was recorded every 24 h for 10 days and the production of phenolic end products was estimated by the Gibbs test (Akhtar et al. 2019). Diesel oil containing 245 ppm sulfur was obtained from a local oil refinery (Attock Refinery Limited, Rawalpindi, Pakistan) and the desulfurization studies were carried out in 1 L capacity flasks (5% v/v oil pulp density) using DBT-adapted bacterial cells prepared in MG medium at the optimized growth conditions (pH 7.0, 30 °C, 180 rpm). The sulfur content was estimated on the EDX-RF sulfur analyzer (Tanaka Scientific Limited, Japan) based on the American Society for Testing and Materials (ASTM) method D5453.

Genome sequencing and assembly

The genomic DNA extraction and genome sequencing were carried out commercially on the Illumina NovaSeq platform (2 × 100 bp paired-end) by Macrogen Inc., Korea. De novo assembly of the raw reads was performed with the SPAdes assembler (version 3.5.0) using the following parameters: automatic k-mer selection based on read length, careful (mismatch) mode turned ON, repeat resolution enabled, mismatch corrector not skipped, and coverage cutoff turned OFF. The whole-genome SRA sequence data and bio-project/bio-sample information of the Gordonia sp. reported in this study are available at DDBJ/ENA/GenBank.
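The assembly settings listed above can be written down as a command line. The sketch below only builds the argument list and does not run the assembler. The read and output names are hypothetical placeholders, and the mapping of each described setting to a SPAdes option is our reading of the text rather than the command actually used; the flags should be checked against the manual of the SPAdes version in use.

```python
import shlex


def build_spades_command(reads_1, reads_2, out_dir):
    """Return a SPAdes argument list mirroring the settings described in the text.

    All file and directory names passed in are hypothetical placeholders.
    """
    return [
        "spades.py",
        "-1", reads_1,          # forward reads of the paired-end library
        "-2", reads_2,          # reverse reads of the paired-end library
        "--careful",            # careful mode ON (mismatch correction is run)
        "--cov-cutoff", "off",  # coverage cutoff turned OFF
        "-o", out_dir,          # output directory
        # No -k option is given, so k-mer sizes are selected automatically.
        # No --disable-rr option is given, so repeat resolution stays enabled.
    ]


cmd = build_spades_command("W3S5_R1.fastq.gz", "W3S5_R2.fastq.gz", "W3S5_spades")
print(shlex.join(cmd))
```

Building the command as a list keeps each option next to the setting it represents and makes it straightforward to pass to `subprocess.run` on a machine where SPAdes is installed.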

Genome and taxonomic analysis

The gene prediction and annotation were done with the Rapid Annotation using Subsystem Technology (ClassicRAST) server version 2.0 (default settings) (Aziz et al. 2008) and the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). Moreover, BlastKOALA and KEGG were used to annotate the genome sequence. The BlastKOALA parameters were set as follows: taxonomy group, Prokaryotes (Bacteria); KEGG database to be searched, genus_prokaryotes.pep. For taxonomic studies, the 16S rRNA gene sequences of closely related Gordonia species were taken from EzBioCloud and the National Center for Biotechnology Information databases (Yoon et al. 2017). The phylogenetic tree was constructed (tree topologies supported by bootstrap analysis) using the software package MEGA 6 (Tamura et al. 2013). The Genome-to-Genome Distance Calculator (GGDC) version 2.1 (http://ggdc.dsmz.de) (Meier-Kolthoff et al. 2013) was used to calculate the digital DNA-DNA Hybridization (dDDH) values, while the Kostas Lab webserver (http://enve-omics.ce.gatech.edu/ani/) (Goris et al. 2007) was used to compute the Average Nucleotide Identity (ANI) values between the closely related species.

Comparative genomic analysis

For comparative genomic analysis, the genome sequences of the closely related strains of Gordonia/Rhodococcus were obtained from the NCBI GenBank database and were uploaded in the RAST server. The accession numbers (in parentheses) of the selected genome sequences were as follows: Gordonia rubripertincta NBRC 101908 T (NZ_BAHB00000000.1), Gordonia rubripertincta CWB2 (NZ_CP022580.1), Gordonia rubripertincta SD5 (NZ_CP059694.1), Gordonia namibiensis NBRC 108229 T (NZ_BAHE00000000.1), Gordonia alkanivorans NBRC 16433 T (NZ_BACI00000000.1), Gordonia terrae NBRC 100016 T (NZ_BAFD00000000.1), Gordonia terrae C-6 (NZ_AQPW00000000.1), Gordonia amicalis NBRC 100051 T (NZ_BANS01000001.1), Gordonia amicalis BDS-1 (NZ_JACFXQ010000094.1), Gordonia desulfuricans NBRC 100010 T (NZ_BCNF00000000.1), Gordonia desulfuricans 213E (NZ_JAADZU010000001.1), Gordonia hydrophobica NBRC 16057 T (NZ_BCWU00000000.1), Rhodococcus sp. Eu-32 (QGNK00000000.1), Rhodococcus qingshengii IGTS8 (CP029297.1) and Rhodococcus erythropolis XP (NZ_AGCF00000000.1).

The orthologous gene cluster analysis of the W3S5 isolate with other type strains of the genus Gordonia was performed by uploading the protein sequence FASTA files to the OrthoVenn2 web platform (https://orthovenn2.bioinfotoolkits.net/home), with an e-value threshold of 1e-5 and an inflation value of 1.5 (Xu et al. 2019; Datta et al. 2020).

Results and discussion

Thiophenic compounds and diesel oil desulfurization potential of the isolate W3S5

The desulfurization potential of the isolate W3S5 for thiophenic compounds, as determined in MG medium supplemented with different thiophenic sulfur-containing compounds, is presented in Table 1. The growth (OD660nm) showed that the isolate W3S5 used not only DBT, BT and thiophene but also their alkylated derivatives as a sulfur source (Supplementary Fig. 1). The growth was higher in the presence of DBT, 4-methyl DBT, 2,8-dimethyl DBT and 3-methyl BT than with the other compounds (Table 1). The high cell growth indicated that these organic sulfur-containing compounds were metabolized/transformed by W3S5 to fulfill its cellular sulfur need, suggesting its ability to decrease the sulfur content of fossil fuels. The culture broths of isolate W3S5 supplemented with DBT, DBT sulfone, 4-methyl DBT, BT and 3-methyl BT were also positive for the Gibbs test (Table 1), indicating that W3S5 desulfurized these compounds and converted them into sulfur-free alkylated/non-alkylated phenolic end products. The negative Gibbs test for 2,8-dimethyl DBT showed that either the isolate W3S5 converts it into a non-phenolic end product or the phenolic end product produced is not free to react with the Gibbs reagent. The quantification of the phenolic end product by the Gibbs test showed that the isolate W3S5 had completely desulfurized the DBT into 2-HBP (1.99 mM) (Table 1). However, the other DBT and BT derivatives, except 2,8-dimethyl DBT, were 50% desulfurized in the stationary phase.

Table 1 Substrate spectra of strain W3S5 for various thiophenic compounds

A considerable number of bacteria that are able to desulfurize symmetric heterocyclic sulfur-containing compounds like DBT appear to be incapable of desulfurizing asymmetric heterocyclic sulfur-containing compounds like BT (Mohamed et al. 2015; Akhtar et al. 2018). The bacterial isolate W3S5 described in this study can be categorized as a bacterium that desulfurizes both symmetric (DBT) and asymmetric (BT) sulfur compounds. Such bacteria are thought to be highly valuable biocatalysts for developing processes to desulfurize the broad range of thiophenic sulfur-containing compounds present as organic sulfur in petroleum oil (Mohamed et al. 2015).

After biodesulfurization by the isolate W3S5, the total sulfur content of the diesel oil was 147 ppm, which corresponds to a total reduction of ~ 40% in 30 days as compared to the control, demonstrating that the isolate is a potent desulfurizing candidate for up-scaled studies on the removal of sulfur from petroleum oils. These findings validate the ability of the isolate W3S5 to desulfurize thiophenic sulfur-containing compounds and diesel oil through enzymes/metabolites produced by putative genes that need to be studied in its genome.
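The ~ 40% figure can be reproduced with a short calculation. The sketch below assumes that the control corresponds to the 245 ppm starting sulfur content given in the methods; the function name is ours and is used only for illustration.

```python
def percent_sulfur_removed(initial_ppm, final_ppm):
    """Percentage of total sulfur removed relative to the starting content."""
    if initial_ppm <= 0:
        raise ValueError("initial_ppm must be positive")
    return (initial_ppm - final_ppm) / initial_ppm * 100.0


# Assumption: the 245 ppm diesel oil from the methods is taken as the control value.
removal = percent_sulfur_removed(245, 147)
print(f"{removal:.1f}% of the sulfur removed")
```

With these inputs, 98 of the 245 ppm are removed, which is the reported reduction of about 40%.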

Genome assembly and annotation

The total number of raw reads produced was 26,583,530, with Q20 and Q30 values of 97.2% and 92.14%, respectively. The high Q scores indicated a low probability of error in the genomic data of W3S5. The de novo genome assembly comprised 49 contigs with a genome coverage of 241X. To support the authenticity of the genome assembly (assembled using SPAdes version 3.5.0), the 16S gene sequence was used, as described by Chun et al. (2018). The genome assembly was classified as undecided for potential contamination by the ContEst16S (Contamination Estimator by 16S) algorithm. In the ClassicRAST analysis, the draft genome sequence of the isolate W3S5 had a total length of 4,857,317 bp, a contig count of 49, an N50 of 430,858 bp, an L50 of 4 and a G + C content of 67.50%. Moreover, a total of 4,435 protein-coding sequences and 51 RNAs (48 tRNA genes and 3 copies of rRNA) were found in the genome (Table 2). Further ClassicRAST-based functional gene subsystem clustering analysis revealed that 404 subsystems are present in the genome of W3S5 (Table 2). The subsystems representing amino acids and derivatives (415 ORFs); cofactors, vitamins, prosthetic groups, pigments (334 ORFs); carbohydrate metabolism (364 ORFs); protein metabolism (251 ORFs); and fatty acids, lipids, and isoprenoids (235 ORFs) were the most abundant. Furthermore, the subsystems connected with membrane transport (70 ORFs), stress response (120 ORFs), sulfur metabolism (60 ORFs) and aromatic compounds metabolism (45 ORFs) were also identified (Fig. 1). Of particular interest to organic sulfur metabolism, subsystems associated with alkanesulfonate assimilation/utilization were indicated in the genome of the isolate W3S5.
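For readers unfamiliar with the assembly statistics quoted above, the following minimal sketch shows how N50, L50 and G + C content are conventionally defined. It is not the ClassicRAST implementation, and the contig lengths and sequence in the example are toy values, not data from the W3S5 assembly.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size.
    L50 is the number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


def gc_percent(sequence):
    """G + C content in percent, counting only unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Toy example (hypothetical values, not the W3S5 contigs).
toy_lengths = [500, 400, 300, 200, 100]
print(n50_l50(toy_lengths))            # (400, 2)
print(round(gc_percent("GGCCAT"), 2))  # 66.67
```

Read this way, the reported L50 of 4 means that the four longest of the 49 contigs already cover at least half of the 4,857,317 bp assembly.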

Table 2 Genome features of strain W3S5
Fig. 1 Subsystem category distribution from strain W3S5, generated through ClassicRAST pipeline (default settings)

Phylogenetic and genome-based classification at the species level

For taxonomic evaluation of the bacterium, a combination of 16S similarity and OGRI (overall genome-related index, i.e. ANI and dDDH) was used systematically. As a first step, to find the closely related strains, the 16S rRNA gene sequence of the isolate W3S5 was submitted to the NCBI and EzBioCloud databases (Chun et al. 2018; Yoon et al. 2017). The BLASTn and EzBioCloud searches showed that the 16S nucleotide sequence of the isolate W3S5 was highly similar to that of different species of the genus Gordonia (rubripertincta, namibiensis, alkanivorans, westfalica, amicalis, desulfuricans, nitida, hongkongensis and terrae). On the basis of the 16S rRNA gene sequence, the maximum similarity (100%) of W3S5 was found with Gordonia rubripertincta NBRC 101908 T [BAHB01000127], followed by Gordonia namibiensis NBRC 108229 T [BAHE01000050], 99.44%, and Gordonia westfalica DSM 44215 T [FNLM01000036], 99.17% (Table 3). Likewise, phylogenetic analysis of the closely related 16S rRNA gene sequences showed that W3S5 formed a separate branch with Gordonia rubripertincta NBRC 101908 T (Fig. 2). Further, to establish a more specific taxonomic position at the species level, the genome of W3S5 was compared with those of its closely related type strains using ANI and dDDH values (Table 3). According to current bacterial taxonomy, the proposed and generally accepted threshold values of dDDH and ANI for genomes of the same species are 70% and 95–96%, respectively (Chun et al. 2018). A comparison of W3S5 and Gordonia rubripertincta NBRC 101908 T revealed a dDDH value above 80% and an ANI value above 98%, supporting that W3S5 and Gordonia rubripertincta NBRC 101908 T belong to the same species (Table 3). Taken together, the 16S rRNA gene, dDDH and ANI results show that the isolate W3S5 described here is closely related to Gordonia rubripertincta.
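The stepwise decision described above (16S similarity first, then ANI and dDDH) can be sketched as a small function. The cutoffs are the ones quoted in the text and in the caption of Table 3; the function name and the numeric inputs in the example are illustrative placeholders chosen to match "above 98%" and "above 80%", not the values reported in Table 3.

```python
def species_call(sim_16s, ani, dddh,
                 cutoff_16s=98.7, cutoff_ani=95.0, cutoff_dddh=70.0):
    """Classify a query/type-strain pair using 16S similarity, ANI and dDDH (all in %).

    cutoff_ani uses the lower bound of the 95-96% range; pass 96.0 for a stricter call.
    """
    if sim_16s < cutoff_16s:
        return "different species (16S similarity below cutoff)"
    if ani >= cutoff_ani and dddh >= cutoff_dddh:
        return "same species"
    return "different species (genome indices below cutoff)"


# Placeholder inputs consistent with the text: 16S 100%, ANI > 98%, dDDH > 80%.
print(species_call(100.0, 98.1, 80.1))  # same species
# A hypothetical pair with high 16S similarity but low genome relatedness.
print(species_call(99.44, 90.0, 40.0))  # different species (genome indices below cutoff)
```

The second call illustrates why the genome-based indices are needed: a 16S similarity above 98.7% only selects candidates for comparison and does not by itself place two strains in the same species.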

Table 3 The 16S rRNA gene similarity, Average Nucleotide Identity (ANI) and digital DNA-DNA Hybridization (dDDH) values by the genome comparison of W3S5 to its more closely related type strains (16S rRNA gene similarity ≥ 98.7%)
Fig. 2 Neighbor-joining phylogenetic tree based on 16S rRNA gene sequences. The bootstrap consensus tree was inferred from 100 replicates. The evolutionary distances were computed using the Tamura-Nei method and are in the units of the number of base substitutions per site. Evolutionary analyses were conducted in MEGA 6

Potential genes and regulatory sequences providing desulfurization trait in isolate W3S5

Sulfur is required by microbial cells for growth and various biochemical processes. Bacterial cell dry mass is composed of 0.5–1% sulfur (as part of amino acids, proteins and enzyme cofactors) (Feng et al. 2016). Microorganisms acquire sulfur from different sources. Some are proficient in extracting sulfur from organosulfur compounds such as DBT and its derivatives, while others acquire it from inorganic sources (Soleimani et al. 2007).

A large number of Gordonia species are being used in various industrial and environmental biotechnology settings. One of the important features of this genus is the capability of its members to transform/biodesulfurize different organic sulfur-containing compounds (Drzyzga 2012). To identify the potential genes involved in sulfur metabolism, the genome of the isolate W3S5 was annotated. The ClassicRAST-based genome annotation revealed that a total of 60 genes are involved in the metabolism of sulfur (Fig. 1), of which 23 genes are associated with organic sulfur assimilation/utilization. Furthermore, the organic sulfur metabolism genes in the W3S5 genome include the ssu and dszABC operons (Table 4), responsible for the desulfonation of organic sulfur-containing compounds. In the ssu operon, mainly the ssuD (encoding an FMNH2-dependent monooxygenase) and ssuE (encoding an NADPH-dependent FMN reductase) genes are involved in the desulfonation of alkanesulfonates via carbon–sulfur bond cleavage (Aggarwal et al. 2013). Genes encoding both the SsuD and SsuE enzymes were found in the genome of isolate W3S5 (Table 4).

Table 4 Comparison of the key sulfur metabolism encoding genes present in W3S5 and other related species genomes

To find the genes associated with the dszABC operon, the genome was analyzed, which indicated that the isolate W3S5 harbored a 1251 bp gene fragment, located on contig#22, with the designated function of Acyl-CoA dehydrogenase, probably dibenzothiophene desulfurization enzyme (Fig. 3). This gene was taken from the genome sequence of W3S5 and was further analyzed with the NCBI blastn tool, which showed 100% similarity with the dszC gene of different Gordonia species. Adjacent to the dszC gene, an 852 bp gene fragment, also identified on contig#22, with the assigned function of dibenzothiophene desulfurization enzyme B, was revealed in the genome of the strain W3S5 (Fig. 3). Furthermore, a 1425 bp gene fragment, also found on contig#22, showed high similarity to nitrilotriacetate monooxygenase (FMNH2) dibenzothiophene desulfurization enzyme A. The dszABC genes were also verified in the genome of isolate W3S5 by PCR amplification (Supplementary Figs. 2, 3 and 4), cloning and sequencing. The blastn analysis of the PCR-amplified dsz gene sequences showed 100% sequence identity to dsz gene sequences of different Gordonia species. Moreover, in the RAST sequence alignment, the sequences of the PCR-amplified dszABC genes aligned with the dszABC gene sequences present on contig#22 in the genome of W3S5, authenticating the genome-based detection of the sulfur metabolism genes. The dsz genes belong to the alkanesulfonate assimilation family, which regulates the dszABC operon by gene activation in the presence of DBT (Shavandi et al. 2010). The sequence similarity analysis of a 200 bp nucleotide sequence upstream of the ATG start codon of the dszA gene (on contig#22) using the NCBI blastn tool showed 100% similarity to the promoter element sequences of Gordonia strains RIPI90A and CYKS1. Moreover, further ClassicRAST-based analysis showed transposase-like sequences in the region upstream of the promoter element (Fig. 3). Similar transposase sequences have been reported near the dsz operon of several other desulfurizing bacterial strains, including Gordonia alkanivorans strain 1B (GenBank accession number AY678116), Gordonia sp. CYKS2 (GenBank accession number AY396519), and Rhodococcus sp. (GenBank accession number U08850). It has been reported that transposases in the proximity of the dsz operon might be responsible for horizontal transfer of the desulfurization operon/genes between taxonomically dissimilar bacterial strains and for its transfer from plasmid to chromosome or vice versa (Shavandi et al. 2010).
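The upstream region used for the promoter comparison can be extracted from a contig once the gene coordinates and strand are known. The sketch below is a generic illustration with toy sequences; the coordinate convention (0-based, end-exclusive, given on the forward strand) and the function names are our assumptions, and the actual coordinates of dszA on contig#22 are not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(contig, start, end, strand, length=200):
    """Return up to `length` bases immediately upstream of a gene's start codon.

    `start` and `end` are 0-based, end-exclusive coordinates on the forward strand.
    For a '-' strand gene the start codon lies at the `end` side, so the region
    is taken from beyond `end` and reverse-complemented.
    """
    if strand == "+":
        return contig[max(0, start - length):start]
    if strand == "-":
        return reverse_complement(contig[end:end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy forward-strand gene: ATGAAATAA at positions 8..17 of the contig.
fwd_contig = "TTTTCCCC" + "ATGAAATAA" + "GGGG"
print(upstream_region(fwd_contig, 8, 17, "+", length=4))  # CCCC

# Toy reverse-strand gene: TTATTTCAT is the reverse complement of ATGAAATAA.
rev_contig = "AAAA" + "TTATTTCAT" + "GGCA"
print(upstream_region(rev_contig, 4, 13, "-", length=4))  # TGCC
```

The returned sequence is always read in the orientation of the gene, so it can be submitted directly to a similarity search such as blastn.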

Fig. 3 Genetic structure of W3S5 dszABC operon (on contig#22) showing associated genes upstream and downstream of the operon. The function assignment of the ORFs was done by ClassicRAST and NCBI blastn analysis

In addition to the DBT desulfurization enzymes, sequences encoding alkanesulfonate transport system permease proteins, alkanesulfonate ABC transporters, ATP-binding proteins and sulfonate monooxygenases, which are probably involved in the uptake of alkanesulfonates into the cell, were also identified in the genome of W3S5 (Table 4). The annotation performed using KEGG in BlastKOALA (Kanehisa et al. 2016) revealed that, other than the sulfur metabolism, the genome of W3S5 has a wide range of enzymes, including trehalose (17), hydratases (33), oxygenases (69), cytochromes (17), reductases (104), proteases (15) and transaminases (6), supporting it as a potential candidate for applications in industrially important bioprocesses like biodesulfurization.

Comparative genomics studies for organic sulfur metabolism

To better understand the similarities and divergences of the organic sulfur metabolism genes present in W3S5, its genome sequence was compared with those of closely related type strains as well as desulfurizing strains of Gordonia species by uploading the genome sequences to RAST. Members of the genus Rhodococcus have the ability to desulfurize the thiophenic sulfur-containing compounds present in fossil fuels. Therefore, some reported biodesulfurizing Rhodococcus species, whose genome sequences were available in the NCBI database, were also included in the comparative analysis (Table 4). The alkanesulfonate utilization/assimilation genes associated with organic sulfur metabolism were detected in all Gordonia and Rhodococcus species. The ssuEADCB gene cluster is required for the utilization of alkanesulfonates as sulfur sources. The RAST comparative genomic analysis showed that all the genes of this cluster are present in W3S5 except the ssuA gene, which encodes the alkanesulfonate-binding protein. The dszABC operon, involved in the biodesulfurization of DBT, a major constituent of the thiophenic compounds present in fossil fuels, was not detected in some of the species. A complete dszABC operon was present only in Gordonia rubripertincta W3S5, Gordonia terrae NBRC 100016 T, Rhodococcus sp. IGTS8 (now Rhodococcus qingshengii IGTS8) and Rhodococcus erythropolis XP.
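The presence/absence comparison described above amounts to checking each annotated genome against a list of required genes. The sketch below illustrates this for W3S5 using only the ssu and dsz genes named in the text; representing an annotation as a plain set of gene names is a simplification, not the format of the RAST output.

```python
SSU_CLUSTER = ("ssuE", "ssuA", "ssuD", "ssuC", "ssuB")
DSZ_OPERON = ("dszA", "dszB", "dszC")


def missing_genes(required, annotated):
    """Return the required genes that are absent from an annotated gene set."""
    return [gene for gene in required if gene not in annotated]


# Gene set for W3S5 as stated in the text: every ssu gene except ssuA,
# plus a complete dszABC operon.
w3s5_annotated = {"ssuE", "ssuD", "ssuC", "ssuB", "dszA", "dszB", "dszC"}

print(missing_genes(SSU_CLUSTER, w3s5_annotated))  # ['ssuA']
print(missing_genes(DSZ_OPERON, w3s5_annotated))   # []
```

An empty list marks a complete cluster or operon, so the same check can be repeated for every genome included in Table 4.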

The ABC-type nitrate/sulfonate/bicarbonate transport systems, periplasmic components (PC), necessary for the initial step of sulfur oxidation pathway (Urich et al. 2006) were present in all the annotated genomes except Gordonia rubripertincta NBRC 101908 T, Gordonia hydrophobica NBRC 16057 T, Rhodococcus qingshengii IGTS8 and Rhodococcus erythropolis XP (Table 4). The sulfonate monooxygenase (SO), Alpha-ketoglutarate-dependent taurine dioxygenase (TD), ABC-type nitrate/sulfonate/bicarbonate transport system, permease and ATPase components (Prc & Ac), associated with the alkanesulfonate utilization/assimilation subsystem, were present in the genome of W3S5 (Table 4). However, genes for the enzymes SO, Prc, Ac and TD were either completely absent (Gordonia hydrophobica NBRC 16057 T) or partially absent (Gordonia alkanivorans NBRC 16433 T, Gordonia amicalis NBRC 100051 T, Gordonia amicalis BDS-1) in some of the bacterial species genome sequences (Table 4).

The genome-wide analysis of orthologous gene clusters can help explain the evolution and correlation of proteins across multiple species (Singh et al. 2020). The comparative genome study of six species of Gordonia, including W3S5, using the OrthoVenn2 tool, which generates a Venn diagram from user-defined cluster files (Xu et al. 2019; Datta et al. 2020), revealed that the six species form a total of 4869 clusters, of which 2257 are orthologous clusters (containing at least two species) and 2612 are single-copy gene clusters. The Venn diagram and the bar plot (Fig. 4a, c) showed that the number of core ortholog clusters shared by all six species was 2685, which suggests their conservation in the lineage after speciation events. The cumulative number of ortholog clusters shared between any two genomes, including W3S5, was 632. A total of 112 gene clusters were unique to a single genome. These clusters are probably gene clusters with multiple genes or in-paralog clusters, which suggests that a lineage-specific gene expansion has occurred in these gene families. The Venn diagram showed that W3S5 shared the highest number of clusters (88) with Gordonia rubripertincta NBRC 101908 T, which also supported the ANI- and dDDH-based taxonomic evaluation of W3S5 shown in Table 3. Additionally, the bar plot below the Venn diagram showed that the numbers of ortholog clusters found in W3S5, Gordonia terrae NBRC 100016 T, Gordonia rubripertincta NBRC 101908 T, Gordonia desulfuricans NBRC 100010 T, Gordonia amicalis NBRC 100051 T and Gordonia alkanivorans NBRC 16433 T were 4061, 4065, 4201, 3503, 3842 and 4056, respectively (Fig. 4b). The complete ortholog gene cluster analysis of the protein sequences of all six Gordonia strains revealed the correlation of orthologous genes among these strains.
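The core, shared and unique categories used above follow from counting how many species contribute genes to each cluster. The sketch below shows this bookkeeping on a toy cluster table; it is not the OrthoVenn2 implementation, the cluster identifiers and memberships are hypothetical, and only the species codes are taken from the legend of Fig. 4.

```python
from collections import Counter

SPECIES = ("GAK33", "GAM51", "GDE10", "GRU08", "GTR16", "GWS35")


def cluster_distribution(clusters):
    """Map 'number of species in a cluster' to 'number of such clusters'."""
    return Counter(len(members) for members in clusters.values())


def summarize(clusters, n_species):
    """Count core (all species), shared (2 to n-1 species) and unique (1 species) clusters."""
    dist = cluster_distribution(clusters)
    core = dist[n_species]
    unique = dist[1]
    shared = sum(count for size, count in dist.items() if 1 < size < n_species)
    return {"core": core, "shared": shared, "unique": unique}


# Toy cluster table (hypothetical memberships).
toy_clusters = {
    "cluster1": set(SPECIES),                 # present in all six species
    "cluster2": set(SPECIES),                 # present in all six species
    "cluster3": {"GWS35", "GRU08"},           # shared by two species
    "cluster4": {"GWS35"},                    # found in a single species
    "cluster5": {"GTR16", "GDE10", "GAM51"},  # shared by three species
}

print(summarize(toy_clusters, len(SPECIES)))  # {'core': 2, 'shared': 2, 'unique': 1}
```

The distribution returned by `cluster_distribution` corresponds to the horizontal scale of the bar plot in Fig. 4c, where each label is the number of species sharing a cluster.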

Fig. 4 Venn diagram and bar plots generated by OrthoVenn2 representing the distribution of shared and unique gene clusters among different biodesulfurizing Gordonia species. The species names used in the analysis are: GAK33: Gordonia alkanivorans NBRC 16433 T, GAM51: Gordonia amicalis NBRC 100051 T, GDE10: Gordonia desulfuricans NBRC 100010 T, GRU08: Gordonia rubripertincta NBRC 101908 T, GTR16: Gordonia terrae NBRC 100016 T, GWS35: W3S5. a The Venn diagram represents the distribution of core ortholog clusters, shared clusters and unique clusters in all six species. b The bar plot represents the cumulative ortholog clusters found in each species. c The bar plot illustrates the cumulative core, shared and unique clusters in all six species, where label 1 on the horizontal scale shows the cumulative number of unique clusters (112) for all six species, label 2 shows the total number of clusters shared by two species (632), and so on

Conclusions

The genome-based taxonomic studies indicate that the isolate W3S5 belongs to Gordonia rubripertincta. It is anticipated that the identified sulfur metabolizing genes and regulatory sequences in the Gordonia rubripertincta W3S5 provide ecologically critical traits for cellular survival in sulfur-rich environments. The comparative genomic analysis showed that dsz operon encoding thiophenic sulfur desulfurization enzymes was present in Gordonia and Rhodococcus species, suggesting that it might be transferable between species of the different genera. Moreover, the orthologous gene clusters comparison among the W3S5 and Gordonia species revealed a conservation of developmental-related core genes and the presence of many unique genes, suggesting a lineage-specific gene expansion in these gene families. In addition, the results indicated that the Gordonia rubripertincta W3S5 could desulfurize different types of alkylated/non-alkylated organosulfur compounds, thus it might be a useful biocatalyst for its application in the various biodesulfurization processes.

Accession Numbers

The whole-genome SRA sequence data of W3S5 is available in the NCBI database under the SRA accession PRJNA555169. Moreover, the whole-genome shotgun project of Gordonia sp. W3S5 reported in this study is available at DDBJ/ENA/GenBank under the accession NZ_VLNS00000000 (BioProject: PRJNA555169; BioSample: SAMN12302752), version VLNS00000000.1. The W3S5 partial 16S rRNA gene sequence is available at GenBank under accession number MH569672.