1 Introduction

Global industrialization has brought the world into the modern age but has resulted in the release of many pollutants into the environment (Brimblecombe, 2005; Carpenter et al., 1998; Pimentel, 2005; Power et al., 2018; Thevenon et al., 2011; Tilman, 1998). Soil is particularly at risk from contamination by direct application of agrochemicals (e.g., pesticides) (Pimentel, 2005; Tang et al., 2021), accidental release via chemical spills (Shin et al., 2018; FAO and UNEP, 2021), or atmospheric deposition of pollutants (Cai et al., 2022; Swain et al., 1992). Generally, these contaminants, which can include herbicides, fungicides, insecticides, polycyclic aromatic hydrocarbons, and heavy metals, can have long-term adverse effects on agricultural soil, resulting in decreased soil productivity (Alengebawy et al., 2021; Kaur et al., 2017; Srivastava et al., 2017) as well as potential contamination of the food product itself (Ahmad et al., 2021; Alengebawy et al., 2021; Haddad et al., 2023; Zhang et al., 2017). Pesticides (e.g., chlorpyrifos, malathion, glyphosate) are commonly used in agricultural practices, and their benefit to the industry has been massive; however, these compounds can have adverse effects on the environment and the soil they are applied to (Fox et al., 2007; Gunstone et al., 2021; Kaur et al., 2017; Lu et al., 2020; Pimentel, 2005; Sharma et al., 2019; Tang et al., 2021; Woodcock et al., 2016). Further, pesticide poisoning in humans is a significant concern, with an estimated 385 million unintentional poisonings and approximately 11,000 deaths globally each year (Boedeker et al., 2020).

Polycyclic aromatic hydrocarbons (PAHs) such as benzene, anthracene, and benzo[α]pyrene are pollutants of concern for public health. PAHs can be carcinogenic (Rengarajan et al., 2015) and can be absorbed into plant material, contaminating the food chain (Fan et al., 2020; Zhang et al., 2017). Soil can also be contaminated by long-range transport of PAHs through the atmosphere, which are then deposited into topsoil (Arellano et al., 2018; Gocht et al., 2007; Nam et al., 2008). Due to the high hydrophobicity of PAH compounds, they can bind to soil particles and have half-lives between days and months (Duan et al., 2015; Luo et al., 2008; Roslund et al., 2018; Yang et al., 2010).

Heavy metal contamination of soil is also a concern and can result from natural geochemical processes, direct application in the agricultural industry, or pollution (Alengebawy et al., 2021). Heavy metals such as arsenic, cadmium, chromium, lead, and mercury are toxic to humans, plants, and wildlife and can be long-lived in soil. These contaminants also threaten the food chain since plants can absorb these elements into food products, which can accumulate in humans (Haddad et al., 2023; Xiang et al., 2021; Zhang et al., 2017). Given these threats to the soil and agricultural production, methods to remove or neutralize pesticides, PAHs, and heavy metals are necessary. Bioremediation of contaminated soils by microbes is a cost-effective and efficient remediation method actively pursued by the field at large (Bala et al., 2022; Zhang et al., 2020).

Soil microbes, particularly rhizospheric microbes that interact closely with plant roots, are crucial to soil and plant health and productivity (Bulgarelli et al., 2013; Li et al., 2021; Mendes et al., 2011; Panke-Buisse et al., 2015). Microbes have evolved mechanisms to utilize plant aromatic compounds for tricarboxylic acid cycle intermediates. Through this evolution, these microbes can also degrade human-related aromatic compounds (e.g., pesticides and aromatic hydrocarbons) (Fuenmayor et al., 1998; Harwood & Parales, 1996; Jencova et al., 2004; Lessner et al., 2002; Mohapatra & Phale, 2021; Swetha & Phale, 2005). These same microbes also have mechanisms for resistance to heavy metals (Mathivanan et al., 2021; Pal et al., 2022). Given these characteristics, soil microbes are being pursued for deployment in bioremediation scenarios to degrade contaminants and restore healthy environments (Bala et al., 2022; Narayanan et al., 2023). There is also great potential for discovery and innovation in biotechnology to exploit the mechanisms by which these microbes degrade xenobiotic compounds and resist heavy metal exposure. By isolating and refining these systems, there could be significant benefits to agricultural production, human health, and environmental disaster cleanup.

The Iyer Lab hosts an extensive library of environmentally derived bacterial isolates. Previous work investigating a particular cohort of bacterial isolates has determined that they can degrade organophosphate compounds individually (Iyer & Iken, 2013; Iyer et al., 2016, 2018) or as consortia (Islam & Iyer, 2021). This work furthers the previous studies by investigating the genomes of these isolates (Achromobacter xylosoxidans ADAF13 (Iyer & Damania, 2016c, d), Exiguobacterium sp. KKBO11 (Iyer & Damania, 2016a, b), Ochrobactrum anthropi FRAF13 (Iyer & Damania, 2016a, b), Pseudomonas putida CBF10-2 (Iyer & Damania, 2016c, d), Pseudomonas stutzeri ODKF13 (Iyer & Damania, 2016e), Rhizobium radiobacter GHKF11 (Iyer & Damania, 2016f), and Stenotrophomonas maltophilia CBF10-1 (Iyer & Damania, 2016g)) to assess their potential degradation abilities for aromatics and xenobiotics as well as their resistance to heavy metals (arsenic, cadmium, chromium, mercury, and lead), and thereby to determine the potential of these isolates in broad-spectrum bioremediation.

2 Results and Discussion

2.1 Aromatic Compound and Xenobiotic Degradation

Plants often produce aromatic compounds that are exuded into the soil via roots and serve as a substantial carbon source for soil microbes capable of degrading them (Chaparro et al., 2013; Zhalnina et al., 2018). Consequently, soil microbes are often investigated for potential use in the bioremediation of PAHs, pesticides, and even chemical weapons (Alves et al., 2018; Islam & Iyer, 2021; Iyer et al., 2018; Narayanan et al., 2023). To investigate the potential aromatic and xenobiotic compound degradation capacity of the isolates A. xylosoxidans ADAF13, Exiguobacterium sp. KKBO11, O. anthropi FRAF13, P. putida CBF10-2, P. stutzeri ODKF13, R. radiobacter GHKF11, and S. maltophilia CBF10-1, their genomes were annotated by the Bacterial and Viral Bioinformatics Resource Center (BV-BRC)’s annotation service using RASTtk (Brettin et al., 2015; Olson et al., 2023). KEGG annotation and pathway mapping of the proteins predicted by RASTtk were used for aromatic and xenobiotic degradation pathway analysis (Kanehisa & Sato, 2020).

We used KEGG map01220 as a basis for a general overview of aromatic degradation pathways. An abridged map01220 showing only the pathways present among these isolates is shown in Fig. 1, where each isolate’s KEGG reconstruct results are indicated. The number of map01220 KEGG orthologs found in each of the isolates is enumerated in Online Resource 1. On average, there were 19 genes classified by KEGG in map01220, though these ranged widely from 45 genes in Pseudomonas putida CBF10-2 to only 5 genes in Exiguobacterium sp. KKBO11. Pseudomonas putida CBF10-2’s 45 genes were found in 20 different pathways/reactions, whereas Exiguobacterium sp. KKBO11’s five genes were found in 7 pathways/reactions (Fig. 1 and Online Resource 1). Stenotrophomonas maltophilia CBF10-1 also had few genes in KEGG map01220, with only six genes identified in 7 pathways/reactions.
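The per-isolate enumeration described above can be illustrated with a minimal sketch. It assumes each isolate's annotation has already been exported as a list of KEGG ortholog identifiers; the function name, the isolate labels, and the placeholder identifiers (KO_A, KO_B, and so on) are hypothetical and are not taken from Online Resource 1. Counting distinct orthologs rather than individual gene copies is also an assumption made for brevity.

```python
from statistics import mean

def count_pathway_orthologs(annotations, pathway_kos):
    """Return {isolate: number of distinct orthologs that belong to the pathway}."""
    return {
        isolate: len(set(kos) & pathway_kos)
        for isolate, kos in annotations.items()
    }

# Hypothetical placeholder identifiers standing in for the orthologs of one KEGG map.
pathway_kos = {"KO_A", "KO_B", "KO_C", "KO_D"}

# Hypothetical annotation output: isolate label -> ortholog identifiers found in its genome.
annotations = {
    "isolate_1": ["KO_A", "KO_B", "KO_C", "KO_X"],
    "isolate_2": ["KO_D", "KO_Y"],
}

counts = count_pathway_orthologs(annotations, pathway_kos)
average = mean(counts.values())
print(counts, average)
```

The same tally, applied to the real annotation tables, would yield the per-isolate totals and the average reported in the text.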

Fig. 1

Abridged overview of aromatic degradation pathways (KEGG map01220). This is an abridged version of KEGG map01220 showing only the pathways/reactions present in this dataset. Dots at each arrow indicate the presence of genes for that particular isolate for that reaction which mediate the conversion of the upstream compound to the downstream product or vice versa in the case of reactions that can go either forward or backward. Each colored dot corresponds to a specific isolate

Overall, these isolates putatively lack the ability to degrade polycyclic aromatic compounds (e.g., naphthalene) and lack the dioxygenase and dehydrogenase reactions needed to degrade compounds such as benzene, styrene, toluene, phthalate, and chloro- or fluorobenzene (Fig. 1). However, the benzoate degradation pathway was present at varying levels across the isolates, from complete in P. putida CBF10-2 to nearly absent in Exiguobacterium sp. KKBO11 and S. maltophilia CBF10-1. Pseudomonas putida CBF10-2 and P. stutzeri ODKF13 putatively were able to degrade 3/4-fluorobenzoate to 4-fluorocatechol. Pseudomonas putida CBF10-2 was the only isolate likely capable of degrading 4-methyl benzyl alcohol (a degradation product of p-xylene) to 2-hydroxy-cis-hex-2,4-dienoate, and it also had a nearly complete 4-hydroxyphenylacetate catabolism pathway.

Initially, KEGG reconstruction of map01220 indicated A. xylosoxidans ADAF13 as the only isolate capable of degrading nitrobenzene, though further analysis revealed this to be incorrect. For the degradation of nitrobenzene to catechol to occur, four genes are required: naphthalene 1,2-dioxygenase subunit alpha (nbzAc), naphthalene 1,2-dioxygenase subunit beta (nbzAd), a naphthalene 1,2-dioxygenase ferredoxin reductase component (nbzAa), and a naphthalene 1,2-dioxygenase ferredoxin component (nbzAb) (Lessner et al., 2002). We found that only the nbzAb gene was present in A. xylosoxidans ADAF13. The genes upstream and downstream of nbzAb were unrelated to the nbz operon, suggesting that A. xylosoxidans ADAF13 cannot convert nitrobenzene to catechol.
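The reasoning in this paragraph amounts to a completeness check: a reaction is only called present when every required gene is found. A minimal sketch of that check follows, using the four nbz genes named above and the single gene reported in ADAF13. The function and variable names are hypothetical and do not correspond to any KEGG or BV-BRC tool.

```python
# Genes required for nitrobenzene-to-catechol conversion, as listed in the text.
REQUIRED_NBZ_GENES = {"nbzAa", "nbzAb", "nbzAc", "nbzAd"}

def missing_genes(required, found):
    """Return the required genes that were not found, in sorted order."""
    return sorted(set(required) - set(found))

# Only nbzAb was reported in A. xylosoxidans ADAF13.
found_in_adaf13 = {"nbzAb"}

missing = missing_genes(REQUIRED_NBZ_GENES, found_in_adaf13)
pathway_complete = not missing
print(missing, pathway_complete)
```

A per-gene presence call from a pathway map can therefore overstate capability, which is why the genomic neighborhood of nbzAb was examined next.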

Analysis of the genes surrounding A. xylosoxidans ADAF13’s nbzAb gene (fig|222.333.peg.2710) suggested that this gene was part of an oxygenolytic ortho-dehalogenation operon (ohbRABD). The two genes upstream of the nbzAb gene (fig|222.333.peg.2710) were annotated by KEGG as a tRNA threonylcarbamoyladenosine dehydratase (fig|222.333.peg.2709) and as unclassified (fig|222.333.peg.2708). The three genes downstream of nbzAb were annotated by KEGG as a salicylate 5-hydroxylase small subunit (nagH) gene, a salicylate 5-hydroxylase large subunit (nagG) gene, and a LysR family transcriptional regulator, mexEF-oprN operon transcriptional activator (mexT) gene (fig|222.333.peg.2711, fig|222.333.peg.2712, fig|222.333.peg.2713, respectively). These four genes likely constitute an ohb operon, with ohbR being the regulatory gene (fig|222.333.peg.2713), ohbA encoding the small beta-ISP (terminal oxidoreductase) subunit (fig|222.333.peg.2712), ohbB encoding the large alpha-ISP subunit (fig|222.333.peg.2711), and ohbD encoding a ferredoxin that enhances the activity of OhbAB (fig|222.333.peg.2710) (Tsoi et al., 1999). The ohbC gene in other ohb operons overlaps with ohbB and encodes an ATP-binding cassette (ABC) transporter family protein of unknown function (Tsoi et al., 1999); however, we did not find an ohbC gene in this operon. The putative ohbRABD operon was found on an ~480 kb contig (fig|222.333.con.0054), which did not appear to be derived from a plasmid, suggesting this operon was localized on the chromosome. Neither the ohbRABD operon nor its constituent genes were found in the other isolates investigated in this work.

To better understand the relatedness of A. xylosoxidans ADAF13’s OhbA and OhbB sequences to similar proteins in different bacterial species, including plasmid-borne OhbA and OhbB from a chlorobenzoate-degrading A. xylosoxidans isolate (A8) (Jencova et al., 2004), we performed phylogenetic (Fig. 2) and BLASTp analyses (Johnson et al., 2008). Phylogenetic analysis revealed that ADAF13’s OhbA and OhbB were more closely related to the functionally validated OhbA (Fig. 2A) and OhbB (Fig. 2B) from P. aeruginosa JB2 (Hickey & Sabat, 2001) and Ralstonia sp. U2 (Fuenmayor et al., 1998) to the exclusion of the plasmid-borne OhbA and OhbB reported from the A. xylosoxidans A8 isolate. BLASTp analysis of ADAF13’s OhbA and OhbB showed that these proteins were found in other Achromobacter species. OhbA had strong BLASTp hits in other A. xylosoxidans isolates, Achromobacter mucicolens, and Achromobacter spanius (query coverage = 100%, percent sequence identity ≥ 98%, e-value ≤ 1e-111). OhbB had BLASTp hits in other A. xylosoxidans isolates and in A. mucicolens, Achromobacter animicus, Achromobacter insolitus, and Achromobacter arsenitoxydans (query coverage = 100%, percent sequence identity ≥ 98%, e-value = 0.0). BLASTp analysis of OhbA and OhbB from the A8 isolate did not indicate that any other Achromobacter spp. contained these plasmid-borne proteins. The best BLASTp hits for A8’s OhbA were sequences from Burkholderiales and P. aeruginosa, and the best hit for OhbB was from Pseudomonadota. These data suggest that acquisition of plasmid-borne ohbAB was likely a rare event for Achromobacter spp. Collectively, our data suggested that A. xylosoxidans ADAF13 cannot convert nitrobenzene to catechol, but it can mediate the dehalogenation of halobenzoates.

Fig. 2

Phylogenetic analysis of A. xylosoxidans ADAF13 OhbA and OhbB. Maximum likelihood trees were inferred from amino acid alignments of proteins related to OhbA (A) and OhbB (B). The tip labels indicating this study’s A. xylosoxidans ADAF13 sequences are bolded. The numbers above the branches indicate the percent branch support of 1000 ultra-fast bootstrap replicates. The trees are midpoint rooted. Scale bars indicate the number of substitutions per site

Further analysis of the KEGG annotations revealed genes in 20 different xenobiotic degradation pathways among the isolates. The number of genes classified into the 20 different pathways is enumerated in Online Resource 1. Achromobacter xylosoxidans ADAF13 contained the most pathways, with gene(s) in all 20 identified pathways, while Exiguobacterium sp. KKBO11 had the fewest, with genes in only 11 pathways. The number of genes in each pathway within each isolate ranged widely from zero to 62. The benzoate degradation pathway (KEGG ko00362) had the greatest number of genes classified across all isolates. Pseudomonas putida CBF10-2 had the highest number at 65 genes in the benzoate degradation pathway, followed closely by A. xylosoxidans ADAF13 (59 genes). Stenotrophomonas maltophilia CBF10-1 had the fewest genes in this pathway, with 12 genes. The benzoate degradation pathway was particularly interesting to us since it is associated with the beta-ketoadipate pathway (Harwood & Parales, 1996).

Several microbial degradation pathways for industrially produced aromatic compounds (e.g., naphthalene, aniline, toluene, p-cresol, and benzene) eventually funnel through either catechol or protocatechuate as degradation intermediates (Cao et al., 2009; Phale et al., 2020; Pimviriyakul et al., 2020). Therefore, the catechol and protocatechuate pathways are essential for the mineralization of aromatic compounds, allowing for the complete degradation and use as a carbon source by the microbe. All seven isolates being investigated had at least one gene in the benzoate degradation pathway (KEGG ko00362) (Fig. 3), which catabolizes aromatic compounds for TCA-cycle intermediates. However, the data suggested that only P. putida CBF10-2 was capable of catabolism from benzoate or protocatechuate to 3-oxoadipate. Achromobacter xylosoxidans ADAF13 contained catABC, pcaDIJ, and fadA genes whose proteins should be able to catabolize catechol to succinyl-CoA and acetyl-CoA via the ortho-cleavage pathway. However, it was missing the 3-oxoadipyl-CoA thiolase gene, pcaF. Exiguobacterium sp. KKBO11 had only catE, praC, and pcaC, suggesting it could cleave catechol via meta-cleavage but not process the 2-hydroxymuconate semialdehyde product further in this pathway. Ochrobactrum anthropi FRAF13 encoded pcaCDGH, fadA, and catE, a nearly complete protocatechuate catabolism pathway to succinyl- and acetyl-CoA products. However, KEGG did not annotate a 3-carboxy-cis,cis-muconate cycloisomerase, PcaB, which converts β-carboxymuconate to γ-carboxymuconolactone. BLASTp analysis of FRAF13’s predicted proteins using an O. anthropi PcaB (AIK42087.1) as the query identified fig|529.295.peg.3137 as a PcaB (query coverage = 100%, percent sequence identity = 93.5%, e-value = 0.0). This protein was not annotated by KEGG but was classified as a 3-carboxy-cis,cis-muconate cycloisomerase by BV-BRC’s RASTtk annotation.
Pseudomonas putida CBF10-2 had the most extensive benzoate degradation gene repertoire of all the isolates. CBF10-2 had a complete set of benABCD genes to convert benzoate to catechol and a complete set of ortho-cleavage of catechol genes (catABC, pcaDIJF, and fadA). Moreover, CBF10-2 also had pcaBCGH genes to allow for protocatechuate catabolism and a nearly complete meta-cleavage of catechol gene complement (dmpCH, praC, mhpDEF). Pseudomonas putida CBF10-2 was missing the catechol 2,3-dioxygenase gene (dmpB or catE). Neither KEGG nor RASTtk annotated a catechol 2,3-dioxygenase gene. BLASTp analysis of CBF10-2’s predicted proteins using a Pseudomonas spp. catechol 2,3-dioxygenase protein sequence (WP_011475388.1) did not return any hits. Pseudomonas stutzeri ODKF13 had similar capabilities to P. putida CBF10-2 except for the partial meta-cleavage pathway and was missing PcaI and PcaJ. ODKF13 did not contain any meta-cleavage pathway genes. Given the nearly complete benzoate catabolism pathway, we thought it was unlikely that P. stutzeri ODKF13 was missing PcaI and PcaJ. Therefore, we performed a BLASTp analysis of ODKF13 predicted proteins against P. stutzeri PcaI (VEI36278.1) and PcaJ (AEJ04444.1). BLASTp analysis yielded significant hits with fig|316.732.peg.3339 (query coverage = 100%, percent sequence identity = 93.68%, e-value = 0.0) for PcaI and fig|316.732.peg.3340 (query coverage = 100%, percent sequence identity = 100%, e-value = 0.0) for PcaJ. KEGG annotated these proteins as K01039 (glutaconate CoA-transferase, subunit A) and K01040 (glutaconate CoA-transferase, subunit B), respectively; however, RASTtk annotated both proteins as 3-oxoadipate CoA-transferase subunits A and B. Rhizobium radiobacter GHKF11 and O. anthropi FRAF13 were similarly equipped, containing a near-complete protocatechuate catabolism pathway except for a PcaB in GHKF11.
We used BLASTp with an Agrobacterium tumefaciens (synonymous with Rhizobium radiobacter) (Flores-Félix et al., 2020; Young et al., 2001) PcaB sequence (WP_262526641) as a query against the predicted proteins of GHKF11, which yielded a significant hit (fig|379.725.peg.2964, query coverage = 99%, percent sequence identity = 78.29%, e-value = 0.0). Again, we found that KEGG failed to annotate PcaB, but RASTtk identified fig|379.725.peg.2964 as a 3-carboxy-cis,cis-muconate cycloisomerase. Lastly, Stenotrophomonas maltophilia CBF10-1 had only three genes in this pathway (praC, pcaC, and fadA), and none confer any ring cleavage capabilities. Overall, catechol cleavage was likely possible in all isolates except S. maltophilia CBF10-1, with protocatechuate cleavage likely in O. anthropi FRAF13, P. putida CBF10-2, P. stutzeri ODKF13, and R. radiobacter GHKF11. Further catabolism of cleavage products was possible in all cases, with the exception of Exiguobacterium sp. KKBO11. The fact that the isolates investigated (excluding Exiguobacterium sp. KKBO11 and S. maltophilia CBF10-1) can likely degrade catechol and protocatechuate could be exploited for genome engineering of these microbes to allow for complete degradation of industrially significant aromatic compounds.
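The BLASTp follow-ups described above judge each hit by query coverage, percent identity, and e-value. The sketch below shows one way such a screen could be expressed, using the four hits reported in the text. The cutoff values are assumptions chosen purely for illustration, since the text does not state explicit thresholds, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlastHit:
    subject_id: str
    query_coverage: float    # percent of the query covered by the alignment
    percent_identity: float  # percent identical residues in the alignment
    evalue: float

# Assumed illustrative cutoffs; these are not thresholds stated in the text.
MIN_COVERAGE = 90.0
MIN_IDENTITY = 70.0
MAX_EVALUE = 1e-10

def is_strong_hit(hit, min_cov=MIN_COVERAGE, min_ident=MIN_IDENTITY, max_evalue=MAX_EVALUE):
    """Return True when the hit meets all three cutoffs."""
    return (
        hit.query_coverage >= min_cov
        and hit.percent_identity >= min_ident
        and hit.evalue <= max_evalue
    )

# Hits as reported in the text (PcaB in FRAF13, PcaI and PcaJ in ODKF13, PcaB in GHKF11).
reported_hits = [
    BlastHit("fig|529.295.peg.3137", 100.0, 93.5, 0.0),
    BlastHit("fig|316.732.peg.3339", 100.0, 93.68, 0.0),
    BlastHit("fig|316.732.peg.3340", 100.0, 100.0, 0.0),
    BlastHit("fig|379.725.peg.2964", 99.0, 78.29, 0.0),
]

strong_hits = [hit.subject_id for hit in reported_hits if is_strong_hit(hit)]
print(strong_hits)
```

Under these assumed cutoffs all four reported hits pass, whereas a short or low-identity alignment would be rejected even with a small e-value.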

Fig. 3

Abridged benzoate degradation pathway (KEGG map00362). This figure is an abridged KEGG map00362 that focuses on benzoate degradation via ortho- and meta-cleavage of catechol and protocatechuate degradation. Dots present for each enzyme indicate the presence of that enzyme for a particular isolate. Each colored dot corresponds to a specific isolate. Dotted arrows indicate there are other intermediate steps not listed. KEGG compound numbers are indicated for each chemical structure

Given the use of halogenated aromatic compounds in industrial and agricultural applications (Jeschke, 2017, 2022), their toxicity in humans (Heid et al., 2001; Vickers et al., 1985), and their longevity in the environment (Salkinoja-Salonen et al., 1995), we investigated the degradation of these compounds further (Fig. 4). Proteins within the benzoate degradation pathway also confer the ability to catabolize halobenzoates (Haddad et al., 2001; Neidle et al., 1991). With benABCD, catAB, and carboxymethylenebutenolidase genes, P. putida CBF10-2 and P. stutzeri ODKF13 likely could catabolize 3- and 4-fluorobenzoate compounds to 2-maleylacetate and hydrofluoric acid (Fig. 4A). Achromobacter xylosoxidans ADAF13 did not possess BenABCD but did have the OhbAB proteins, which could act on 2-fluorobenzoate, after which the CatAB and carboxymethylenebutenolidase proteins could further catabolize the compound to 2-maleylacetate and hydrofluoric acid. None of these isolates had a maleylacetate reductase protein that would convert 2-maleylacetate to 3-oxoadipate for further catabolism in the benzoate degradation pathway. Furthermore, the CatAB proteins working with a carboxymethylenebutenolidase in A. xylosoxidans ADAF13, P. putida CBF10-2, and P. stutzeri ODKF13 likely confer the ability to catabolize 4-chlorocatechol to cis-acetylacrylate (Fig. 4B). Halogenated aromatics are long-lived in the environment, toxic, and threaten to contaminate soil and water sources (Goswami et al., 2022; Heid et al., 2001; Vickers et al., 1985; Xie et al., 2021; Zhang et al., 2017); therefore, the ability of these three isolates, A. xylosoxidans ADAF13, P. putida CBF10-2, and P. stutzeri ODKF13, to degrade halogenated aromatics should be further investigated.

Fig. 4

Halobenzoate degradation pathways. The fluorobenzoate pathways (A) are adapted from KEGG map00364. The 4-chlorocatechol degradation pathway (B) is adapted from KEGG map00361. Dots present for each enzyme indicate the presence of that enzyme for a particular isolate. Each colored dot corresponds to a specific isolate. Dotted arrows indicate there are other intermediate steps not listed. KEGG compound numbers are indicated for each chemical structure

Lastly, we expanded our search for aromatic compound degradation ability by assessing the putative ability of these isolates to catabolize gentisate, homogentisate, and 3,4-dihydroxyphenylacetate (homoprotocatechuate), whose pathways are found in tyrosine metabolism (KEGG map00350). Counting the genes in the sub-pathways of KEGG map00350 (only genes involved in the ring cleavage and catabolism of gentisate, homogentisate, and homoprotocatechuate) indicated that, on average, each isolate had approximately nine genes annotated by KEGG (Online Resource 2). Achromobacter xylosoxidans ADAF13 had the most genes at 20, and Exiguobacterium sp. KKBO11 had the fewest, with two genes (Fig. 5). Gentisate is an intermediate degradation product of industrial aromatic compounds such as naphthalene and the pesticide carbaryl (Mohapatra & Phale, 2021; Swetha & Phale, 2005; Zhu et al., 2019). Achromobacter xylosoxidans ADAF13 was the only isolate with a gentisate 1,2-dioxygenase, which mediates ring cleavage, and had two different copies of this gene (fig|222.333.peg.2644 and fig|222.333.peg.4034). BLASTp analysis of the proteins indicated they are quite different in amino acid sequence (query coverage = 95%, e-value = 9e-91, percent identity = 41.91%). Another bacterium, Rhizorhabdus dicambivorans Ndbn-20, was recently reported with two different gentisate 1,2-dioxygenases involved in degrading the herbicide dicamba (Li et al., 2020). One of these dioxygenases could cleave 3-chlorogentisate, while the other could not. Further investigation is warranted into the function of the two gentisate 1,2-dioxygenases in A. xylosoxidans ADAF13 and their role in the potential degradation of halogenated gentisates. Achromobacter xylosoxidans ADAF13 was also the only isolate to have nagLK, whose proteins further catabolize gentisate to pyruvate and fumarate.
Homogentisate degradation is involved in the catabolism of phenylalanine, tyrosine, and 3-hydroxyphenylacetate (Arias-Barrau et al., 2004) but may also be a downstream degradation intermediate of styrene via side-chain oxygenation (Baggi et al., 1983). Styrene is reasonably anticipated to be a human carcinogen and toxin, and bioremediation could be beneficial for environmental cleanup (ATSDR, 2010; Migliore et al., 2006; NTP, 2021). Homogentisate catabolism to fumarate and acetoacetate was likely in A. xylosoxidans ADAF13, P. putida CBF10-2, P. stutzeri ODKF13, and S. maltophilia CBF10-1. Two gene copies of homoprotocatechuate 2,3-dioxygenase were found in P. putida CBF10-2 and were not found in any other isolate. The homoprotocatechuate degradation pathway is also associated with aromatic amino acid catabolism (Cooper & Skinner, 1980; Díaz et al., 2001; Roper et al., 1993); however, to our knowledge, this pathway does not play a direct role in aromatic or xenobiotic compound degradation. Moreover, P. putida CBF10-2 was the only isolate likely to catabolize homoprotocatechuate to succinate. An expanded search into tyrosine metabolism sub-pathways suggested additional aromatic ring cleavage capabilities in A. xylosoxidans ADAF13, P. putida CBF10-2, P. stutzeri ODKF13, and S. maltophilia CBF10-1.

Fig. 5

Gentisate, homogentisate, and homoprotocatechuate degradation pathways. This is adapted from KEGG map00350. The gentisate (A), homogentisate (B), and homoprotocatechuate (C) degradation pathways are shown. Dots present for each enzyme indicate the presence of that enzyme for a particular isolate. Each colored dot corresponds to a specific isolate. Dotted arrows indicate there are other intermediate steps not listed. KEGG compound numbers are indicated for each chemical structure

2.2 Heavy Metal Resistance

We investigated the proteins associated with arsenic, cadmium, chromium, lead, and mercury resistance in our data set. Heavy metal contamination in soil and other environments is common worldwide (Xiang et al., 2021; Xiao et al., 2020; Zhou et al., 2020); therefore, any potential bioremediation strains should either be able to resist these hostile conditions or help remediate these environments. These metals are present as a result of both natural geochemical processes and as human-derived pollutants (Ahmad et al., 2021; Briffa et al., 2020; Tchounwou et al., 2012). Microbes have developed numerous systems to resist the presence of heavy metals found in soil (Gonzalez Henao & Ghneim-Herrera, 2021; Mathivanan et al., 2021; Nies, 2003; Pal et al., 2022). We used a combined approach with the annotations from RASTtk and the BacMet v2.0 Predicted database (Pal et al., 2014). The numbers of metal resistance proteins are graphed in Fig. 6. RASTtk found far more metal resistance proteins (273/295) than were identified using the BacMet v2.0 Predicted database (39/295). Both strategies overlapped on only 17 proteins. Arsenic resistance-related proteins were the most prevalent, with, on average, approximately 15 proteins per isolate. Many of these were ArsR proteins, the regulatory protein for ars operons (Ji & Silver, 1992; Wu & Rosen, 1991). On average, seven arsenic resistance proteins are predicted to be ArsR family transcriptional regulators. These regulators were not always associated with ars operons. The ArsR family of transcriptional regulators are metalloregulatory repressors that repress operons encoding genes that increase intracellular concentrations of heavy metals (e.g., arsenic import proteins) (Wu & Rosen, 1991). ArsR family regulators also control operons related to other heavy metals (e.g., zinc, nickel, cobalt, and cadmium) (Cavet et al., 2002; Yoon et al., 1991). 
Lead resistance-related proteins were the least prevalent, with only about two lead resistance-related proteins present per isolate, with none of these being found in a pbr operon. We also found the presence of proteins conferring resistance to multiple heavy metals. These proteins included CzcD and FieF/YiiP, which confer resistance to cobalt, zinc, and cadmium (Anton et al., 1999; Wei & Fu, 2005), and proteins annotated as heavy metal translocating ATPases likely conferring lead, cadmium, zinc, mercury, and copper resistance.
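The protein counts reported for the two annotation strategies can be cross-checked with simple set arithmetic: the size of a union equals the sum of the two set sizes minus their overlap. The short sketch below applies this to the figures given above (273 proteins from RASTtk, 39 from BacMet, 17 shared); the variable names are hypothetical.

```python
# Counts as reported in the text.
rasttk_total = 273   # proteins identified from RASTtk annotations
bacmet_total = 39    # proteins identified with the BacMet v2.0 Predicted database
shared = 17          # proteins identified by both strategies

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
combined_total = rasttk_total + bacmet_total - shared

rasttk_only = rasttk_total - shared
bacmet_only = bacmet_total - shared

print(combined_total, rasttk_only, bacmet_only)
```

The combined total matches the denominator of 295 used in the text, with most proteins supported by RASTtk alone.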

Fig. 6

Heavy metal resistance protein counts. Heavy metal resistance proteins were enumerated for each isolate. These proteins were plotted as histograms. The proteins classified into each resistance type are found in the Zenodo repository in a Microsoft Excel workbook “Heavy_Metal_Resistance_Proteins”

We focused on better characterizing the arsenic resistance genes given that arsenic is a non-essential metalloid, nearly ubiquitous in soil (Masuda, 2018; Reimann et al., 2009), toxic to humans (Golub et al., 1998; Islam et al., 2015), and can contaminate the food chain through absorption into agricultural products (Ciminelli et al., 2017; Cubadda et al., 2010; Huq et al., 2006; Santra et al., 2013). Two different arsenic resistance operons were identified, the ars and pst operons. The ars operon is the most well-defined arsenic resistance system in bacteria (Ben Fekih et al., 2018). Each isolate had a single ars operon (Fig. 7), and the operons varied in complexity. Exiguobacterium sp. KKBO11 had the simplest ars operon with two genes, arsR coding for the transcriptional regulator and arsB coding for an arsenite (As[III]) permease (Ji & Silver, 1992; San Francisco et al., 1989; Wu & Rosen, 1991). Pseudomonas stutzeri ODKF13 had the most sophisticated ars operon. This operon consisted of arsRHJ with an arsJ-associated gapdh gene whose proteins collectively confer resistance to arsenate (As[V]) as well as organoarsenicals (Chen et al., 2015, 2016). The ODKF13 ars operon was also interrupted by two non-ars genes, a predicted tyrosine phosphatase and a small multidrug resistance family-3 protein. Interestingly, upstream of the arsR regulator in this operon were an arsenite permease (acr3) (Sato & Kobayashi, 1998) and a putative monooxygenase of unknown function in arsenic resistance (arsO) (Wang et al., 2006). The pst operon is a common phosphate transport system in bacteria, which transports arsenate into the cell for detoxification and export through other mechanisms (e.g., Ars proteins) as arsenite (Rosenberg et al., 1977). All isolates had this phosphate transport system in a single pstSCAB operon; however, Pseudomonas putida CBF10-2 had two of these operons.
Taken together, our data suggested that these isolates have diverse mechanisms for resisting arsenic in its different oxidation states (arsenite and arsenate) and, in the case of P. stutzeri ODKF13, potentially even organic arsenicals.

Fig. 7

Arsenic resistance operons. Arsenic operons were drawn for each isolate. Each type of arsenic resistance gene is a similarly colored arrow between isolates. Genes without indication of conferring arsenic resistance are gray arrows with the gene annotation indicated. Double slashes indicate breaks in DNA between one group of genes and another

Significantly, Ars proteins do not degrade or remove arsenic in any of its forms from the soil (Yan et al., 2019). However, arsenic-resistant bacteria can be engineered to sequester arsenic for removal from the environment through phytochelatins or metallothioneins that bind arsenate or arsenic (Li et al., 2015; Ma et al., 2011; Ruiz et al., 2011; Sauge-Merle et al., 2003). Heavy metal-resistant bacteria can also be used in phytoextraction (absorption of arsenic into plants for later removal), where these bacteria can make the phytoextraction process more efficient (Lampis et al., 2015; Mesa et al., 2017). Given the content of their ars operons and overall composition of metal resistance-related proteins, A. xylosoxidans ADAF13, P. stutzeri ODKF13, and R. radiobacter GHKF11 are likely good candidates to use in heavy metal-contaminated environments.

3 Limitations and Conclusions

This study investigated the genomes of Achromobacter xylosoxidans ADAF13, Exiguobacterium sp. KKBO11, Ochrobactrum anthropi FRAF13, Pseudomonas putida CBF10-2, Pseudomonas stutzeri ODKF13, Rhizobium radiobacter GHKF11, and Stenotrophomonas maltophilia CBF10-1 for the presence of genes and pathways that would indicate the isolates’ potential as bioremediation tools. These isolates were previously shown to have low-level organophosphate degradation capabilities against paraoxon, ethyl paraoxon, methyl parathion, or chlorpyrifos (Islam & Iyer, 2021; Iyer & Iken, 2013; Iyer et al., 2016, 2018). This study took a broad comparative genomics approach to investigate the aromatic and xenobiotic compound degradation pathways while assessing potential resistance to certain heavy metals.

This analysis was based exclusively on predictions from comparative genomics and was not without limitations. The genomes analyzed were sequenced with short-read technology only and thus are not entirely assembled or “closed.” This may affect the accuracy of gene counts where tandem repeats occur and of determining whether genes are chromosomal or plasmid-borne. Future work using long-read platforms such as Oxford Nanopore Technologies or Pacific Biosciences can correct this. Functional annotation based on prediction software is also inherently limited by the underlying data used to assign the annotations (Lobb et al., 2020; Salzberg, 2019). Non-model bacteria are at a disadvantage with these approaches since genes that are either specific to particular genera or significantly diverged in sequence may require manual annotation (Lobb et al., 2020). Indeed, KEGG and BV-BRC functional annotation accounted for only approximately 60% of the genes in each genome. Future work should empirically investigate these isolates’ degradation and resistance capacities to help validate the findings from this work. Those investigations will offer exciting opportunities to discover new genes and pathways relevant to bioremediation.

Collectively, these isolates contained a variety of pathways for the degradation of aromatic and xenobiotic compounds as well as genes that likely confer resistance to a number of heavy metals. Based on the number and variety of intact pathways in this analysis, A. xylosoxidans ADAF13, P. putida CBF10-2, and P. stutzeri ODKF13 are the best candidates for further development through genome engineering or use in consortia for bioremediation purposes.

4 Methods

4.1 Genomes

The genomes used for analysis in this study were downloaded as FASTA files from GenBank: A. xylosoxidans ADAF13 LSMI00000000, Exiguobacterium sp. KKBO11 LUCU00000000, O. anthropi FRAF13 LSVB00000000, P. putida CBF10-2 LUCV00000000, P. stutzeri ODKF13 LSVE00000000, R. radiobacter GHKF11 LVFG00000000, and S. maltophilia CBF10-1 LTAC00000000.

4.2 Sequence Analysis

4.2.1 Genome Annotation

Genome contigs were annotated using the RASTtk tool as a part of BV-BRC’s “Annotation” service (Brettin et al., 2015; Olson et al., 2023). The “Bacteria” annotation recipe and the appropriate taxonomic information for each respective genome were selected. The annotation outputs for each genome can be found in the Zenodo repository (see data availability for the DOI) in a folder for each genome (e.g., A.xylosoxidans_ADAF13_RASTtk_Annotation).

4.2.2 KEGG Annotation and Pathway Reconstruction

The predicted protein sequences for each genome were annotated using KEGG’s KofamKOALA tool (ver. 2023-01-01, release 105.0) (Aramaki et al., 2020). The KEGG Mapper input file generated for each genome was used with the KEGG Mapper Reconstruct tool (Kanehisa & Sato, 2020). The KEGG Mapper input file was also used to enumerate the number of KEGG genes across different pathways in Microsoft Excel, based on the occurrence of specific KEGG orthology (ko) identifiers counted with the COUNTIF function. KEGG Mapper input text files for each genome and the Microsoft Excel workbook with KEGG gene counts can be found in the Zenodo repository in the folders “KEGG_Annotation” and “KEGG_Gene_Counts,” respectively. The pathway figures were generated using Inkscape (Inkscape, 2022).
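The COUNTIF-based tally described above can be reproduced outside Excel. The following is a minimal Python sketch, not the workflow used in this study; it assumes that each line of the KEGG Mapper input file holds a gene identifier followed by an optional KO identifier, and the function name, gene identifiers, and KO list are placeholders chosen for illustration.

```python
from collections import Counter


def count_kos(mapper_lines, kos_of_interest):
    """Count how many genes are assigned to each KO identifier of interest."""
    counts = Counter()
    for line in mapper_lines:
        fields = line.split()
        # Genes without a KO assignment appear as a lone gene identifier.
        if len(fields) < 2:
            continue
        counts[fields[1]] += 1
    # Report a count (possibly zero) for every requested KO.
    return {ko: counts[ko] for ko in kos_of_interest}


# Illustrative input only; identifiers are placeholders, not study data.
example = [
    "gene_0001\tK00001",
    "gene_0002",
    "gene_0003\tK00001",
    "gene_0004\tK00002",
]
pathway_kos = ["K00001", "K00002", "K00003"]
ko_counts = count_kos(example, pathway_kos)
print(ko_counts)
```

Reporting a zero for KOs that are absent keeps the per-genome tables aligned when counts from several genomes are compared side by side.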

4.2.3 OhbAB Phylogenetic Analysis

Phylogenetic analysis of OhbAB proteins in A. xylosoxidans ADAF13 was carried out using a subset of the protein sequences used by Tsoi et al. (1999) and the OhbAB proteins found in the A. xylosoxidans A8 isolate (Jencova et al., 2004). The amino acid sequences of the selected proteins were aligned using MAFFT (v7.490) (Katoh & Standley, 2013) with the --auto setting for the OhbA and OhbB alignments. Maximum likelihood phylogenetic trees were inferred using IQ-TREE2 with the -B 1000, -m MFP, and -T AUTO options (Hoang et al., 2018; Kalyaanamoorthy et al., 2017; Minh et al., 2020). The tree files were visualized using iTOL (v6.7.1) (Letunic & Bork, 2021), and the trees were annotated in Inkscape. The amino acid alignments and newick tree files can be found in the Zenodo repository in the “OhbAB_phylogenies” folder.
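For readers who wish to repeat the alignment and tree inference, the options listed above can be assembled into command lines as in the following Python sketch. It only builds and prints the commands and does not run them. The file names and the helper function are hypothetical, and the -s option for supplying the alignment to IQ-TREE2 is an assumption, since the exact invocation is not given in the text.

```python
import shlex


def build_commands(fasta, alignment, threads="AUTO"):
    """Return argument lists for the MAFFT and IQ-TREE2 steps."""
    # MAFFT writes the alignment to standard output, so the caller
    # is expected to redirect it to the alignment file.
    mafft = ["mafft", "--auto", fasta]
    # -B 1000: ultrafast bootstrap replicates; -m MFP: model selection
    # followed by tree inference; -T AUTO: automatic thread count.
    iqtree = ["iqtree2", "-s", alignment, "-B", "1000", "-m", "MFP", "-T", threads]
    return mafft, iqtree


# Hypothetical file names for the OhbA protein set.
mafft_cmd, iqtree_cmd = build_commands("OhbA.faa", "OhbA.aln.faa")
print(shlex.join(mafft_cmd), ">", "OhbA.aln.faa")
print(shlex.join(iqtree_cmd))
```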

4.2.4 Heavy Metal Resistance Protein Analysis

Heavy metal resistance protein analysis was conducted using a combination of the BacMet Predicted database (v2.0, March 11, 2018) (Pal et al., 2014) and RASTtk annotations. The BacMet Predicted database was downloaded as a FASTA file of amino acid sequences and clustered using CD-HIT (v4.8.1) (Fu et al., 2012) for sequences that were 100% identical (cd-hit -c 1 -i bacmet2_predicted_database.fasta -o bacmet2_predicted_clustered_c1.fasta). The bacmet2_predicted_clustered_c1.fasta file was used to generate a protein database for BLAST using BLAST+ v2.12.0 (Camacho et al., 2009) (makeblastdb -dbtype prot -title bacmet2_predicted_c1 -in bacmet2_predicted_clustered_c1.fasta). The predicted proteins for each genome were concatenated into a single FASTA file (total_protein.fasta) and used as the BLASTp query against the bacmet2_predicted_c1 database (blastp -query total_protein.fasta -db bacmet2_predicted_clustered_c1.fasta -out total_protein_bacmet_predicted_c1.tsv -outfmt 6 -num_threads 4). We filtered the resulting BLASTp hit file for those hits with a sequence identity ≥ 85%, and in the case of identical hits for the same query sequence, we selected the hit with the highest bit score. This filtering was done using the following awk script: awk -F '\t' '{if ($3 >= 85 && (!($1 in max) || $3 > max[$1])) {row[$1] = $0; max[$1] = $3} else if ($3 >= 85 && $1 in row && $3 == max[$1]) {row[$1] = row[$1]"\n"$0}} END {for (i in row) {print row[i]}}' total_protein_bacmet_predicted_c1.tsv > filt_85_total_protein_bacmet_predicted_c1.tsv.

Only BLASTp hits in filt_85_total_protein_bacmet_predicted_c1.tsv with an E-value less than 1e-4 were reported. The files for each step of the BacMet2 BLASTp analysis can be found in the Zenodo repository under the folder “BacMet2_Analysis.”
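The two filtering steps can be expressed together in a few lines of Python. This is a sketch of the logic in the awk command as printed above, not a replacement for it: for each query it keeps the hit or hits with the highest percent identity (column 3) among those at or above 85%, retains ties, and then applies the E-value cutoff. It does not rank by bit score (column 12). The column positions assume the default BLAST tabular layout produced by -outfmt 6, and the function name and example rows are hypothetical.

```python
def filter_hits(rows, min_identity=85.0, max_evalue=1e-4):
    """Keep, per query, the hit(s) with the highest percent identity."""
    best = {}  # query id -> (best identity, rows tied at that identity)
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        query, identity = cols[0], float(cols[2])
        if identity < min_identity:
            continue
        if query not in best or identity > best[query][0]:
            best[query] = (identity, [cols])
        elif identity == best[query][0]:
            best[query][1].append(cols)
    # E-value is column 11 (index 10) in the default outfmt 6 layout.
    return [
        cols
        for _, tied in best.values()
        for cols in tied
        if float(cols[10]) < max_evalue
    ]


# Made-up hits: q1 has two candidates, q2 falls below 85% identity,
# and q3 passes the identity filter but fails the E-value cutoff.
example_rows = [
    "q1\tsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t180",
    "q1\tsB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-60\t200",
    "q2\tsC\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t120",
    "q3\tsD\t88.0\t20\t2\t0\t1\t20\t1\t20\t0.5\t25",
]
kept = filter_hits(example_rows)
print([(cols[0], cols[1]) for cols in kept])
```

Applying the E-value cutoff after the per-query selection mirrors the order of the steps described in the text.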

The RASTtk annotations were also analyzed to identify heavy metal resistance proteins. The GFF file for each genome generated by RASTtk was used to find proteins associated with metal resistance. The GFF files were searched for specific terms for each metal/metalloid using Microsoft Excel. For arsenic resistance proteins, the terms “arsenic,” “arsenate,” “arsenite,” “as,” “ars,” and “pst” were used. For cadmium resistance proteins, “cadmium,” “cad,” “cz,” and “cd” were used. For chromium resistance proteins, the terms “chromium,” “chromate,” “cr,” and “chr” were used. For copper resistance proteins, the terms “copper” and “cu” were used. For lead resistance proteins, the terms “lead,” “pb,” and “pbr” were used. For mercury resistance proteins, the terms “mercury,” “hg,” and “mer” were used. For zinc resistance proteins, the terms “zinc” and “zn” were used. Spurious hits that were not related to resistance proteins were removed. Proteins related to heavy metal transport/translocation and annotated with a given protein name found in the BacMet2 Predicted database were retained. The list of these proteins can be found in the Zenodo repository in a Microsoft Excel workbook “Heavy_Metal_Resistance_Proteins.” These proteins were counted for each genome and plotted as histograms using Microsoft Excel. The arsenic resistance operon figure was generated using Microsoft PowerPoint.
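The keyword search can also be scripted. The Python sketch below assumes a case-insensitive substring match on the attributes column (column 9) of a GFF file, which approximates a text search in Excel; the exact search behavior used in this study is not specified, and the example GFF lines and function name are hypothetical. Short terms such as “as” match many unrelated annotations, which illustrates why spurious hits had to be removed afterward.

```python
ARSENIC_TERMS = ["arsenic", "arsenate", "arsenite", "as", "ars", "pst"]


def find_candidates(gff_lines, terms):
    """Return (attributes, matched terms) for features matching any term."""
    hits = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip GFF header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        attributes = cols[8].lower()
        matched = [term for term in terms if term in attributes]
        if matched:
            hits.append((cols[8], matched))
    return hits


# Hypothetical feature lines, not taken from the study's annotations.
example_gff = [
    "##gff-version 3",
    "contig_1\tRASTtk\tCDS\t100\t500\t.\t+\t0\tID=peg.1;Name=Arsenical resistance operon repressor ArsR",
    "contig_1\tRASTtk\tCDS\t600\t900\t.\t-\t0\tID=peg.2;Name=DNA polymerase I",
    "contig_1\tRASTtk\tCDS\t1000\t1300\t.\t+\t0\tID=peg.3;Name=Hypothetical protein",
]
candidates = find_candidates(example_gff, ARSENIC_TERMS)
for attributes, matched in candidates:
    print(matched, attributes)
```

In this example the polymerase annotation is returned only because “polymerase” contains “as”, so it would be discarded during manual curation.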