Introduction

Heavy metals present a dual challenge to bacteria, as many metals are biologically necessary at low levels but toxic at higher concentrations (Silver 1998; Nies 2004a). While heavy metals play vital roles as enzymatic cofactors and in electron transport, high concentrations of heavy metals can also disrupt nucleic acids, phospholipid membranes, and enzymes (Halliwell and Gutteridge 1985; Stadtmann 1993; Hoshino et al. 1999; Stohs and Bagchi 1995). Bacteria have adapted various strategies for coordinating intracellular heavy-metal homeostasis, one of the most prominent being mediated by two-component regulatory systems (TCRS) (Bruins et al. 2000). TCRS are the predominant signal transduction mechanism connecting environmental stimuli to cellular responses in bacteria (Silver and Phung 1996). These systems are comprised of a histidine kinase sensor and a cognate response regulator. Histidine kinase sensors are generally located in the cell membrane and are autophosphorylated in the presence of an environmental trigger (West and Stock 2001). The sensor kinase subsequently phosphorylates a corresponding response regulator, resulting in differential gene expression involved in many critical aspects of bacterial physiology, including antibiotic resistance, osmolarity homeostasis, virulence, and heavy-metal tolerance (Barkay and Olson 1986; Parkinson and Kofoid 1992).

Anthropogenic inputs of heavy metals into the environment have become widespread and, in many cases, are highly toxic. Major inputs from mining (da Pelo et al. 2009), smelting (Pip 1991), chemical waste (Barkay and Olson 1986) and the increased use of agricultural bactericides (Cooksey 1994) may promote the adaptive evolution of heavy-metal resistance genes among bacterial communities. For example, the sequestration or efflux of copper from bacterial cells is encoded by the cop operon in various bacterial taxa, which is regulated by a TCRS (Brown et al. 1994; Cha and Cooksey 1991). This TCRS regulates several genes including copA and copB, whose protein products are involved in the compartmentalization or efflux of copper from the cell (Lee et al. 1994). Similarly, bacterial tolerance to cadmium, zinc, and cobalt is also mediated by a TCRS that modulates the efflux of these metals across the bacterial cell membrane (Nies et al. 1989). In this system, the proteins CzcA, CzcB, and CzcC comprise an efflux pump, with CzcA and CzcB acting in concert as a translocase to move these metals through the cytoplasmic membrane and CzcC contributing to full tolerance to cadmium, zinc, and cobalt (Diels et al. 1995; Nies 2003).

The common presence of heavy-metal resistance genes on plasmids suggests that these genes may be spread within bacterial communities by horizontal gene transfer (Ochman et al. 2000). This process could potentially enhance the microbial community’s ability to adjust to shifting environmental conditions by the relatively rapid spread of genes involved in a variety of bacterial adaptations, including virulence (Ochman and Moran 2001), heavy-metal tolerance (Nies and Brown 1998), organochlorine metabolism (van der Meer 2006), and antibiotic resistance (de la Cruz and Davies 2000). Furthermore, in situ lateral transfer of genes involved in ATPase-dependent heavy-metal resistance, such as mercury resistance, has been previously illustrated (Osborn et al. 1997). While experimental studies have demonstrated the potential for in situ horizontal transfer of heavy-metal tolerance genes (Coombs and Barkay 2004), the extent to which lateral gene transfer has modulated the evolutionary history of heavy-metal resistance genes is relatively unknown.

Our general hypothesis is that, given the frequent association of heavy-metal resistance genes with bacterial plasmids, these genes will show multiple instances of horizontal transfer over their evolutionary history, as suggested by their documented sequence and functional similarity to other TCRS (Nies et al. 1995). Furthermore, we predict that common mechanisms associated with the signal transduction of TCRS would be retained through functional constraints associated with the three-dimensional (3D) structure of regulatory proteins.

Although previous studies have compared functional motifs among heavy-metal tolerance proteins based on sequence similarity (e.g., Fong et al. 1995; Thilakaraj et al. 2007), these comparisons do not necessarily provide robust predictors of either evolutionary relatedness or protein function (Bouzat et al. 2000; Eisen and Wu 2002). This study employs phylogenetic analyses of published protein sequences to determine potential orthologous/paralogous relationships among heavy-metal tolerance proteins, as well as to examine the potential role of horizontal gene transfer in their evolutionary history. Phylogenetic reconstructions incorporate evolutionary models to provide more reliable estimates of evolutionary history than those inferred from simple sequence similarity (Eisen 1998). Furthermore, phylogenetic analyses have proved useful in generating predictions about function, which can then be complemented with comparative analyses of proteins’ 3D structures.

In this study, phylogenetic analyses of TCRS proteins and their corresponding efflux proteins revealed patterns of orthology/paralogy among copper (CopA, CopB) and cadmium, zinc, and cobalt (CzcA and CzcB) efflux proteins, as well as among their regulatory proteins CopR, CzcR, CopS, and CzcS. Phylogenetic inferences revealed distinct topologies dependent upon the metal substrate and the regulatory or structural role of the protein. We report several instances of horizontal gene transfer in the spread of heavy-metal tolerance genes, including multiple horizontal gene transfer events between prokaryotes and eukaryotes. In addition, comparative analyses of the 3D structure of regulatory proteins confirmed a common evolutionary origin followed by functional divergence of genes encoding these TCRS sequences and other regulatory sequences associated with regulation of antibiotic resistance (Li et al. 2010) and virulence (Parish et al. 2003).

Materials and Methods

Database Searches of Heavy-Metal Protein Sequences

Analyses reported here were based upon protein sequences that had been submitted to the National Center for Biotechnology Information (NCBI) GenBank database (Altschul et al. 1990) as of March, 2012. Protein sequences associated with heavy-metal resistance to copper, cadmium, zinc, and cobalt were identified through BLASTx queries of cop and czc genes, as well as cutA2 and scsB, performed using a BLOSUM62 substitution matrix and default parameters. Both efflux and regulatory query sequences included nucleotide sequences from Pseudomonas syringae (copA, GI:151188, 1830 bp; copB, GI:151189, 987 bp; copR, GI:151181, 674 bp; copS, GI:151182, 1464 bp), Ralstonia metallidurans (czcA, GI:1731918, 3192 bp; czcB, GI:1731917, 1563 bp; czcR, GI:1403127, 678 bp; czcS, GI:1403128, 1428 bp), E. coli (cycZ, GI:948649, 1460 bp), and S. typhimurium (scsB, GI:2327004, 1884 bp), as respective metal tolerance genes from these species have been extensively characterized and, in all instances, confirmed through functional studies. Only complete non-redundant protein sequences exhibiting empirically established roles in heavy-metal tolerance were downloaded (in FASTA format) for subsequent phylogenetic analysis, whereas partial protein sequences or sequences with no corroborated function were disregarded. Non-redundant functionally uncharacterized proteins were also selected from published, fully sequenced genomes based upon low expectation values and the detection of conserved functional motifs. Specifically, selected CopA and CopB sequences exhibited conservation of the copper-binding motif DHXXMXXM (Cha and Cooksey 1991), whereas CzcA sequences contained the key motif DFGXXXDGAXXXVEN (Nies 2003). All response regulators and sensor kinases included for analyses demonstrated conservation of amino acid residues critical to phosphorelay (Koretke et al. 2000; Buckler et al. 2002).
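The motif-based screening described above can be illustrated with a short sketch. It assumes that X in the published motifs stands for any single residue; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
import re

# Motif patterns as written in the text, reading X as "any residue" (assumption).
COP_MOTIF = re.compile(r"DH..M..M")           # DHXXMXXM (CopA/CopB)
CZCA_MOTIF = re.compile(r"DFG...DGA...VEN")   # DFGXXXDGAXXXVEN (CzcA)


def has_motif(protein_seq: str, motif: re.Pattern) -> bool:
    """Return True if the motif occurs anywhere in the protein sequence."""
    return motif.search(protein_seq.upper()) is not None


# Hypothetical toy sequences, not real CopA or CzcA proteins.
toy_cop = "MKTLDHAAMGGMSTQ"
toy_czc = "AAADFGKLLDGAIVMVENLL"
toy_none = "MKTLAAAAGGSTQ"

for name, seq in [("toy_cop", toy_cop), ("toy_czc", toy_czc), ("toy_none", toy_none)]:
    print(name, has_motif(seq, COP_MOTIF), has_motif(seq, CZCA_MOTIF))
```

A screen of this kind only reports motif presence; it says nothing about expectation values or functional evidence, which were separate selection criteria.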

Disulfide isomerases, encoded by cycZ and scsB homologs, with a CXXC metal-binding motif were also included in our analyses (Gupta et al. 1997; Page et al. 1997). The inclusion of cycZ and scsB homologs, including cutA2 of Escherichia coli, in these analyses was warranted, as they provide tolerance to copper, zinc, nickel, cobalt, and cadmium in E. coli and Salmonella typhimurium (Gupta et al. 1997), respectively. Copper homeostasis in E. coli seems to involve coordination between a plasmid-borne cop homolog and the cut operon (Brown et al. 1994). The moderate sequence similarity and potential coordination of CycZ/ScsB proteins with Cop homologs suggest common evolutionary origins for these protein families.

Phylogenetic Analyses

Structural and regulatory sequences were separately aligned using CLUSTAL-X, v. 1.8 (Thompson et al. 1997) multiple sequence alignment program with default parameters. Each multiple sequence alignment was visually inspected for potential sequence overhangs and local misalignments due to the presence of sequence gaps. Curated alignments were subsequently imported into MrBayes, v. 3.0 (Ronquist and Huelsenbeck 2003) or MEGA, v. 5 (Tamura et al. 2011) for implementation of Bayesian and maximum likelihood algorithms for phylogenetic inferences, respectively. Phylogenetic reconstructions of heavy-metal efflux proteins were rooted using CycZ and ScsB homologs, including cutA2 from E. coli, based on the fundamental role of these proteins in both anaerobic respiration as well as heavy-metal tolerance (Gupta et al. 1997; Page et al. 1997). Potential instances of horizontal gene transfer were ascertained by comparing reconstructed protein trees to published 16S rDNA phylogenies (Doolittle 1999; Eisen 2000). Potential examples of horizontal gene transfer were interpreted only when both maximum likelihood and Bayesian phylogenetic inferences associated with the gene trees demonstrated identical topologies that differed consistently from the topologies predicted by 16S rDNA trees representing species phylogenies.

Gene sequences corresponding to putative horizontal gene transfer events were then evaluated for differential GC content in comparison with the overall host genome. Disparate measures of GC percent are commonly interpreted as a confirmatory measure of horizontal gene transfer (Lawrence and Ochman 1998; Jain et al. 2002). Differential GC content of putative cases of horizontal gene transfer was compared to average differential GC % estimated from 6 to 21 available genomes from their corresponding taxonomic groups. Differential GC content values that fell beyond a standard deviation from the average distribution of values for their corresponding taxonomic group were considered as confirmatory of horizontal transfer events. The nucleotide composition of metal tolerance genes was determined with MEGA v. 5, while the mean GC content of host bacterial genomes was obtained from the GenBank genome databases.
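As a minimal sketch of the GC comparison described above, the following code computes the GC percentage of a gene, its difference from a host-genome value, and whether that difference lies more than one standard deviation from a group mean. Treating the differential as an absolute difference and applying the threshold in both directions are assumptions, and the function names and toy sequence are hypothetical; the study itself used MEGA and GenBank genome summaries.

```python
def gc_percent(seq: str) -> float:
    """GC content of a nucleotide sequence, as a percentage of A/C/G/T bases."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def delta_gc(gene_seq: str, genome_gc_percent: float) -> float:
    """Absolute difference between gene GC % and host-genome GC % (assumption)."""
    return abs(gc_percent(gene_seq) - genome_gc_percent)


def exceeds_one_sd(delta: float, group_mean: float, group_sd: float) -> bool:
    """True if delta lies more than one standard deviation from the group mean."""
    return abs(delta - group_mean) > group_sd


# Hypothetical toy gene and genome value, for illustration only.
toy_gene = "GGCCGGCCAT"
toy_delta = delta_gc(toy_gene, genome_gc_percent=60.0)
print(toy_delta, exceeds_one_sd(toy_delta, group_mean=3.0, group_sd=1.9))
```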

Within the program MrBayes, posterior probabilities were separately established for both structural and regulatory protein phylogenies with a Bayesian Metropolis-coupled Markov Chain Monte Carlo Method (Ronquist and Huelsenbeck 2003). To optimize the evolutionary model utilized within MrBayes, jumping among fixed-rate amino acid models was implemented for 25,000 generations with six chains, and a burn-in of 100 trees for six independent runs to reach convergence. Convergence of the six runs was tested every 1,000 generations and determined when log likelihood values reached a stationary phase (the average standard deviation of split frequencies was <0.1). The program PAUP* v. 4 (Swofford 2002) was employed to visualize phylogenies developed in MrBayes.
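The convergence diagnostic mentioned above, the average standard deviation of split frequencies, can be sketched as follows. This is an illustrative reimplementation and not MrBayes code: each sampled tree is represented simply as a set of split labels, the sample standard deviation is used, and no minimum-frequency filter is applied, all of which are assumptions.

```python
from statistics import stdev
from typing import Dict, List, Set

Tree = Set[str]  # a tree reduced to the set of splits (bipartitions) it contains


def split_frequencies(trees: List[Tree], all_splits: Set[str]) -> Dict[str, float]:
    """Fraction of sampled trees in one run that contain each split."""
    n = len(trees)
    return {s: sum(s in tree for tree in trees) / n for s in all_splits}


def average_sd_of_split_frequencies(runs: List[List[Tree]]) -> float:
    """Mean, over all observed splits, of the SD of split frequency across runs."""
    all_splits: Set[str] = set().union(*(tree for run in runs for tree in run))
    per_run = [split_frequencies(run, all_splits) for run in runs]
    sds = [stdev([freqs[s] for freqs in per_run]) for s in all_splits]
    return sum(sds) / len(sds)


# Hypothetical toy samples for four taxa A-D; labels name the two sides of a split.
run_a = [{"AB|CD"}, {"AB|CD"}, {"AC|BD"}, {"AB|CD"}]
run_b = [{"AB|CD"}, {"AC|BD"}, {"AB|CD"}, {"AB|CD"}]
run_c = [{"AC|BD"}] * 4

print(average_sd_of_split_frequencies([run_a, run_b]))  # runs that agree
print(average_sd_of_split_frequencies([run_a, run_c]))  # runs that disagree
```

Runs that sample the same splits at the same frequencies give a value of zero, and the value grows as independent runs disagree, which is why a small value is read as evidence of convergence.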

To confirm evolutionary relationships inferred from Bayesian analyses, phylogenies were also reconstructed using maximum likelihood with a Jones-Taylor-Thornton (JTT) model of amino acid change (Jones et al. 1992; Nei and Kumar 2000). The relative confidence of nodes within maximum likelihood phylogenies was assessed through 200 bootstrap resamplings of the data (Felsenstein 1985). Maximum likelihood trees were generated using MEGA, v. 5 (Tamura et al. 2011).
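The resampling step behind bootstrap support can be sketched as below: alignment columns are drawn with replacement to build each pseudo-replicate, and a tree would then be inferred from every replicate. Only the resampling is shown; tree inference is omitted, and the function names, the fixed seed, and the toy alignment are hypothetical rather than part of the MEGA workflow.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int = 200, seed: int = 0
) -> List[Dict[str, str]]:
    """Generate n_replicates pseudo-alignments (200 matches the number used in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment of three short protein fragments.
toy_alignment = {"taxon1": "MKDHA", "taxon2": "MRDHG", "taxon3": "MK-HA"}
replicates = bootstrap_replicates(toy_alignment)
print(len(replicates), replicates[0])
```

The bootstrap value of a node is then the percentage of replicate trees in which that node is recovered.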

3D Analyses of Protein Conformation

The availability of crystal structures for several response regulators provided the opportunity to investigate potential links between phylogenetic divergence and the conservation of 3D protein structure. The amino acid sequences of the CzcR protein from R. metallidurans and the CopR protein from P. syringae were separately used as search queries to obtain 3D coordinate sets of related protein structures from the SWISS-MODEL repository (Kopp and Schwede 2004) of annotated protein structure template models. The crystal structures of the top database hits were then used as modeling templates to predict the 3D structures of CopR and CzcR, generated using the SWISS-MODEL First Approach Mode automated server (Guex and Peitsch 1997).

Three-dimensional protein structures, as well as predictions of secondary structure, were then aligned and visualized with the program Deep View Swiss-Pdb Viewer, v. 4.0.1 (Schwede et al. 2003). To improve the structural alignments, a Magic Fit was performed, followed by an iterative Magic Fit. Aligned structures were then submitted to the SWISS-MODEL Project Mode automated server to obtain the optimal structural alignment. To assess the quality of the fit between the superimposed target and template structures, the root mean square (RMS) deviations were calculated between the backbone atoms of each target and template molecule.
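For reference, the RMS deviation between two sets of paired atoms can be computed as in the sketch below. It assumes the structures are already superimposed and that atoms are paired one to one in the same order; it performs no fitting, and the coordinates are hypothetical values rather than data from the study.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired, already superimposed atoms."""
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(atoms_a, atoms_b)
    )
    return math.sqrt(squared / len(atoms_a))


# Hypothetical backbone coordinates in angstroms: the second set is the first
# shifted by 1 angstrom along z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(target, template))
```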

Results

Database Searches of Heavy-Metal Protein Sequences

BLAST searches of structural efflux proteins CopA, CopB, CzcA, and CzcB resulted in the identification of 174 complete sequences either functionally characterized or from fully sequenced genome projects (Supplementary Materials). All bacterial heavy-metal tolerance protein sequences collected from the GenBank database corresponded to the taxonomic classes of alpha-, beta-, delta-, and gamma-proteobacteria. These sequences included 39 protein sequences for CopA and CopB, and 48 sequences for the CzcA and CzcB proteins. CopA sequences exhibited higher amounts of similarity than corresponding CopB sequences, as determined by reported expectation values and percent of amino acid identities. Each potential CopA homolog, including the ferroxidase McoA, had expectation values equal to or less than e−135 and protein sequence similarities of at least 46 % over 638 amino acid residues of the CopA query sequence from P. syringae. In contrast, CopB homologs had expectation values as great as e−54 and amino acid sequence similarity scores of as low as 40 % through a coverage area of 301 amino acid residues of the CopB sequence from P. syringae.

In a similar fashion, CzcA sequences demonstrated a higher degree of similarity than corresponding CzcB sequences. All CzcA sequences had expectation values of 0 and amino acid sequence identities of at least 48 % over a coverage area of 1,065 amino acid residues of the CzcA protein from R. metallidurans. Yet, CzcB sequences had expectation values as great as e−35 and amino acid sequence similarities of as low as 32 % over 213 amino acid residues of the CzcB from R. metallidurans.

BLAST searches of response regulators and histidine kinase sensors retrieved 133 complete protein sequences (Supplementary Materials). Each potential response regulator homolog had expectation values of at least 4e−62 and protein similarity scores of no less than 47 % over 221 amino acid residues of the CzcR sequence from R. metallidurans. Histidine kinase sensor homologs had expectation values as great as e−54 and amino acid sequence similarities of as low as 40 % over 306 residues of the CzcS protein from R. metallidurans. Due to the high sequence similarity among heavy metal response regulators, as well as among histidine kinase sensors, their classification as either CopR or CzcR and CopS or CzcS could not be accurately determined. Several taxa with potential Cop or Czc efflux homologs had no corresponding TCRS proteins retrieved from the BLAST database searches, including Methylbacillus flagellatus, Erythrobacter litoralis, Sphingopyxis alaskensis, Oligotropha carboxidovorans, Xylella fastidiosa, and Xanthomonas oryzae.

Phylogenetic Analyses

Major phylogenetic branching patterns generated by Bayesian and maximum likelihood methods demonstrated congruent topologies, particularly for major monophyletic groups. The phylogenetic reconstruction of heavy-metal efflux proteins revealed four major independent monophyletic clusters that corresponded to the CopA, CopB, CzcA, and CzcB proteins with generally high bootstrap values (99 % for CzcA, 67 % for CzcB, 99 % for CopA, and 91 % for CopB) for maximum likelihood inferences and high posterior probabilities (91 for CzcA, 78 for CzcB, 97 for CopA, and 93 for CopB) for Bayesian reconstructions (Fig. 1). Each of the four clades represents groups of orthologous genes that have differentiated as a result of species divergences. The phylogenetic analyses, however, did not provide confidence for the basal relationships among these groups, which clustered as a single polytomy (Fig. 1).

Fig. 1

Maximum likelihood phylogenetic tree inferring evolutionary relationships among 174 complete CopA, CopB, CzcA, and CzcB protein sequences. Taxa are labeled with the first letter of the genus followed by the first three letters of the species names (see Supplementary Materials for GenBank accession numbers, protein identification, and species names). CycZ and ScsB homologs were used to root the phylogeny. Potential instances of horizontal gene transfer are highlighted by boxed identifiers and thicker branches of their supporting clades. CutA2 and a ScsB with dual roles in metal tolerance and cytochrome synthesis, as well as the ferroxidase McoA, are in bold. Numbers indicate confidence in nodes based on 200 bootstrap replicates of the data.

The metal tolerance proteins CutA2 from E. coli and ScsB from Salmonella typhimurium clustered with other CycZ/ScsB homolog proteins involved in cytochrome C biosynthesis, with bootstrap and posterior probability values of 99 % and 100, respectively (Fig. 1), reflecting the dual function of these proteins in cytochrome biosynthesis and heavy-metal tolerance. A multicopper ferroxidase (McoA) from Pseudomonas aeruginosa, which exhibits a role in iron acquisition but not copper tolerance (Huston et al. 2002), clustered with CopA homologs from other pseudomonads with bootstrap and posterior probability values of 93 % and 100, respectively.

Contrary to efflux proteins, the response regulators CopR and CzcR formed a single cluster with high bootstrap and posterior probability values of 91 % and 100, respectively (Fig. 2). These sequences clustered independently from the histidine kinase sensors CopS and CzcS, which formed a monophyletic group with high bootstrap support (91 %) and posterior probability (79). These clusters revealed, however, two large polytomies that prevented inferences regarding the basal branching of distinct response regulators and histidine kinases.

Fig. 2

Maximum likelihood unrooted phylogenetic tree of 133 protein sequences from the CzcR, CzcS, CopR, and CopS protein families. Taxa are labeled with the first letter of the genus followed by the first three letters of the species names (see Supplementary Materials for GenBank accession numbers, protein identification, and species names). Potential instances of horizontal gene transfer are highlighted by boxed identifiers and thicker branches of their supporting clades. Protein sequences MtrA, RegX3, and PrrA (in bold) provided model templates for structural analyses. Numbers indicate confidence in nodes based on 200 bootstrap replicates of the data.

The phylogenies reported within this study revealed several potential examples of horizontal gene transfer. Notably, maximum likelihood and Bayesian phylogenetic inferences demonstrated congruency for all examples of horizontal gene transfer events. Phylogenetic inferences based on the Cop and Czc efflux proteins revealed a minimum of nine potential cases of horizontal gene transfer, as the cluster of species with these protein sequences did not correspond to taxonomic relationships predicted by 16S rRNA phylogenies (Figs. 1, 2). For example, while both Achromobacter xylosoxidans and R. metallidurans belong to the class betaproteobacteria, CzcA and CzcB homologs from these bacteria grouped with orthologous protein sequences from the alphaproteobacteria Erythrobacter litoralis, Sphingopyxis alaskensis, and Novosphingobium aromaticivorans with high bootstrap and posterior probability values of 85 % and 90, respectively (Fig. 1). Similarly, CzcA and CzcB protein sequences from the gammaproteobacterium Azotobacter vinelandii grouped with the betaproteobacterium Methylibium petroleiphilum (with bootstrap and posterior probability values of 99 % and 91, respectively) rather than with other gammaproteobacteria from the genera Pseudomonas or Xanthomonas, which are evolutionarily more closely related. However, in this case, strong bootstrap support was only detected for the CzcA orthologs. The lower bootstrap support for the CzcB Methylibium petroleiphilum grouping (68 %) could be the result of this protein having originated from a more ancient lateral transfer event, as this group of sequences clustered independently from the main CzcB monophyletic group (Fig. 1). Finally, the CzcA homolog from the gammaproteobacterium Nitrosococcus oceani clustered with CzcA proteins from the betaproteobacteria Nitrosomonas eutropha and Nitrosomonas europaea (with bootstrap and posterior probability values of 63 % and 58, respectively) rather than with other gammaproteobacteria representatives, as would be expected based on taxonomic clustering of 16S rDNA phylogenies.

Potential instances of horizontal gene transfer among copper tolerance homologs were detected with both CopA and CopB proteins from the gammaproteobacteria Pseudoalteromonas haloplanktis, Photobacterium profundum, and Aeromonas veronii, which grouped with the betaproteobacteria Ralstonia metallidurans, Ralstonia solanacearum, Janthinobacterium sp., Herminiimonas arsenicoxydans, Delftia acidovorans, and Thiobacillus denitrificans, with bootstrap support values of 94 and 59 %, respectively (posterior probabilities of 88 and 55, respectively). Evolutionary relationships of bacterial taxa based upon 16S rDNA phylogenetic trees suggest that CopA and CopB proteins from P. profundum, P. haloplanktis, and A. veronii should have clustered with other gammaproteobacteria, such as E. coli, Klebsiella pneumoniae, and Serratia marcescens. These examples could potentially represent three independent evolutionary events of horizontal gene transfer. However, our phylogenetic analysis precludes evaluating this hypothesis, given that the protein sequences clustered together within a single monophyletic group (see Fig. 1).

The differential GC content between copA and the corresponding genomes of P. profundum, P. haloplanktis, and A. veronii was 15.8, 10.4, and 29.5 %, respectively (Table 1). These values were considerably greater than the mean differential GC content of other gammaproteobacteria. In fact, the differential GC content was greater than three standard deviations from the mean value of sampled gammaproteobacteria (3.0 ± 1.9). While CopB homologs grouped with a lower bootstrap value, the differential GC content between copB genes and the respective genomes of P. profundum, P. haloplanktis, and A. veronii was 10.9, 7.0, and 39.2 %, respectively (Table 1). Again, these values were considerably higher than the mean differential GC content of these genes and their host genomes in other gammaproteobacteria. These results are consistent with the phylogenetic analyses, suggesting the lateral transfer of copA and copB homologs among these bacteria. Differential GC values for CzcA and CzcB putative cases of horizontal gene transfer were consistently lower than those of the reported CopA and CopB cases. However, they were still at the higher end of the distribution of differential GC contents for species from their corresponding taxonomic groups (see Table 1), with values ranging from 5.8 to 48.8 differential GC percent. An exception was the cnrB gene from R. metallidurans, which only showed 2.2 % difference with its genomic GC content.
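The copA comparison reported above can be checked with simple arithmetic. The sketch below expresses each reported differential GC value as a number of standard deviations above the gammaproteobacterial mean, assuming that the 3.0 ± 1.9 figure denotes a mean and one standard deviation; the variable and function names are illustrative.

```python
# Values reported in the text for copA (differential GC %, Table 1).
GAMMA_MEAN, GAMMA_SD = 3.0, 1.9  # assumed to be mean and one standard deviation
copA_delta_gc = {"P. profundum": 15.8, "P. haloplanktis": 10.4, "A. veronii": 29.5}


def sd_units(value: float, mean: float, sd: float) -> float:
    """Distance of value from the mean, expressed in standard deviations."""
    return (value - mean) / sd


for species, delta in copA_delta_gc.items():
    print(f"{species}: {sd_units(delta, GAMMA_MEAN, GAMMA_SD):.1f} SD above the mean")
```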

Table 1 Percent difference of GC content (ΔGC %) between putative cases of lateral gene transfer and the corresponding host genomes for nine efflux proteins and two regulatory proteins associated with heavy-metal resistance

Interestingly, a potential CopB homolog from the eukaryote Asian rice plant, Oryza sativa, grouped with bacterial CopB proteins (bootstrap and posterior probability values of 73 % and 67), clustering with the rice pathogen Xanthomonas oryzae (Fig. 1), while showing greatest sequence similarity with the human pathogen Stenotrophomonas maltophilia (89 % amino acid similarity throughout 321 amino acids). In addition, a CzcB homolog was found in the black cottonwood Populus trichocarpa, which demonstrated 99 % sequence similarity with CnrB of R. metallidurans (Liesgang et al. 1993) through 98 amino acids. The potential CzcB protein of P. trichocarpa grouped with a clade of bacterial CzcB homologs with bootstrap and posterior probability values of 93 % and 92. Genes for the efflux proteins from O. sativa and P. trichocarpa also demonstrated high differential GC contents compared to their host genomes (25.7 and 48.8 %, respectively; Table 1), as expected with lateral transfer events.

With regard to the analysis of regulatory protein sequences, the phylogenetic tree showed a significant number of unresolved clusters, with two major basal polytomies corresponding to response regulators and sensor kinases. However, greater resolution within smaller clades provided evidence for two potential events of lateral gene transfer, each event occurring across major phylogenetic domains. First, a potential CzcR protein from the black cottonwood tree, Populus trichocarpa, grouped with 91 % bootstrap support (posterior probability value of 92) with the response regulator ortholog from R. metallidurans (Fig. 2). Sequence comparison revealed that these proteins shared 100 % amino acid sequence similarity over 228 amino acids. The P. trichocarpa sequence also showed 100 % amino acid similarity with a potential CopR homolog of R. metallidurans (GenBank accession YP145683). A second grouping occurred between the eukaryotic protozoan Trypanosoma congolense and the soil bacterium Pseudomonas stutzeri (GenBank accession ABP81018), which clustered with 97 % bootstrap confidence (posterior probability value of 92) and showed 84 % identity over 225 amino acids. The genes corresponding to the response regulators from P. trichocarpa and T. congolense revealed 56.1 and 9.2 % differential GC content with their host genomes, respectively. These values are considerably higher (a minimum of three standard deviations) than the average differential GC content (2.3 %, s.d. = 2.3) estimated for other response regulators (Table 1).

3D Analyses of Protein Conformation

The amino acid sequences of the CzcR protein from R. metallidurans and the CopR protein from P. syringae were separately used as search queries to obtain 3D coordinate sets of related protein structures from the SWISS-MODEL repository (Kopp and Schwede 2004) of annotated protein structure templates. Searches with CopR and CzcR response regulators within the SWISS-MODEL repository revealed the highest structural similarity to the crystal structures of the DrrD protein (PDB ID 3NNN), a functionally uncharacterized response regulator from T. maritima (Barbieri et al. 2010), followed by RegX3 (PDB ID 2OQR; King-Scott et al. 2007) and MtrA (PDB ID 2GWR; Friedland et al. 2007) from M. tuberculosis. The crystal structures of these top database hits were then used as modeling templates to predict the 3D structures of CopR and CzcR, generated using the SWISS-MODEL First Approach Mode automated server (Guex and Peitsch 1997) (Fig. 3). Our phylogenetic analysis had shown that the response regulators RegX3 and MtrA from M. tuberculosis formed a monophyletic group with CzcR proteins from alphaproteobacteria, including Bradyrhizobium japonicum and Nitrococcus winogradsky (see Fig. 2).

Fig. 3

Predicted CopR and CzcR 3D structures based on a variety of known crystal structures used as model templates, including the DrrD response regulator from T. maritima, as well as the RegX3 and MtrA response regulators from M. tuberculosis. Predicted alpha-helix and beta-sheet secondary structures are shown in red and yellow, respectively, whereas loop regions are represented in light gray and blue. a CopR + DrrD (RMS = 0.68 Å), b CzcR + DrrD (RMS = 0.68 Å), c CopR + RegX3 (RMS = 1.34 Å), d CzcR + RegX3 (RMS = 1.34 Å), e CopR + MtrA (RMS = 0.91 Å), f CzcR + MtrA (RMS = 0.61 Å)

Despite having high levels of sequence similarity with CopR and CzcR, several response regulators were not able to provide an adequate model template for predicted protein structures due to conformational dissimilarities, as determined within Deep View Swiss-PDB Viewer. These proteins included the response regulators PrrA (PDB ID 1YS6), involved in the pathogenicity of M. tuberculosis (Nowak et al. 2006; Fig. 2), and the antibiotic tolerance response regulator DrrB (PDB ID 1P2F; Gandlur et al. 2004).

Following the structural alignment of each target and template protein, RMS deviations were calculated to compare the predicted 3D structural similarity of both CopR and CzcR to each of the protein models. CopR and CzcR each demonstrated identical amounts of structural variability with the model templates DrrD (0.68 Å) and RegX3 (1.34 Å) (Fig. 3). However, CopR and CzcR each demonstrated differential amounts of structural variability with the model template MtrA (0.91 and 0.61 Å, respectively). Predicted secondary structures were conserved among all model templates and target protein structures (Fig. 3), with areas exhibiting alpha-helix and beta-sheet secondary structures having a more conserved tertiary structure than loop regions. In all cases, the linker region between the two domains of the response regulators demonstrated the lowest region of structural similarity.

Discussion

This study examined orthologous and paralogous relationships among TCRS and corresponding efflux proteins involved in heavy-metal resistance in bacteria. While previous reports have noted significant sequence similarities among these proteins (Ouzounis and Sander 1991), the reconstruction of phylogenetic trees provided an appropriate evolutionary framework necessary to better understand the origin and evolutionary history of heavy-metal resistance genes.

Phylogenetic analyses of heavy-metal efflux proteins revealed four distinct clades, which represented clusters of orthologous genes associated with CopA, CopB, CzcA, and CzcB proteins. These monophyletic groups were supported by high bootstrap values and relatively high posterior probability values for maximum likelihood and Bayesian reconstructions, respectively (Fig. 1). Considering the similar roles of CopA and CopB in copper tolerance, as well as their shared copper-binding motifs (Mellano and Cooksey 1988), these genes may have originated from the duplication of an ancestral cop gene, and thus could be considered paralogous. Similarly, the functional associations of CzcA and CzcB proteins, both of which work together as a translocase for the transportation of metal elements through the cell membrane (Goldberg et al. 1999), suggest a common evolutionary origin and similar paralogous relationships. However, the phylogenetic analyses did not provide strong support for the clustering of these basal groups, which suggests that these ancestral gene duplications represent ancient evolutionary events.

On multiple occasions, BLAST searches revealed that several taxa in which cop and czc efflux pumps were present had no apparent cop or czc TCRS proteins within the NCBI database. Previous studies of heavy-metal resistance systems have reported diverse regulatory mechanisms in both cop (Cooksey and Azad 1992) and czc operons (Diels et al. 1995; Legatzki et al. 2003; Nies 2004b). For example, the cop operon of the citrus pathogen Xanthomonas axonopodis is regulated by copL (Voloudakis et al. 2005), which is not part of a TCRS and has potential homologs in the genomes of Xanthomonas campestris and Xylella fastidiosa, both infectious agents of multiple plant species. In addition, the cnrAB and nccAB efflux proteins of the human pathogen A. xylosoxidans (Schmidt and Schlegel 1994) and R. metallidurans (Grass et al. 2000) are not regulated by TCRS proteins and demonstrate metal substrate specificities distinct from other czc efflux proteins.

Phylogenetic inferences of efflux and regulatory proteins suggest that while the vertical transfer of heavy-metal resistance genes has been the predominant mode of transmission, horizontal gene transfer has also played an important role in the spread of these genes. Phylogenetic trees revealed evidence for a minimum of nine potential cases of lateral gene transfer associated with efflux proteins and two apparent horizontal gene transfer events associated with regulatory proteins from heavy-metal resistance genes. Evaluation of GC content in these genes and their corresponding host genomes often revealed considerable differences, consistent with the horizontal gene transfer events detected with the phylogenetic analyses.
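The comparison of GC content between a gene and its host genome can be sketched as follows. This is a minimal illustration only: the function names `gc_content` and `gc_difference` and the short sequences are hypothetical, and the sketch is not the procedure or the data used in this study.

```python
def gc_content(seq):
    """Percent GC of a nucleotide sequence, counting only unambiguous
    A, C, G, and T bases (case-insensitive); other symbols are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc_difference(gene_seq, genome_seq):
    """Percent GC of the gene minus percent GC of the host genome."""
    return gc_content(gene_seq) - gc_content(genome_seq)


# Hypothetical toy sequences, far shorter than real genes or genomes
gene = "GGCCGGCCAT"
genome = "ATATGCATAT"
print(gc_content(gene), gc_content(genome), gc_difference(gene, genome))
```

A large difference between a gene and its host genome is consistent with a recent lateral acquisition, although, as with any compositional measure, it can also arise from processes unrelated to gene transfer.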

Horizontal gene transfer provides bacterial communities the potential to readily adapt to selective pressures through the lateral spread of genes (Ochman et al. 2000). Interestingly, CnrB and NccB proteins from betaproteobacteria, which have distinct substrate ranges and are not regulated by TCRS, clustered phylogenetically with proteins from alphaproteobacteria. This suggests a possible role of horizontal gene transfer in genomic innovation (Ochman et al. 2000), since the acquisition of new genes through lateral transfer may have promoted the metabolic assimilation of novel substrates. The greater number of horizontal gene transfer events detected among efflux proteins compared to regulatory proteins may reflect the fact that regulatory genes are often integrated into larger gene networks (Jain et al. 1999), which would make it more difficult for recipient bacteria to assimilate these genes into their genomes. However, the number of horizontal gene transfer events reported here should be interpreted with caution, as phylogenetic analyses may underestimate the role of horizontal gene transfer among closely related species (Ochman and Moran 2001), and differences in percent GC content may be explained by phenomena other than lateral gene transfer, including selection, mutational bias, and the direction of transcription relative to the origin of replication (Eisen 2000; Lafay et al. 1999).

Our analyses revealed four potential instances of gene transfer between prokaryotes and eukaryotes. Although controversial, lateral gene transfer between multicellular eukaryotes and bacteria has been suggested by other studies (Hotopp et al. 2007; Nikoh and Nakabachi 2009). While independent convergent evolution of metal tolerance in bacteria and eukarya may provide an alternative explanation, our phylogenetic analysis suggests horizontal transfer of bacterial heavy-metal homologs with the plants Populus trichocarpa and Oryza sativa, as well as the protozoan Trypanosoma congolense. These phylogenetic inferences of interdomain horizontal gene transfer were supported by the large difference in GC content between the genes and their host genomes (Table 1). Furthermore, the bacteria involved in the potential transfer of these genes are common soil microbes observed in metal-contaminated soils (Cha and Cooksey 1991; Cooksey and Azad 1992; Liesgang et al. 1993; Legatzki et al. 2003) as part of the microbial community intimately associated with these plants and the protozoan.

The lateral transfer of heavy-metal tolerance genes among distantly related community constituents that may either compete for resources or form commensal relationships provides the basis for community-level selection for metal tolerance. The idea that selective processes may act at the community level, while controversial within the field of evolutionary biology (Sober and Wilson 1998), has been suggested in a variety of laboratory experiments (Goodnight 1990; Cullen and Neale 1994; Swenson et al. 2000a, b; Muller et al. 2001; Sun and Friedmann 2005). Microbial communities often dwell within biofilm assemblages composed of metabolically interdependent taxa. With the increased potential for horizontal transfer of these plasmid-borne genes, these microbial assemblages provide an apt model for the potential evolution of community-level adaptations (Swenson et al. 2000a; Hoostal and Bouzat 2008; Hoostal et al. 2008).

The 3D structure analyses of response regulators conducted in this study provide insights into the relationships between phylogenetic divergence, structural conservation, and the origin of new functional phenotypes. CopR and CzcR regulatory proteins demonstrate functional divergence of homologous sequences with high levels of structural similarity. This evolutionary model may be particularly relevant for TCRS proteins, whose similar 3D structure can be associated with their overall function in the regulation of cellular transport or sequestration of metals, but which may require functional innovation to work efficiently on different molecular targets (e.g., on distinct metal elements). Predictions of 3D conformations for CopR and CzcR provide additional insights into the mechanism of action for these response regulators. For example, an extensive interface between the two domains of the response regulator DrrB suggests that interactions between regulatory and DNA-binding domains are significant during TCRS signal transduction (Gandlur et al. 2004). In contrast, a much smaller interface and a high amount of disorder within the linker region of DrrD suggest that the two domains have no fixed orientation and that interdomain regulation may not be as significant during phosphorelay (Buckler et al. 2002). The high degree of structural similarity found between heavy-metal response regulators and DrrD, but not DrrB, suggests that heavy-metal regulator proteins have similar interdomain flexibility, which may allow broad regulatory roles within complex regulatory networks (Perron et al. 2004; Caille et al. 2007).

The structural congruence of heavy metal response regulators with DrrD and RegX3, but not MtrA, may also provide insight into the regulatory mechanism of CopR and CzcR. DrrD and RegX3 exhibit conformations that facilitate expedient regulation of target genes upon phosphorylation (Buckler et al. 2002; King-Scott et al. 2007). In contrast, MtrA demonstrates a structure that less readily induces target genes (Friedland et al. 2007). Our results were consistent with the broader regulatory role of CopR compared to CzcR within regulatory networks associated with heavy-metal resistance (Caille et al. 2007).

Significant differences in structural conformation between CopR and CzcR and more distantly related response regulators, such as DrrB and PrrA, emphasize the potential role of natural selection in the functional diversification of these regulatory proteins. High levels of sequence similarity among functionally disparate response regulators (Gao et al. 2007) suggest that a small number of mutations may have significant effects on protein structure, with potential consequent effects on function, particularly if changes occur in the linker region of response regulators. Patterns of structural similarity between response regulators may, therefore, be explained by functional constraints associated with the mechanisms of heavy-metal regulation.

In this study, we used multiple lines of evidence, including phylogenetic reconstructions, levels of GC content, and 3D protein structure models, to infer evolutionary relationships among both structural and regulatory proteins associated with heavy-metal resistance. Phylogenetic inferences and GC content analyses showed independent evidence of multiple horizontal transfer events over the evolutionary history of these gene families, thus providing greater confidence in the occurrence of these events. Our study cautions against making inferences based solely on sequence or structural similarity, as processes such as selection, mutational bias, and direction of replication may lead to incorrect assumptions regarding past evolutionary relationships (Eisen 2000; Lafay et al. 1999). The structural similarity of copper, cadmium, zinc, and cobalt response regulators with other TCRS proteins also suggested a common evolutionary origin of functional phenotypes and similar mechanisms of action in signal transduction. This study demonstrates the value of integrating phylogenetic analyses with 3D structural analyses of proteins to better understand the evolutionary origin and functional diversification of gene families.