Introduction

Hsp60 and Hsp70 are two of the major and best-studied molecular chaperone proteins, and their homologs are present in virtually all organisms (Craig et al. 1993; Bukau and Horwich 1998; Gupta 1995; Gupta and Golding 1993). These proteins play essential roles in intracellular protein folding and translocation (Craig et al. 1993; Bukau and Horwich 1998; Ellis and Hartl 1999). Our earlier work identified a number of conserved inserts in highly conserved regions of these proteins that are uniquely found in species from particular groups or phyla of bacteria (Gupta and Golding 1993; Gupta 1998, 2000). The species distribution patterns of these conserved indels (i.e., inserts or deletions), as well as of numerous other conserved indels in widely distributed proteins that have been discovered in recent years, have provided important insights into the evolutionary relationships among different organisms (Rivera and Lake 1992; Gupta and Golding 1993; Baldauf and Palmer 1993; Rokas and Holland 2000; Gupta 1998). The discovery of these lineage-specific conserved indels in ubiquitous proteins raises the question of their cellular functions. Because the primary cellular functions of the proteins in which such indels are present (e.g., Hsp60, Hsp70, EF-Tu, RpoB, RpoC) are expected to be the same in all species, it is important to determine whether these indels represent functionally important genetic events that are essential for the groups of organisms in which they are found. This question is of considerable interest and importance, but it has not been carefully studied in the past.

In this work, we provide updated information regarding the species distribution patterns of several conserved indels in the Hsp60 and Hsp70 proteins and also describe some new conserved indels in these proteins. More importantly, this work examines the requirement for cellular growth of a number of conserved indels in these proteins in E. coli cells. The bacterial homologs of Hsp60 and Hsp70 are known as GroEL and DnaK, respectively and they are essential for the growth of E. coli cells. Of these, GroEL is essential at all temperatures, whereas mutation or deletion of DnaK permits limited growth below 34°C (Ang and Georgopoulos 1989; Fayet et al. 1989; Bukau and Walker 1989; Wild et al. 1992; Zeilstra-Ryalls et al. 1991). We describe here the results of experiments where the ability of different Hsp60 and Hsp70 constructs (i.e., + or – inserts or containing various modifications in the insert regions) to complement the T s phenotype of E. coli GroEL and DnaK mutants was examined (Ang and Georgopoulos 1989; Fayet et al. 1989). Results of our studies provide strong evidence that all of the conserved inserts in these proteins are essential for the growth of E. coli cells. The surface locations of these inserts in loop regions also suggest that they could be involved in interaction with other proteins and ligands.

Experimental procedures

Bacterial strains

The wild-type and mutant strains of E. coli that are temperature-sensitive (T s) for growth at 42°C due to mutations in the groEL and dnaK genes have been previously characterized by Dr. Costa Georgopoulos and coworkers (Georgopoulos et al. 1973; Ang and Georgopoulos 1989; Klein and Georgopoulos 2001). The following wild-type and mutant strains were kindly made available to us by Drs. Debbie Ang and C. Georgopoulos (University of Geneva, Switzerland). The wild-type strains used were CG3014 (a derivative of B178) and CG799 (a derivative of C600), which grow normally at both 30° and 42°C. The strain CG3015 (B178 groEL673 Tn10 tetR nearby) is a mutant of CG3014 that contains two different point mutations in the groEL gene (G173D and G337D), which make it unable to grow at 42°C (Georgopoulos et al. 1973; Klein and Georgopoulos 2001). The complementation studies with dnaK (Hsp70) were carried out using the mutant strain CG800 (C600 dnaK103 thr::Tn10), which is T s for growth at 42°C (Ang and Georgopoulos 1989).

Construction of plasmids and creation of site-specific mutations

Full-length wild type groEL and dnaK genes were PCR amplified from E. coli DNA based on the published sequences for these genes/proteins (GroEL accession number AAS75782, DnaK accession number YP_851220). The amplified sequences were cloned in the pDrive vector and were sequenced to ensure that they contained no mutations. Drs. Lori Burrows and Turlough Finan of McMaster University generously provided the genomic DNA for Pseudomonas aeruginosa and Sinorhizobium meliloti. The cDNAs for human and Chinese hamster Hsp60 were cloned in our earlier work (Jindal et al. 1989; Picketts et al. 1989). For expression, the groEL and dnaK sequences were PCR amplified from the plasmid vector using a second set of primers containing the appropriate restriction sites and subcloned in the plasmid PKK-233 (Pharmacia). The DNA primers that were used to amplify various genes or to create various site-specific changes are listed in Supplemental Table 1. Specific changes in the groEL and dnaK genes were introduced by PCR with overlapping primers in opposite orientations carrying the appropriate changes, using the ‘Quikchange’ site-directed mutagenesis kit (Stratagene). The presence of the desired changes in the mutant plasmids (and the absence of any other changes) was confirmed by DNA sequencing.

Complementation studies with the temperature-sensitive mutants

All bacterial strains were verified for T s phenotypes by examining their growth at 30° and 42°C after streaking on LB agar plates. The E. coli strains CG3014 and CG799 showed normal growth at both 30° and 42°C. In contrast, the mutant strains CG3015 (T s groEL) and CG800 (T s dnaK) showed no growth at 42°C, although they grew normally at 30°C. Competent cells of these E. coli strains were prepared by the CaCl2 method. The CG3015 (or CG800) mutant cells were transformed with various PKK-233 plasmids harboring different groEL (or dnaK) gene constructs to test for complementation, as measured by the ability of the cells to form colonies at 42°C. Our studies with the wild type groEL and dnaK constructs indicated that IPTG addition was not necessary to observe complementation of the T s phenotypes; hence, IPTG was not included in such experiments. In a typical experiment, the CG3015 (or CG800) cells were transformed with about 50–200 ng of the plasmid DNA for either the wild type groEL (dnaK) or various other test construct(s). After incubation for 1 h at 30°C in a shaking water bath, different dilutions of the cultures were plated on LB agar plates containing 50 µg/ml ampicillin. The cell dilutions were generally made to obtain between 100 and 10,000 colonies on the plates grown at 30°C with the wild type construct. All dilutions were plated in at least quadruplicate, and two or more plates from each dilution were incubated at 30° and at 42°C. The average number of colonies observed at 30° and 42°C upon transformation with different constructs was determined (based upon dilutions that yielded well-separated colonies), and the number of colonies observed at 42°C was normalized with respect to the number seen at 30°C. All of these experiments were repeated at least three times to ensure reproducibility.
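The normalization described above can be illustrated with a minimal Python sketch. The function name, the dilution arguments, and the colony counts used in the usage note are hypothetical and serve only to show the arithmetic; they are not data or code from this study.

```python
from statistics import mean


def plating_efficiency(colonies_30, colonies_42, dilution_30=1.0, dilution_42=1.0):
    """Return colonies formed at 42°C as a percentage of those formed at 30°C.

    colonies_30, colonies_42: colony counts from replicate plates.
    dilution_30, dilution_42: fraction of the culture plated (e.g. 1e-3);
    these are assumed parameters that put plates from different dilutions
    on a common scale before the ratio is taken.
    """
    if not colonies_30 or not colonies_42:
        raise ValueError("at least one plate count is needed at each temperature")
    # Average the replicate plates, then correct for the dilution plated.
    cfu_30 = mean(colonies_30) / dilution_30
    cfu_42 = mean(colonies_42) / dilution_42
    if cfu_30 == 0:
        raise ValueError("no colonies at 30°C; efficiency is undefined")
    return 100.0 * cfu_42 / cfu_30
```

For example, with made-up counts, `plating_efficiency([200, 220], [100, 110])` corresponds to a construct that restores about half of the colony-forming ability at 42°C.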

To examine the expression of the recombinant proteins in the mutant cells, CG3015 (or CG800) cells transformed with various plasmids were induced with 0.1 mmol IPTG for 2.5 h at 37°C and the cell extracts were analyzed by electrophoresis in 10% SDS-polyacrylamide gels. Cells transformed with the PKK-233 plasmid containing no cloned gene were used as a control in these experiments. The band corresponding to the recombinant GroEL was readily detected in stained gels due to its high level of expression in comparison to the control cells. However, to detect the band corresponding to DnaK, it was necessary to carry out western blot analysis using an antibody to the Hsp70/DnaK protein. The rabbit polyclonal antibody to Hsp70/DnaK used in these studies (Ahmad et al. 1990) shows good reactivity against the E. coli DnaK protein and was employed at a dilution of 1:2,000 in these experiments.

Blast searches

The presence or absence of various inserts in the Hsp60 and Hsp70 proteins in different groups was determined by Blast searches on the indicated sequence segments of these proteins against various groups of bacteria. Such searches were performed with both an insert containing (E. coli) as well as an insert lacking homolog (Bacillus subtilis) and the presence or absence of various inserts in species from different groups of bacteria was determined.
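As an illustration only, the sketch below shows one way the presence or absence of an insert could be tallied once each Blast hit has been aligned to the insert-containing reference segment. The paper states only that Blast searches were used; the function names, the 0-based coordinate convention, and the toy sequences in the usage note are assumptions and are not real Hsp60 or Hsp70 segments.

```python
def has_insert(ref_aligned, hit_aligned, insert_start, insert_len=1):
    """Check whether a hit carries residues at the reference insert position.

    ref_aligned, hit_aligned: equal-length rows of a pairwise alignment,
    with '-' marking gaps.
    insert_start: 0-based position of the first insert residue in the
    ungapped reference segment (an assumed convention).
    """
    if len(ref_aligned) != len(hit_aligned):
        raise ValueError("aligned rows must have the same length")
    ref_pos = 0   # position in the ungapped reference
    present = 0   # insert positions where the hit has a residue
    for ref_char, hit_char in zip(ref_aligned, hit_aligned):
        if ref_char == "-":
            continue  # column absent from the reference; does not advance it
        if insert_start <= ref_pos < insert_start + insert_len and hit_char != "-":
            present += 1
        ref_pos += 1
    return present == insert_len


def tally(aligned_pairs, insert_start, insert_len=1):
    """Return (hits with insert, hits without insert) for aligned pairs."""
    with_insert = sum(
        has_insert(ref, hit, insert_start, insert_len) for ref, hit in aligned_pairs
    )
    return with_insert, len(aligned_pairs) - with_insert
```

With the toy pairs `[("AKDNGT", "AKDNGT"), ("AKDNGT", "AKD-GT")]` and `insert_start=3`, the first hit counts as insert-containing and the second as insert-lacking, which mirrors the "with insert/without insert" counts reported for each bacterial group.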

Results

Species distribution and essential nature of a conserved insert in the Hsp60 protein

Figure 1 presents a partial sequence alignment of the Hsp60 protein showing a one aa insert (boxed) in a conserved region that is specific for most Gram-negative bacteria, but not found in other phyla of bacteria including various Gram-positive bacteria (Gupta 1998, 2000). When this signature was originally described, sequence information for Hsp60 was available from only a limited number of bacteria. To examine the specificity of this insert, Blastp searches were carried out on the Hsp60 sequence segments shown in Fig. 1 against various taxonomic groups and the presence or absence of this insert in species from different bacterial groups was determined. The results of these analyses are also included in Fig. 1. For example, for the γ- and β-proteobacteria, 611 and 139 entries, respectively, were found in the NCBI database and all of them contained this insert. Similarly, for other phyla of Gram-negative bacteria (viz. α-, δ- and ε-proteobacteria, Aquificae, Chlamydiae–Verrucomicrobiae–Planctomycetes, Bacteroidetes–Chlorobi–Fibrobacter, Spirochaetes and Cyanobacteria), about 900 sequences were available and all of them contained this insert. In some bacteria, particularly the Rhizobiaceae and Bradyrhizobiaceae, multiple homologs of Hsp60 are found (Rodriguez-Quinones et al. 2005; Rusanganwa and Gupta 1993) and all of them contained this insert. In contrast to these bacterial groups, this insert was absent from nearly all of the sequences from the Chloroflexi, Deinococcus–Thermus, Thermotogae, Fusobacteria, Actinobacteria and Firmicutes phyla. Of the >1,400 sequences from these groups, this insert was present in only ten entries (mostly from Firmicutes), which constituted <1% of the total sequences. This small number of exceptions is likely the result of lateral gene transfers or some other non-specific events (Gogarten et al. 2002; Doolittle 1999).

Fig. 1
Partial sequence alignment of the Hsp60 protein showing a one aa insert (boxed) in a conserved region that is specific for certain major phyla of bacteria and the eukaryotic homologs, but not found in other bacteria. The presence or absence of this insert in all available sequences from different bacterial groups is indicated along with their names. For example, for γ-proteobacteria a total of 611 hits corresponding to Hsp60 were observed and all of them contained this insert (611 with insert, 0 without insert). Similarly, for the Firmicutes phylum, a total of 778 hits were observed and of these only seven contained the insert (7/771). Only representative sequences from different major groups are shown here. The dashes in the alignment indicate that the same amino acid as that found on the top line (i.e., E. coli protein) is present in that position. The accession numbers of sequences are given in the second column. The numbers on the top indicate the position of this sequence in E. coli protein

In eukaryotic organisms, Hsp60 homologs are only found in organelles such as mitochondria and plastids (Gupta 1995, 2000), which have originated from bacterial ancestors belonging to the α-proteobacteria and Cyanobacteria groups, respectively (Margulis 1993; Gray 1999; Palmer and Delwiche 1998). Similar to various bacteria from these groups, all of the mitochondrial and plastids homologs of Hsp60 also contained this insert in the same position. Further, similar to the proteobacteria, the insert in various mitochondrial homologs consisted of an N (Asn) residue, whereas in all of the plastid homologs a G (Gly) was present in this position. These results provide further evidence for the origin of plastids from a cyanobacterial ancestor (Palmer and Delwiche 1998; Gupta et al. 2003).

The species distribution pattern of the Hsp60 insert indicates that it is a stable and highly reliable characteristic of the above groups of Gram-negative bacteria. The most likely explanation for this result is that the rare genetic change (RGC) responsible for this insert was introduced in a common ancestor of the above groups of bacteria, as indicated in Fig. 2. The retention of this insert by all species from the indicated groups of bacteria without any loss (e.g., all 1,385 proteobacterial homologs contain this insert) provides persuasive evidence that there is strong selection pressure for the retention of this insert in these bacteria. In the Hsp60 structure (Braig et al. 1994), the N153 residue, corresponding to the insert, is present on the surface of the protein in a loop connecting two alpha helices (Supplemental Figure 1). In this figure, only three of the seven subunits of the GroEL ring structure are shown. The insert is shown in purple with its side chain protruding. Earlier studies indicate that mutations in a number of neighboring conserved residues, viz. I150 → E, S151 → V and A152 → E, affect both ATPase activity and binding of the co-chaperone GroES by the mutant proteins (Fenton et al. 1994).

Fig. 2
A diagram showing the species distribution patterns of various conserved inserts in the GroEL and DnaK proteins (shown in Figs. 1, 4 and 6) and the evolutionary stages where genetic changes responsible for these inserts likely occurred (Gupta 2000; Griffiths and Gupta 2004). The different inserts shown in this diagram are generally present in all of the species from various bacterial groups that lie above the horizontal lines marking the positions of these inserts, but they are absent from most other groups that fall below these lines

The importance of the Hsp60 insert for cellular growth was examined by employing a T s mutant of E. coli (CG3015, groEL673) (Georgopoulos et al. 1973; Klein and Georgopoulos 2001). Transformation of this mutant with an expression plasmid containing the wild-type E. coli groEL gene led to restoration of normal growth at 42°C (Table 1). When the one aa insert (N153) was deleted from the E. coli GroEL, cells transformed with the (–) insert plasmid showed no growth at 42°C (Table 1), indicating that the protein lacking this insert was unable to meet the GroEL requirement for cellular growth. To understand the significance of the amino acid found in this position, a number of mutants were made in which N153 was replaced with other amino acids [viz. Glycine (G), Alanine (A), Aspartic acid (D), Glutamine (Q) and Valine (V)]. The results of studies with these mutants are also presented in Table 1. Of these mutants, very weak complementation (0.3–4.0% plating efficiency) was observed with the N153 → D, N153 → Q and N153 → A mutants, while the other two mutations, N153 → G and N153 → V, were totally inactive in complementing the T s defect. In the three mutants showing weak complementation, the asparagine residue in the insert position was replaced with either a conservative substitution (D or Q) or an amino acid of similar size, whereas in the other two cases the replacements were more drastic. These results indicate that not only the presence of the insert, but also the nature of the amino acid at this position, is crucial for the proper function of the protein. As noted earlier, in contrast to other bacteria, all cyanobacteria have a glycine (G) residue in the insert position. However, an N → G substitution is not compatible with the growth of E. coli, indicating that this change to Gly is specific for cyanobacteria and plastids.

Table 1 Complementation of the E. coli GroEL T s mutant with different constructs

We also examined the expression of the mutant proteins in the CG3015 (or CG800) cells to determine whether the lack of complementation in some cases could be due to poor expression. However, a good and comparable level of expression of the mutant proteins (GroEL or DnaK) was observed in all cases (Fig. 3), indicating that the lack of complementation observed for many of the mutants, or for the proteins from other bacteria, studied in this work was not due to a lack of expression.

Fig. 3
Expression of the recombinant GroEL/Hsp60 or DnaK proteins in the mutant cells. a SDS-PAGE analysis of expression of the recombinant GroEL proteins in E. coli CG3015 cells grown at 37°C. The cells transformed with various plasmids (see Table 1) were induced with 0.1 mmol IPTG for 2.5 h and then analyzed. Left lane MWM molecular weight markers, lane 1 cells transformed with a plasmid containing the wild-type E. coli GroEL, lane 2 E. coli GroEL with deletion of N153, lane 3 E. coli GroEL with N153D mutation, lane 4 E. coli GroEL with N153V mutation, lane 5 E. coli GroEL with N153Q mutation, lane 6 control E. coli cells transformed with empty plasmid, lane 7 plasmid with B. subtilis GroEL, lane 8 plasmid containing full-length Hsp60 cDNA from human cells. Cells transformed with other plasmids not shown here also showed comparable level of protein expression. b Western blot analysis showing the expression at 37°C of DnaK in E. coli CG800 cells transformed with various plasmids (Table 2). Lane 1 cells transformed with empty vector, lane 2 plasmid with wild-type DNA from E. coli, lane 3 E. coli DnaK with ∆75–97, lane 4 E. coli DnaK with ∆EE deletion from positions 80–81, lane 5 E. coli DnaK with the four aa deletion of DEVD from positions 208–211, lane 6 E. coli DnaK with ∆E209, lane 7 E. coli DnaK with two aa addition (SG) after position 385, lane 8 DnaK from P. aeruginosa, lane 9 DnaK from S. meliloti

We also examined whether Hsp60 from other organisms can complement the T s defect of the E. coli groEL mutant. To test this, full-length groEL genes from a number of bacteria, viz. P. aeruginosa (a γ-proteobacterium), S. meliloti (an α-proteobacterium) and B. subtilis (a low G + C Gram-positive bacterium), were cloned in the expression vector and their ability to complement the T s phenotype of E. coli CG3015 was examined. All of these proteins showed comparable levels of expression in E. coli cells (Fig. 3a). However, of these, only the GroEL homologs from P. aeruginosa and S. meliloti, which contained the Hsp60 insert and exhibited 79 and 68% sequence identity, respectively, to the E. coli protein, were able to complement the T s defect of the mutant (about 50% colony forming efficiency at 42°C; Table 1). In contrast, the GroEL homolog from B. subtilis, which lacked the Hsp60 insert, was unable to complement the T s defect. This result is consistent with the requirement of the Hsp60 insert for proper functioning of the protein in E. coli. However, the inability of the B. subtilis protein to function in E. coli could also be due to other changes in the molecule, as the B. subtilis Hsp60 exhibits only 61% sequence identity to the E. coli protein.

We also studied whether the human and Chinese hamster Hsp60 (i.e., mitochondrial homologs) can complement the T s defect of the mutant. These mitochondrial proteins are derived from an α-proteobacterial ancestor and they contain the conserved insert in the Hsp60 protein. However, these proteins exhibit only 50% sequence identity to the E. coli homolog (Jindal et al. 1989; Picketts et al. 1989), indicating that they have diverged significantly from their bacterial ancestor. The results of our studies revealed that these proteins, despite containing the conserved insert and showing good expression (Fig. 3a), were unable to complement the T s defect of the mutant (Table 1). Richardson et al. (2001) have previously shown that the mammalian Hsp60 functions specifically with its own co-chaperonin and it does not function with the bacterial GroES protein. The inability of the mammalian Hsp60 to complement the T s defect of E. coli mutant is thus very likely due to this reason.

Description and studies on the conserved inserts in the Hsp70 (DnaK) protein

The Hsp70 family comprises some of the most conserved proteins known, which are found ubiquitously in species from all three domains, with the exception of a few archaebacteria (Gupta and Golding 1993; Gupta 1998; Macario et al. 1999). A number of conserved indels have been described in this protein that are of considerable evolutionary significance. Figure 4 shows a partial sequence alignment of the Hsp70 protein from some representative bacteria highlighting a 21–23 aa insert (boxed) that is a distinctive characteristic of various Gram-negative bacteria that are bounded by two membranes (i.e., diderm prokaryotes) (Gupta and Golding 1993; Gupta 1998; Lake et al. 2007). When blast searches were carried out with the segment of the DnaK protein shown in Fig. 4, all of the blast hits (>800) from various phyla of diderm bacteria contained this insert. In some proteobacteria, as a result of gene duplications, up to three homologs of Hsp70 are found (i.e., DnaK, HscA and HscC) (Vickery and Cupp-Vickery 2007; Arifuzzaman et al. 2004) and this insert was present in all of them (results not shown). In contrast, this insert was not found in any of the blast hits from the Firmicutes and Thermotogae phyla and most of the Actinobacteria. However, in a small number of Actinobacteria (e.g., Frankia, Arthrobacter, Streptomyces avermitilis, Rhodococcus, Actinomadura) two different Hsp70 homologs are present and of these only one contained this insert. Because most other Actinobacteria have only a single copy of the Hsp70 gene, it is likely that these soil Actinobacteria have acquired the additional Hsp70 gene from some soil-resident Gram-negative bacteria by lateral gene transfer (Gogarten et al. 2002; Doolittle 1999). Evidence has been presented that this large indel, which is also lacking in various archaebacteria, is an insert in the Gram-negative bacteria (see Fig. 2), thereby indicating that the prokaryotic phyla lacking this indel are ancestral (Gupta and Golding 1993; Gupta 1998; Gupta and Singh 1994; Lake et al. 2007). The presence of this large insert in all of the eukaryotic Hsp70 homologs also provides strong evidence for their origin from an insert-containing Gram-negative bacterium (Gupta et al. 1994; Gupta 1998). Within this large insert, a two aa insert that is specific for various proteobacteria has also been identified (smaller box in Fig. 4) (Gupta 1998).

Fig. 4
Partial sequence alignment of the Hsp70/DnaK protein showing a large insert (boxed) that is specific for various Gram-negative bacteria (i.e., diderm bacteria bounded by two membranes) and the eukaryotic homologs, but not found in other bacteria. The numbers with the group’s name indicate the presence or absence of this insert in various blast hits from each group (see Fig. 1 and text for details). The distribution (presence/absence) of this insert in various subgroups of Gram-negative bacteria was as follows: γ proteobacteria, 213/0; β proteobacteria, 120/0; α proteobacteria, 147/0; δ–ε proteobacteria, 116/0; Chlamydiae–Verrucomicrobiae, 10/0; Bacteroidetes–Chlorobi, 54/0; Cyanobacteria, 121/0; Aquificae, Spirochetes, Chloroflexi, Deinococcus–Thermus, 45/0. All of the sequences shown here are for the DnaK homologs; however, this insert is also present in other Hsp70 homologs (viz. HscA or HscC) that are present in some bacteria (mainly proteobacteria). This insert is also not found in archaeal homologs (0/40), whose sequences are not shown (Gupta and Golding 1993; Gupta 1998; Lake et al. 2007). The smaller box inside the larger box indicates a two aa insert that is specific for proteobacteria and not found in other Gram-negative bacteria (Gupta 1998, 2000)

The structure of the Hsp70/DnaK protein has been solved from a number of sources including bovine (Flaherty et al. 1990), E. coli (Harrison et al. 1997; Pellecchia et al. 2000) as well as a low G + C Gram-positive bacterium, Geobacillus kaustophilus (Chang, Y.-W., Protein Data Bank I.D. No. 2v7y). The large insert is located in the N-terminal ATPase domain of the protein (aa 1–383) and the structures of this domain for the E. coli and G. kaustophilus proteins are shown in Fig. 5. This large insert is present on the surface of the E. coli protein as an additional alpha helix inserted within a loop. The importance of this indel was examined by complementation studies using the DnaK T s mutant CG800, which shows no growth at 42°C (Ang and Georgopoulos 1989). Transformation of this mutant with the WT copy of the dnaK gene led to normal growth at 42°C (Table 2). However, when the large insert that is characteristic of the Gram-negative bacteria was deleted from the DnaK protein (∆75–97), the resulting protein, despite showing good expression (Fig. 3b), showed no complementation of the dnaK T s mutant (Table 2). Previously, another deletion from this region (∆74–96) of the DnaK protein was also found to lead to loss of its biological activity (Sugimoto et al. 2007). We also made a two aa deletion (∆EE80–81) within this large insert and this plasmid was also completely inactive in complementing the T s phenotype of the mutant (Table 2).

Fig. 5
Structure of the ATPase domain (aa 1–383) of DnaK from E. coli (left panel) and Geobacillus kaustophilus (right panel) showing the positions of different conserved inserts (described in Figs. 4, 6) in the E. coli protein. The images were constructed by means of PyMol (version 0.93) using the PDB file 1DKG for the E. coli protein and 2V7Y for G. kaustophilus homolog. The large insert specific for the Gram-negative bacteria is shown in yellow and it is present as an extra alpha helix. The two aa insert within the large insert that is specific for the proteobacteria is shown in blue. The position of the four aa β–γ proteobacteria insert in the E. coli protein is shown in pink. However, the structure of this region, which forms part of a loop is not resolved in the E. coli protein structure (Harrison et al. 1997). The colored arrows mark the positions of these inserts in the G. kaustophilus protein, which lacks these inserts. Note: the colors can be seen only on the online version of the article

Table 2 Complementation of the E. coli DnaK T s mutant with different constructs

The Hsp70 protein contains another highly conserved insert that is specific for the β–γ proteobacteria (Fig. 6). This four aa insert is present in all of the DnaK sequences (>300) from β–γ proteobacteria, but it is not found in any other Gram-negative bacteria. This insert is also not present in the HscA or HscC homologs from β–γ proteobacteria (results not shown), indicating that it is a distinctive characteristic of the DnaK homologs from these bacteria (see Fig. 2). Within Gram-positive bacteria, this insert is absent in virtually all Firmicutes (only one exception in 209 entries) and all Actinobacteria except those belonging to the orders Micrococcineae and Bifidobacteria, which form a distinct clade within Actinobacteria (Gao and Gupta 2005; Embley and Stackebrandt 1993; Ventura et al. 2007). All of the DnaK homologs from these two orders of Actinobacteria contain either a four or a five aa insert (mostly five aa) in the same position (Fig. 6) where the four aa insert is found in β–γ proteobacteria. Because the sequences of the inserts in Actinobacteria and β–γ proteobacteria differ from each other and these two groups of bacteria branch very distantly in Hsp70 phylogenetic trees (Gupta 1998; unpublished results), the inserts in these two cases have originated independently. In the DnaK structure, these inserts are located in a surface loop (Fig. 5), but the structure of the insert region in the E. coli DnaK protein was not resolved (Harrison et al. 1997).

Fig. 6
Excerpt from DnaK sequence alignment showing a four aa conserved insert (boxed) that is specific for the β and γ proteobacteria, except as noted below. A 4–5 aa insert of independent origin is also present in this position in various Micrococcineae and Bifidobacteria. Other details are the same as in Figs. 1 and 4

To determine the functional significance of this insert, complementation studies were carried out with the E. coli dnaK gene from which this four aa (12 nt) insert was deleted (Table 2). The plasmid lacking this insert was unable to complement the T s growth defect of the dnaK mutant, indicating that the insert is essential for the proper function of the protein in E. coli. We also examined the effect of a one aa deletion (deletion of E209) within this insert, and the plasmid with this deletion showed very poor plating efficiency at 42°C (<0.5% in comparison to the + insert plasmid), providing further confirmation that this insert is essential for the proper function of the protein in E. coli. These mutant proteins were expressed at a level similar to that of the WT DnaK protein (Fig. 3b), indicating that their inability to support the growth of mutant E. coli was not due to a lack of expression. We also replaced some amino acids in this proteobacterial insert with those found in the Micrococcineae and Bifidobacteria. Interestingly, the mutant plasmids containing these changes (viz. E209K or a double mutant D208G and E209K) in the E. coli DnaK partially supported (about 50% plating efficiency) the growth of the DnaK mutant cells at 42°C. We also studied the ability of DnaK homologs from P. aeruginosa and S. meliloti to complement the T s defect of the E. coli dnaK mutant. However, both of these homologs, despite their good expression (Fig. 3b), were inactive in the complementation test. The failure of the P. aeruginosa DnaK homolog, which contained various conserved inserts found in the E. coli protein, was somewhat surprising and suggests that other changes in the molecule besides the inserts are responsible for its lack of complementation. In earlier work, DnaK homologs from B. megaterium (Sussman and Setlow 1987) and Borrelia burgdorferi (Tilly et al. 1993), and rat Hsc70 (Suppini et al. 2004), were also found to be ineffective in complementing the T s defect of the same E. coli T s mutant.

We have identified yet another conserved insert in the Hsp70 protein that is present in a number of clostridia as well as a few Rickettsiae and some other bacteria (Fig. 7). Unlike the other inserts described above, the groups of bacteria containing this insert are phylogenetically unrelated and they branch in very different positions in various phylogenetic trees (Gupta 1998; Ciccarelli et al. 2006; Viale et al. 1994). In the Hsp70 trees, these groups do not cluster together (Gupta 1998), indicating that the shared presence of this insert in these species is not due to lateral gene transfers. The most likely explanation for this two aa indel is that it has occurred independently in these groups of bacteria. This insert is present in the unstructured part of the Hsp70 molecule (between residues 385 and 386 in the E. coli DnaK protein) that links the ATPase or nucleotide-binding domain to the substrate-binding domain (Harrison et al. 1997; Pellecchia et al. 2000; Suppini et al. 2004). It is notable that this two aa insert, although present in a number of unrelated bacteria, was not found in any of the β–γ proteobacteria. Thus, it was of interest to determine whether E. coli DnaK can tolerate this insert. To examine this, a two aa insert (SG) corresponding to that found in some of these bacteria was introduced into the E. coli DnaK in the same position where it is found in other bacteria. The resulting mutant protein, despite good expression (Fig. 3b), showed no complementation of the dnaK mutant (Table 2); this observation indicates that this insert is incompatible with the proper function of DnaK in E. coli cells.

Fig. 7

Partial sequence alignment of DnaK showing a two aa insert (boxed) that is present in a number of bacteria belonging to different groups. This insert is not present in other bacteria, and it is not found in any of the β–γ proteobacteria, which include E. coli. The numbers at the top indicate the position of this sequence in the Clostridium bolteae and E. coli (in parentheses) proteins. In the DnaK structure, this insert is located in the linker region that joins the ATPase domain to the substrate-binding domain.

Discussion

In this work, we have examined the biological significance of several conserved inserts in the Hsp60 and Hsp70 proteins. These inserts (or indels) are located in highly conserved regions of the proteins and they are uniquely shared by various species from particular groups or phyla of bacteria. The species distribution patterns of these conserved indels strongly suggest that the genetic changes responsible for them occurred at important evolutionary branch points in the course of bacterial evolution (Fig. 2). After their introduction into these genes, these indels have been retained, apparently without any loss, by all descendant species diverging from these evolutionary branch points. The fact that similar indels are either completely absent or rarely observed in other groups of bacteria strongly indicates that these indels constitute rare genetic events that do not occur frequently or randomly in these genes. In accordance with their highly conserved nature, the results presented here show that deletion of, or any substantial change in, these conserved inserts leads to either complete or nearly complete (>95%) loss of the ability of the mutant Hsp60 or Hsp70 genes/proteins to complement the Ts growth defects of the E. coli mutants affected in these proteins. Similar results were obtained for all four conserved inserts in these proteins that were studied in this work. These results provide strong evidence that evolutionarily conserved inserts in protein sequences are essential for the proper functions of these proteins in E. coli and possibly in all other bacteria that share these characteristics. It should be acknowledged that in the present work the cellular function of the various Hsp60 or Hsp70 proteins altered in these insert(s) was assessed only by their ability to support the growth of the Ts mutants at the non-permissive temperature. Although the mutant proteins affected in these inserts were inactive in this regard, it is possible that they retain limited function in some of the specific steps in the functioning of these proteins (e.g., ATPase activity, binding to their substrates or co-chaperones, etc.) (Fenton et al. 1994; Sugimoto et al. 2007).

Of the inserts whose functions were studied, the indel in the GroEL protein consisted of only a one aa insert. Smaller indels in protein sequences are generally considered to be less important than larger indels (Artsimovitch et al. 2003; Rokas and Holland 2000; Gupta 1998). However, deletion of this one aa insert from the GroEL protein, or most amino acid substitutions in this position, led to either complete or nearly complete loss of the activity of this protein in supporting the growth of mutant E. coli cells. These results demonstrate not only that this one aa insert is essential for the proper functioning of the GroEL protein in E. coli, but also that the specific amino acid found in this position is of critical importance and is under strong selection pressure. With the exception of cyanobacteria, Hsp60 homologs from all phyla of bacteria as well as mitochondria (where this insert is found) contain an asparagine (N) residue in this position. However, in both cyanobacteria and the various plastids, which are derived from cyanobacteria (Palmer and Delwiche 1998; Gupta et al. 2003), a change from N → G has occurred in this position. The inability of the N153 → G mutant to function in E. coli provides evidence that this change is tolerated, and very likely required, only in a specific genetic background, such as that of cyanobacteria and plastids.

We have also examined how the function of the E. coli Hsp70 protein is affected by a conserved insert that is specific for other groups of bacteria but is not found in E. coli. We have described in this work a two aa insert in DnaK that is commonly shared by several Clostridia, a few Rickettsiae and some other bacteria, but which is not found in E. coli or various other β- or γ-proteobacteria. When this two aa insert was introduced into the E. coli DnaK in the same position as found in the other bacteria, the resulting protein was completely inactive in complementing the growth defect of the E. coli dnaK Ts mutant. This result provides strong evidence that these inserts are highly specific in terms of their functional requirement by the particular groups of bacteria where they are found and that they do not constitute random genetic events in functionally less constrained parts of the proteins. This two aa insert in the Hsp70 protein is located within the linker region that joins the nucleotide-binding domain of the protein with the substrate-binding domain (Pellecchia et al. 2000; Suppini et al. 2004). Based upon the high degree of flexibility that is shown by these two domains (Rist et al. 2006; Popp et al. 2005), one might expect that the length or the sequence of this linker region would not be important for the function of this protein. However, our studies indicate that this is not the case; certain changes in this region, such as this insert, that are specific (and presumably required) for particular groups of bacteria are not compatible with the function of the Hsp70 protein in other bacteria (E. coli). Recent studies by others have also provided evidence that the linker region of DnaK/Hsp70 plays an important role in the function of this protein (Popp et al. 2005; Rist et al. 2006). The work by Rist et al. (2006) indicates that the linker region undergoes a conformational change from a solvent-exposed to a solvent-protected state upon addition of ATP, and that this change is reversed upon binding of the substrate peptide to the DnaK–ATP complex.

All of the conserved inserts in the GroEL and DnaK proteins that we have studied are located on the surface of these proteins and they are generally found in loop regions connecting other secondary structural elements. This is also true of conserved inserts in a number of other proteins that have been studied (Artsimovitch et al. 2003; Opalka et al. 2000; Gupta RS, unpublished results). Although our studies provide convincing evidence that these conserved inserts (indels) are essential for the species in which they are found, there is no information available at present regarding their cellular functions. Most of the proteins in which such conserved indels have been identified (e.g., RpoA, RpoB, RpoC, SecA, PolA, EF-Tu, GyrA, GyrB, several ribosomal proteins and aminoacyl-tRNA synthetases, etc.) (Griffiths et al. 2005; Gao and Gupta 2005; Griffiths and Gupta 2004; Gupta 2004; Gupta et al. 2003) carry out important housekeeping functions. It is expected that these functions are still carried out by the insert-containing proteins. In view of this and the location of these inserts on the protein surfaces, it is possible that these conserved inserts impart some new functional capability (i.e., an ancillary function) to these proteins, which is essential for the groups of organisms where they are found. This ancillary function could include altered biochemical regulation of the protein due to its interaction with some other ligands or proteins in the cells (with the insert serving an important role in enabling this interaction), or some other novel function.

It is of much interest that recent comparative genomic studies have identified numerous proteins that are specific for the same taxonomic groups of bacteria as these and other conserved indels (Galperin and Koonin 2004; Daubin and Ochman 2004; Gupta and Griffiths 2006; Gupta and Mok 2007; Bork and Koonin 1998). The cellular functions of most of these lineage-specific proteins are not known (Bork and Koonin 1998; Danchin 1999). However, the presence of certain conserved indels as well as lineage-specific proteins in the same groups of bacteria suggests that these genetic characteristics have coevolved. Therefore, it is possible that some of these conserved indels in the essential proteins interact with one or more of the lineage-specific proteins, thereby leading to new functional or regulatory capabilities that are specific and essential characteristics of particular groups of bacteria. Thus, further studies aimed at understanding the cellular functions of these conserved indels are of much interest. In addition to providing novel information regarding the cellular functions of these important housekeeping proteins, such studies may also lead to the identification of novel biochemical and/or physiological properties that are unique and distinctive characteristics of different groups of bacteria.