Introduction

Three polypeptide tracts, each composed of the repeating Gly–Xaa–Yaa (GXY) sequence, have the ability to form a right-handed triple helix (Ramachandran 1988). This structural element is present in collagen, a protein that is found abundantly across the animal kingdom. The stability of mammalian collagens is determined at the level of the triple helix by the requirements that a glycyl residue occupies every third position, and that proline and hydroxyproline residues are frequently present in positions X and Y, respectively (Berg and Prockop 1973).

Proteins containing GXY repeats have also been found in bacteria and phages (Smith et al. 1998; Rasmussen et al. 2003). For example, streptococcal collagen-like proteins, Scl1 and Scl2 (also known as SclA and SclB, respectively), are expressed on the cell surface of the bacterium group A Streptococcus (GAS) (Lukomski et al. 2000, 2001; Rasmussen et al. 2000; Ferretti et al. 2001; Rasmussen and Bjorck 2001; Whatmore 2001). GAS strains produce many Scl1 and Scl2 variants, which have the N-terminal noncollagenous variable (V) domains followed by collagen-like (CL) regions of disparate lengths that contain variable GXY repeats. Initial studies have shown that two recombinant Scl proteins (rScl) formed stable triple helices and shared similar structural organization (Xu et al. 2002). Importantly, the triple helix in Scl can be recognized by human collagen receptors, such as integrin α2β1, resulting in cell signaling, which indicates that Scl proteins are not only structural but also functional collagen mimics (Humtsoe et al. 2005).

Collagen and its denatured form, gelatin, are highly desired biomaterials that are used extensively in the medical field, as well as in cosmetic, pharmaceutical, and food industries (Asghar and Henrickson 1982; Lee et al. 2001). Industrial preparations of collagens and gelatins are typically obtained from bovine skin and other animal products and contain chemically modified heterogeneous peptides resulting from harsh extraction procedures. Moreover, they could be contaminated with infectious agents. Therefore, many eukaryotic expression systems have been employed to produce recombinant collagen- and gelatin-like polymers. These include mammalian cells (Geddis and Prockop 1993; Fertala et al. 1994), insect cells (Myllyharju et al. 1997; Tomita et al. 1997), transgenic mice (John et al. 1999; Toman et al. 1999), plants (Ruggiero et al. 2000), and yeast (Werten et al. 1999; Toman et al. 2000).

Collagen-like recombinant molecules were also produced in prokaryotic systems, such as Bacillus brevis (Kajino et al. 2000) and Escherichia coli (Goldberg et al. 1989; Cappello and Ferrari 1998) cultures. However, all of the above GXY polymers are based on mammalian-collagen sequences. In this study, we assess the use of the Scl-based sequences as a potential source for the production of recombinant “prokaryotic collagens” (Hook et al. 2005). The Scl sequences are analyzed for amino acid and GXY-repeat usage. Several different rScl proteins are made intra- and extracellularly using an E. coli expression system, and the triple-helical structure and thermal stability of these constructs are determined. The data presented in this study provide the necessary outline for the production of recombinant GXY polymers derived from Scl proteins.

Materials and methods

DNA amplification and cloning

Recombinant Scl (rScl) proteins were generated using the Strep-tag II expression and purification system (IBA-GmbH, Goettingen, Germany). Various scl alleles were amplified by polymerase chain reaction from GAS genomic DNA with primers containing synthetic extensions compatible with cloning into the BsaI-digested E. coli vector pASK-IBA2, using high-fidelity Deep Vent Taq Polymerase (New England BioLabs, Beverly, MA, USA). The thermal conditions were optimized for each primer pair in a temperature-gradient block of a thermocycler. The scl1-gene based constructs originated from the following alleles: pSL144-scl1.1 (GenBank accession number AF252861); pSL161-scl1.28 (AY459361); pSL176-scl1.41 (AY452037); and pSL186-scl1.12 (DQ309441). The scl2-gene constructs originated from the following alleles: pSL163-scl2.28 (AY069936); pSL177-scl2.4 (DQ309442); and pSL178-scl2.77 (DQ309443). E. coli DH5α was used in cloning experiments and all plasmids were subjected to DNA sequencing.

E. coli strains harboring plasmid constructs are available upon request (S. L.).

Expression of rScl proteins in E. coli

The pASK-IBA2 vector is designed for periplasmic protein expression, and induction is controlled by the tet promoter/operator system. The Scl-encoded sequence is fused at the N terminus to the OmpA signal peptide, which mediates the secretion of the rScl protein to the periplasmic space. The OmpA sequence is then selectively cleaved off by an endogenous signal peptidase, resulting in the release of the rScl polypeptide. In addition, each rScl polypeptide is fused at the C terminus to a short eight-amino-acid tag, the Strep-tag II (WSHPQFEK), which binds to Strep-Tactin–Sepharose resins with high affinity. This short inert tag does not interfere with triple helix formation by the rScl polypeptides. E. coli BL21 deficient in Lon protease was used for protein expression. The identity of each rScl construct was verified by amino-terminal sequencing.

Bacterial cultures were grown in a Luria–Bertani medium (LB broth, Miller). One-liter cultures contained in four-liter flasks were propagated with a constant agitation (200 rpm) at 30°C. When OD600 reached 0.5–0.6, protein expression was induced with anhydrotetracycline (0.2 μg/ml) for 3 h. After centrifugation, cell pellets were resuspended in a high-sucrose buffer P (100 mM Tris–HCl, 1 mM EDTA pH 8.0, 500 mM sucrose) and incubated on ice for 30 min. The periplasmic fraction containing the recombinant polypeptides was then separated from the cells by centrifugation and used for affinity purification on a Strep-Tactin–Sepharose column. The stringency of the wash buffer (buffer W) applied to the affinity column during purification varied in sodium chloride concentration for various constructs (100 mM Tris–HCl pH 8.0, 1 mM EDTA, 150–500 mM NaCl). Purified rScl preparations were dialyzed against 25 mM HEPES, pH 8.0 and stored at −20°C. Sample purity was analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) following staining with RAPIDstain (Geno Technology, St. Louis, MO, USA), and protein concentration was obtained by amino acid analysis using mass spectrometry.

The presence of rScl protein in a cell-free form in the culture supernatant was tested with one rScl construct, P163EC, after precipitation with 55%-saturated ammonium sulfate. The pellet was dissolved in 25 mM Tris–HCl, 1 mM EDTA (pH 8.0) buffer and applied to a Strep-Tactin–Sepharose column.

The same P163 variant was also used to assess the feasibility of intracellular production (P163IC) of the rScl proteins. For this experiment, the scl2.28 allele was amplified and cloned into the pASK-IBA3 vector designed for intracellular protein expression. P163IC extracted from the E. coli cells was purified by affinity chromatography with Strep-Tactin–Sepharose as described above.

Circular dichroism spectroscopy

The triple-helical structure of rScl proteins was analyzed by circular dichroism (CD) spectroscopy as described earlier (Xu et al. 2002). Protein samples were dialyzed against 1× phosphate-buffered saline, pH 7.4. The far-UV spectra were recorded with a Jasco J720 spectropolarimeter in a thermostatically controlled cuvette with a 0.5-cm path length, and data were integrated for 1 s at 0.2-nm intervals with a bandwidth of 1 nm. A wavelength scan was performed for each rScl protein at each end point temperature before (4, 25°C) and after (50°C) unfolding or subsequent refolding (25°C). Thermal transition profiles were recorded by monitoring the change in the mean molar residue ellipticity at a fixed wavelength of 220 nm, [θ]220, as a function of temperature. A heating rate of 10°C/h was applied until the sample reached 50°C. The melting temperatures (Tm) were given as the mean of Tm values from several measurements.
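The conversion of a raw ellipticity reading into mean residue ellipticity can be sketched as follows. This is a minimal Python illustration of the commonly used relation [θ] = θobs × MRW / (10 × l × c), not the analysis script used in this study; the function name and the example values (protein mass, residue count, concentration, and reading) are hypothetical, and only the 0.5-cm path length is taken from the text.

```python
def mean_residue_ellipticity(theta_mdeg, molar_mass_da, n_residues,
                             conc_mg_per_ml, path_cm=0.5):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity.

    Uses [theta] = theta_obs * MRW / (10 * l * c), giving deg cm^2 dmol^-1,
    where MRW is the mean residue weight (Da), l the path length (cm),
    and c the protein concentration (mg/ml).
    """
    # Mean residue weight: molar mass divided by the number of peptide bonds.
    mrw = molar_mass_da / (n_residues - 1)
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


# Hypothetical example: a 30-kDa, 301-residue protein at 0.1 mg/ml
# giving a 10-mdeg reading at 220 nm in the 0.5-cm cuvette.
value = mean_residue_ellipticity(10.0, 30000.0, 301, 0.1)
print(f"[theta]220 = {value:.0f} deg cm^2 dmol^-1")
```

Normalizing by residue count and concentration in this way is what allows spectra of rScl constructs of different lengths to be compared on one plot.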

Rotary shadowing and electron microscopy

The two-domain structural organization of rScl constructs was viewed under an electron microscope after rotary shadowing (Xu et al. 2002). Recombinant proteins were dialyzed against 0.1 M ammonium bicarbonate at a protein concentration of 100 μg/ml. Protein samples were mixed with glycerol to a final glycerol concentration of 70% (v/v), and 100 μl of each sample was nebulized with an airbrush onto freshly cleaved mica. The samples were dried in a vacuum and rotary shadowed with carbon/platinum using an electron beam gun tilted at an angle of 6° relative to the mica surface in a Balzers BAE 250 evaporator. The replicas were backed with carbon at 90° and placed onto copper grids. Photomicrographs were taken using a Philips 410 electron microscope.

Results

Analysis of the amino acid sequences of the Scl proteins

To assess sequence characteristics, total amino acid and GXY-triplet distributions were analyzed among different variants of the streptococcal collagen-like proteins Scl1 (n=28) and Scl2 (n=22) (Fig. 1). Seven residues (A, D, E, K, L, P, and Q) occupy position X in about 98% of all GXY repeats. Furthermore, the alanine residue in position X is more frequent in Scl1 (14.53%) than in Scl2 (3.52%), whereas L (0.89 vs 10.04%) and Q (0.99 vs 10.04%) are almost exclusive to Scl2 proteins. Position Y in the CL regions of both Scls was occupied by a set of nine amino acids (A, D, K, N, P, Q, R, T, and V) in 99% of all GXY repeats. While K (21.51 vs 6.78%), Q (22.24 vs 9.79%), and T (10.98 vs 5.59%) are more frequent in Scl1 variants, residues D (5.03 vs 30.63%) and N (0 vs 4.58%) are more specific for Scl2. In general, the amino acids C, H, W, and Y were never found in the collagen-like regions of either Scl protein. In addition, position X was never occupied by R in Scl1 nor by I or V in Scl2, whereas position Y was never occupied by M or N in Scl1 nor by E or F in Scl2. Thus, a limited set of amino acid residues is favored in the X and Y positions of the Scl proteins, suggesting that these residues are important for the formation and stability of the collagen-like triple-helical structures observed in these proteins.

Fig. 1

Total amino acid and GXY-triplet distributions in the CL regions of Scl1 and Scl2 proteins. Frequencies are expressed as percentages among 28 different Scl1 and 22 Scl2 variants. The frequency of distribution of GXY triplets is shown in the diagram. GXY triplets present only in Scl1 variants are shown in black fields, triplets present only in Scl2 variants are shown in light-grey fields, and triplets found in both Scl1 and Scl2 proteins are shown in white fields. Frequencies of amino acids observed in position X are shown to the right and those observed in position Y are shown below the diagram

Sixty-two distinct GXY triplets have been identified in the CL regions of all 50 Scl1 and Scl2 variants; predominant and rare GXY repeats were identified (diagram in Fig. 1). Only six GXY triplets (GEA, GPA, GKD, GEK, GPQ, and GET) occur with frequencies higher than 5% accounting for ∼50% of the CL regions, with GKD being the most common (17.15%). In contrast, as many as 40 different triplets occurred with frequencies below 1%, contributing to a total of 9.4% of the CL regions. While some triplets were equally distributed among both Scl proteins, others were more frequent in either Scl1 (GEK, 13.87%) or Scl2 (GKD, 26.93%; and GPA, 10.61%) variants. Although tripeptide GKD is present in the Scl1-CL regions (4.65%), it is far more predominant in the Scl2-CL regions (up to 43% in some variants). Furthermore, several GXY repeats are found only in either Scl1 (Fig. 1; black fields) or Scl2 (Fig. 1; light-grey fields) variants.
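The residue and triplet tallies described above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming each CL region is available as a one-letter amino acid string in frame with the first glycine; the function names and the toy tract are hypothetical (the tract is assembled from triplets named in the text and is not a real Scl sequence).

```python
from collections import Counter


def gxy_usage(cl_region):
    """Tally X residues, Y residues, and whole triplets in a Gly-Xaa-Yaa tract."""
    if len(cl_region) % 3 != 0:
        raise ValueError("CL region length must be a multiple of 3")
    triplets = [cl_region[i:i + 3] for i in range(0, len(cl_region), 3)]
    if any(t[0] != "G" for t in triplets):
        raise ValueError("every triplet must start with glycine (G)")
    x_counts = Counter(t[1] for t in triplets)
    y_counts = Counter(t[2] for t in triplets)
    return x_counts, y_counts, Counter(triplets)


def percent(counter):
    """Express counts as percentages of the total."""
    total = sum(counter.values())
    return {key: 100.0 * n / total for key, n in counter.items()}


# Hypothetical toy tract of eight triplets, for illustration only.
toy = "GKDGKDGEKGPAGPQGETGEAGKD"
x_counts, y_counts, triplet_counts = gxy_usage(toy)
print(percent(triplet_counts))
print(percent(x_counts))
print(percent(y_counts))
```

Pooling the counters across all variants of one protein (for example with `sum(counters, Counter())`) before calling `percent` would give per-protein frequencies of the kind reported in Fig. 1.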

Extracellular and intracellular expression of rScl proteins in E. coli

The extracellular portion of Scl proteins, lacking the N-terminal signal sequence and C-terminal cell-wall associated region, was expressed in rScl constructs. The periplasmic compartment was chosen because Scl proteins are naturally expressed on the cell-surface by group A Streptococcus, and we could not exclude the possibility that triple helix formation requires the secretion of the protein. Without any optimization of the expression conditions, the yield obtained for different rScl variants varied over a wide range from 1 mg/l (P161) to 20 mg/l (P163). In general, the purity of protein preparations after single-step affinity-chromatography purification was estimated to be >90%, as judged by SDS-PAGE analysis (Fig. 2a).

Fig. 2

Recombinant rScl proteins. Various Scl1- and Scl2-derived recombinant proteins were expressed in E. coli, purified by affinity chromatography with Strep-Tactin–Sepharose, and analyzed in 10% SDS-PAGE stained with RAPIDstain. M molecular size marker. a rScl constructs produced in E. coli periplasm. b Protein fraction secreted into the culture medium precipitated with 55% ammonium sulfate (P163EC) or P163IC construct expressed intracellularly. Crude preparations of P163EC and P163IC were purified by affinity chromatography with Strep-Tactin–Sepharose. c Structural organization of P163, P163EC, and P163IC as viewed by electron microscopy after rotary shadowing. All three preparations share similar two-domain lollipop-like organization. Bars 50 nm

The pASK-IBA2 vector employs the OmpA signal sequence to transport the fusion protein across the cytoplasmic membrane. It has been reported that OmpF, another outer-membrane protein, was secreted in large quantities into the culture medium (Jeong and Lee 2002). We tested whether the recombinant protein, P163EC (extracellular), was secreted into the growth medium. Following precipitation from the culture medium with 55% ammonium sulfate, a crude protein sample was subjected to the standard purification protocol on a Strep-Tactin–Sepharose column and analyzed by SDS-PAGE (Fig. 2b). Approximately 5 mg of P163EC was recovered from 1 l of the culture medium compared to ∼2 mg/l of P163 extracted from the periplasm.

Intracellular expression of a synthetic collagen-like polymer in E. coli has been reported, although structural characteristics of this polymer were not presented (Yin et al. 2003). The scl2.28 allele was cloned into the pASK-IBA3 vector designed for intracellular expression. The resulting P163IC (intracellular) construct was expressed and purified with Strep-Tactin–Sepharose at a high yield of ∼20 mg of pure protein per 1 l of culture (Fig. 2b). The molecular weight of purified P163EC and P163IC preparations was the same as that of P163 produced in the periplasm.

Structural organization of rScl proteins

Electron microscopy of rotary shadowed samples has been used for the investigation of structural organization of proteins with collagenous domains (Engel and Furthmayr 1987). It was previously demonstrated that rScl proteins may form lollipop-like structures with the lollipop stalks being the CL regions and the globular heads, the V regions (Xu et al. 2002). Examination of the rotary shadowed samples of all rScl proteins shown in Fig. 2a revealed a characteristic lollipop-like structural organization (data not shown). The length of the stalks was consistent with the calculated length of the CL regions in a collagen-like triple helical conformation. Likewise, all three P163 preparations obtained from the periplasm, precipitated from the culture medium, and expressed intracellularly shared similar structural organization as viewed by electron microscopy following rotary shadowing (Fig. 2c).

Triple helix formation by rScl proteins

The triple helix formation by each recombinant rScl construct was studied by circular dichroism (CD) spectroscopy (Fig. 3). The Scl variants selected for cloning and expression differed significantly in their CL-region length (34–79 GXY repeats) and primary sequence. Triple helix formation was assessed by the presence of the characteristic molar ellipticity maximum at 220 nm, [θ]220 (Brodsky-Doyle et al. 1976). The CD spectrum of each rScl sample recorded at 25°C resembled that of a collagen triple helix, with an increased ellipticity at ∼220 nm (Fig. 3a). The increase was not seen when the CD spectra were recorded at 50°C, indicating that the triple helix structure had unfolded. When protein samples were cooled to 4°C from 50°C, the ellipticity at 220 nm gradually increased, and the CD spectra again showed the characteristics of a collagen-like triple helix (for clarity, unfolding and refolding curves are not shown on the plot). The differences between [θ]220 values observed for various rScls reflect the contribution of the secondary structures in their noncollagenous regions.

Fig. 3

Circular dichroism spectra and thermal stability. a Wavelength scans of recombinant rScl proteins at 25°C. b Thermal unfolding profiles recorded for the same constructs at 220 nm as a function of increasing temperature with a slope of 10°C/h. c Correlation between thermal stability and number of GXY repeats in the CL region of rScl polypeptides. Black squares P163 and its extended derivatives P188 and P189. Grey squares P144 and its in-frame deleted derivative P206. The midpoint melting temperatures Tm ± SD are shown

Thermal stability of the rScl proteins

Thermal unfolding of the collagen triple helix in the rScl proteins was measured by monitoring the CD at 220 nm as a function of increasing temperature (Fig. 3b). Most rScl variants showed a triple helix to random coil transition within a narrow temperature range. The thermal unfolding was reversible because the ellipticity at 220 nm increased upon cooling the samples back to 4°C (refolding curves are not shown on the plot). The midpoint melting temperatures, Tm, for all rScl variants ranged between 34.1°C measured for P161 and 39.1°C for P178 (Fig. 3c). In general, higher Tm values were observed for rScl proteins with longer CL regions, although noncollagenous domains seemed to contribute to the stability of these proteins.

Structure and stability of rScls with long and short CL regions

We generated rScl constructs with extended CL regions. A DNA fragment encoding 25 GXY repeats from the interior of the P163-CL region was amplified with primers flanked by BsaI cleavage sites and cloned at the single BsaI site of the P163-CL region. A single 25× GXY insertion, resulting in P188 with a total of 104 triplets, and a double insertion, resulting in a 129-GXY-long CL region in P189, were identified (Fig. 4a). As expected, both variants formed lollipops with elongated stalks, and the thermal stability of P188 and P189 increased slightly, to Tm=37.9°C and Tm=38.1°C, respectively, compared with Tm=37.7°C for the original P163 construct (Fig. 3c). Conversely, an internal in-frame deletion of 30 GXYs in the P144-CL region decreased the Tm from 36.4°C recorded for P144 to 31.6°C for the resulting P206 (Fig. 3c), although structural organization remained the same (Fig. 4b).
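The expected stalk length of each construct can be estimated from its repeat count, as in the following Python sketch. The axial rise of about 0.286 nm per residue is the value commonly quoted for the collagen triple helix and is an assumption here, since the text does not state which value was used; the function and constant names are hypothetical, while the repeat counts are those given for P206, P163, P188, and P189.

```python
# Assumed axial rise per residue in a collagen-type triple helix (nm).
RISE_PER_RESIDUE_NM = 0.286


def cl_stalk_length_nm(n_triplets, rise_nm=RISE_PER_RESIDUE_NM):
    """Approximate length of a triple-helical CL region with n GXY repeats."""
    return n_triplets * 3 * rise_nm


# Repeat counts as stated in the text for each construct.
constructs = [("P206", 20), ("P163", 79), ("P188", 104), ("P189", 129)]
for name, n_repeats in constructs:
    length = cl_stalk_length_nm(n_repeats)
    print(f"{name}: {n_repeats} GXY repeats, about {length:.1f} nm")
```

Under this assumption each 25-repeat insertion adds roughly 21 nm to the stalk, which is a difference large enough to be seen against the 50-nm scale bars of the rotary-shadowed images.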

Fig. 4

Expression and structure of extended recombinant rScl variants. a P163-CL region (79 GXY repeats) extended by inserting 25 (P188, 104 GXY repeats total) and 50 (P189, 129 GXY repeats) additional GXY repeats. b In-frame deletion of 30 GXYs in P144-CL region (P206, 20 GXY repeats). Recombinant rScl constructs were expressed in the periplasm, purified with Strep-Tactin–Sepharose, and analyzed in 10% SDS-PAGE. M Molecular size markers. Structural organization viewed by electron microscopy after rotary shadowing. Bars 50 nm

Discussion

Eukaryotic and prokaryotic collagens evolved different molecular mechanisms to stabilize the triple helix. While the collagen triple helix is stabilized in eukaryotes by the presence of hydroxyprolines (Hyp) in the Y position, the mechanisms stabilizing “prokaryotic collagens”, which naturally lack Hyp residues, are poorly understood. Recent sequence analysis proposed that prokaryotic GXY tracts are mainly stabilized by threonine and glutamine residues often observed in position Y. In some bacterial collagen-like proteins, threonines occur with frequencies above 50% of all Y positions, as compared with only 2.6% found in human collagens (Rasmussen et al. 2003). Threonine in this position may substitute for hydroxyproline in the formation of hydrogen bonds (Kramer et al. 1999). A similar mechanism may be true for glutamine because all three amino acids, i.e., hydroxyproline, threonine, and glutamine, are hydrogen-bond donors. However, threonines occur in the Y position of only ∼8% of the Scl-GXY repeats (Fig. 1), and are twice as frequent among Scl1 compared to Scl2 variants. In aggregate, the most predominant amino acids found in position X of Scls are K (21.35%), E (24.1%), and P (24.59%), whereas position Y is mainly occupied by A (15.04%), Q (15.15%), and D (19.37%).

The experiments with model synthetic “host” peptides containing the so-called “guest” residues defined triplets stabilizing the triple helix despite a lack of hydroxyproline residue (Ramshaw et al. 1998; Persikov et al. 2000). We found that Scl proteins incorporated several of such triplets including GPA (8.77%), GPQ (5.18%), or GPR in Scl2 (4.58%). In addition, the stabilizing context of strings of residues with opposite charges, such as KGE/D (Persikov et al. 2005), is often found in Scl proteins. Based on this analysis, the thermal stability of the Scl-derived triple helices can be manipulated allowing for the construction of custom-designed GXY polymers.

Unhydroxylated recombinant gelatin-like and collagen-like polymers, derived from mammalian collagens, have been designed and produced in yeasts, transgenic plants, and bacteria (Goldberg et al. 1989; Cappello and Ferrari 1998; Werten et al. 1999; Kajino et al. 2000; Ruggiero et al. 2000). These long unhydroxylated GXY polymers derived from type I collagen had significantly lower melting temperatures compared with the native molecule (30 vs 41°C) (Perret et al. 2001). It is interesting to note that the rScl constructs with significantly shorter GXY tracts had midpoint melting temperatures of about 35–39°C, which is approximately 5°C higher than those of long unhydroxylated molecules derived from human- or bovine-collagen sequences, and that recombinant P206 containing only 20 GXY repeats formed a stable triple helix (Tm≈32°C). Furthermore, a significant increase in the CL-region length (79–129 GXYs) did not significantly increase the thermal stability of the triple helices (37.7–38.1°C). Thus, thermal stability of the GXY polymers derived from natural Scl variants will largely oscillate around body temperature and can be increased by the enrichment with triple-helix stabilizing triplets such as GPR, GPQ, and GPA.

In this study, we showed that rScl polypeptides form triple helices when expressed intracellularly and extracellularly, both in the periplasm and secreted into the medium. The overall yield, without optimization, varied greatly among variants, from 1 to 20 mg/l of culture, although rScl could be recovered in significant quantities from the culture medium, thereby potentially increasing the production yield. In addition, recloning of the P163 sequence into the pCold III vector resulted in increased expression above 30 mg/l (B. Brodsky, personal communication). Recent expression studies of collagen-like polymers in E. coli indicated that media, growth conditions, and construct design may greatly enhance the production yield (Sang 1996; Hannig and Makrides 1998; Yin et al. 2003). Thus, it is very likely that the production yield of Scl-based collagens can be significantly increased.