Introduction

Over the last two decades or so, prokaryotic expression systems, particularly E. coli, have been exploited for the production of a variety of therapeutic proteins on an industrial scale. The challenges that biotechnologists face in striking a fine balance between the quality and the yield of the desired protein, along with well-documented strategies for troubleshooting some of these hurdles, are discussed at length in this review.

Prokaryotic cells (E. coli) are normally the preferred host for the expression of foreign proteins because they offer (i) inexpensive carbon source requirements for growth, (ii) rapid biomass accumulation, (iii) amenability to high-cell density fermentation, and (iv) simple process scale-up. However, the lack of post-translational modification machinery and the production of inactive protein due to the formation of inclusion bodies pose a significant challenge in these expression systems.

A number of current protocols are available which describe various strategies for the conversion of inactive protein, expressed as insoluble inclusion bodies, into a soluble and active fraction. Overall, these strategies can be sub-divided under three major headings:

  • Group-I: The factors influencing the formation of insoluble fraction are modified through a tight control of the cellular milieu, thereby leading to soluble protein expression.

  • Group-II: The expressed protein is refolded from the inclusion body fraction.

  • Group-III: The desirable protein expression is obtained in a soluble fraction through fusion protein production.

Before we discuss these three approaches at length, we would like to outline various genetic as well as environmental factors associated with bacterial expression systems, which in one way or the other influence successful production of biologically active recombinant proteins (Tables 1, 2).

Table 1 The least used codons in E. coli, yeast, Drosophila, Dictyostelium and primates [1, 2]
Table 2 Biotechnological innovations for producing biologically active proteins through E. coli

Recombinant protein expression: the challenge to overcome the codon bias

Most amino acids are encoded by more than one codon, and each organism uses the available codons according to its own codon bias. The tRNA population of every cell closely reflects the codon bias of its mRNA [30, 31]. Because of these differences in codon usage, a requirement for one or more rare tRNAs during over-expression of a heterologous target gene in E. coli results in a lower translation rate [32–34]. If high-level E. coli expression vector systems are utilized, the presence of a small number of rare codons in the heterologous gene does not lead to a significant reduction in target protein synthesis. However, the same does not hold true when a gene containing clusters of rare E. coli codons is over-expressed, especially when the N-terminus of the coding sequence consists of successive rare codons [1].

Rare codons in E. coli

Assessment of the codon usage in all E. coli genes reveals a number of codons that are underrepresented (Table 1). The codon usage of abundantly expressed genes exhibits a trend in which the low-usage codons [viz. AGA, AGG, and CGA (Arg); AUA (Ile) and CUA (Leu)], which account for less than 8% of their corresponding codon partners, are avoided, and the codons GGA (Gly), CGG (Arg), and CCC (Pro) fall to less than 2% of their respective populations. Depending on the status of these codons, E. coli expresses target genes at a level similar to or higher than that of well-expressed endogenous genes during its normal growth [1, 2].
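As a rough illustration of how a coding sequence could be screened for the rare codons listed above, the following Python sketch counts them and reports runs of successive rare codons, which the text identifies as the problematic case. The function names, the `min_run` threshold, and the test sequence are hypothetical and are not taken from the cited studies.

```python
# Rare E. coli codons named in the text (RNA alphabet) and their amino acids.
RARE_CODONS = {
    "AGA": "Arg", "AGG": "Arg", "CGA": "Arg", "CGG": "Arg",
    "AUA": "Ile", "CUA": "Leu", "GGA": "Gly", "CCC": "Pro",
}


def split_codons(cds):
    """Split an in-frame coding sequence (DNA or RNA) into RNA codons."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_report(cds, min_run=2):
    """Locate rare codons and runs of at least `min_run` successive rare codons.

    `min_run` is an illustrative threshold, not a value from the literature.
    Positions are 0-based codon indices (index 0 is the start codon).
    """
    codons = split_codons(cds)
    positions = [i for i, c in enumerate(codons) if c in RARE_CODONS]
    runs, start = [], None
    for i, codon in enumerate(codons + ["END"]):  # sentinel closes a final run
        if codon in RARE_CODONS:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return {
        "n_codons": len(codons),
        "rare_positions": positions,
        "rare_fraction": len(positions) / len(codons) if codons else 0.0,
        "runs": runs,
    }


# Hypothetical sequence: ATG followed directly by two successive rare Arg codons.
report = rare_codon_report("ATGAGAAGGCTGGGAAAATAA")
```

A run that starts within the first few codons would correspond to the N-terminal cluster of rare codons described above; where exactly to draw that boundary is left to the user.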

Modification of the culture milieu (e.g., media composition, lower temperature) does not significantly shift the levels of most of the tRNA isoacceptors corresponding to the rare codons. These codon-bias-related translation problems become more severe if the over-expressed protein is rich in a particular amino acid. Supplying the limiting amino acid in the culture medium improves expression in some cases [1, 31, 32]. Some approaches to these issues are outlined below.

“On demand” supply of the rare codons

Expression of genes containing the rare codons can be improved with a relative increase of the associated tRNA within the E. coli host by manipulating the copy number of respective tRNA gene through a plasmid. Several studies have reported a significant increase in protein yield when E. coli hosts are utilized with additional argU (AGG/AGA), glyT (GGA), ileX (AUA), leuW (CUA), or proL (CCC) expression by increasing gene dosage of the respective tRNAs [26].
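The codon-to-tRNA gene pairs named above can be turned into a simple tally of which supplementary tRNA genes a given target sequence would draw on. This is only a sketch: the mapping is copied from the text, while the function name and the example sequence are hypothetical, and the tally says nothing about how much any particular gene dosage would actually improve expression.

```python
from collections import Counter

# Codon -> tRNA gene pairs exactly as listed in the text above.
TRNA_GENE_FOR_CODON = {
    "AGG": "argU", "AGA": "argU",
    "GGA": "glyT",
    "AUA": "ileX",
    "CUA": "leuW",
    "CCC": "proL",
}


def trna_demand(cds):
    """Count, per tRNA gene, the rare codons in an in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(
        TRNA_GENE_FOR_CODON[c] for c in codons if c in TRNA_GENE_FOR_CODON
    )


# Hypothetical target sequence containing Arg, Ile, Pro and Leu rare codons.
demand = trna_demand("ATGAGAAGGATACCCCTATAA")
```

Genes with the highest counts are the first candidates to look for on a supplementary tRNA plasmid or in a host strain.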

To optimize the expression of AT- or GC-rich genes with the corresponding codon usage bias, several combination plasmids (e.g., pRIG, pRARE, pACYC, pSC101) encoding various rare tRNA genes (argU, ileX, glyT and leuW) under the control of their native promoters are being utilized. The BL21-CodonPlus (DE3)-RIPL cells (Stratagene, USA) carry extra copies of tRNA genes (argU, ileY, leuW, and proL) to circumvent the codon restrictions that most frequently limit translation of heterologous GC-rich genes. It appears that this strain rescues the expression of heterologous proteins from organisms that have either AT- or GC-rich genomes, thereby reducing the chances of truncated protein formation and resulting in full-length active proteins [79]. Despite these advances, the optimal utilization of rare codons for the over-production of a catalytically active eukaryotic protein at an industrial level remains a challenge for the protein biochemist.

Protein folding and disulfide bond formation

Expression of recombinant proteins in E. coli is largely directed to three different locations, namely the cytoplasm, the periplasm, or the growth medium (through secretion). Directing the recombinant protein to each of these locations has its own advantages and disadvantages. Expression in the cytoplasm is normally preferred, since production yields are usually high in this case.

Disulfide bond formation in E. coli is a compartmentalized and actively catalyzed reaction, which is performed in the periplasm through the Dsb system [12–15]. In the cytoplasm, by contrast, thioredoxin and glutaredoxin facilitate the reduction of cysteines. Thioredoxin and glutaredoxin are in turn kept reduced by thioredoxin reductase (trxB) and by glutathione, respectively, and glutathione is reduced by glutathione reductase (gor). Formation of disulfide bonds in the E. coli cytoplasm has been achieved by disrupting the trxB and gor genes. Furthermore, co-expression of DsbC in strains lacking both trxB and gor, or the use of thioredoxin as a fusion partner, can lead to improved folding and disulfide bond formation [16–19].

Apart from the Dsb system, protein disulfide isomerase (PDI), small peptides containing the active site of PDI or chemically synthesized dithiol molecules mimicking PDI function in combination with a redox system, have been utilized as catalysts in assisting disulfide bond formation during in vitro protein refolding, thereby improving the refolding rate as well as yield [20–22].

Reducing agents, such as dithiothreitol (DTT) or β-mercaptoethanol, are utilized for disrupting non-native disulfide bonds during solubilization of the inclusion body proteins by chaotropic agents (e.g., guanidinium hydrochloride, urea). Formation of native disulfide bonds following inclusion body solubilization can be achieved by adding low-molecular weight thiols (e.g., glutathione) in their reduced as well as oxidized state to the refolding buffer [35–38].

The protein aggregation problems associated with non-specific disulfide-bond formation, especially during the early stages of refolding, can be solved either by S-sulfonation or by altering the free cysteines through the addition of an oxidized form of a thiol reagent (e.g., glutathione) or a cysteamine-cystamine redox buffer. These chemical modifications introduce charge to the protein residues, which reduce aggregate formation as well as induce better inter-molecular interactions leading to the correct refolding [23, 24].

Eukaryotic protein phosphorylation in E. coli

Escherichia coli lacks most of the eukaryotic post-translational machinery, including serine/threonine/tyrosine protein kinases, which is considered a key disadvantage for producing the eukaryotic phosphoproteins. To circumvent this, introduction and expression of a protein kinase and its substrate from two separate or single plasmid vectors in the same E. coli host cell, has been well documented. This approach may be used for the co-expression of other mammalian protein modifying enzymes, such as protein methylases and acetylases and their substrates. Co-expression of such enzymes and their substrates results in the production of recombinant proteins that closely resemble native eukaryotic proteins [25, 26].

Interestingly, a few studies have reported that E. coli can phosphorylate proteins through endogenous protein kinases, utilizing adenosine triphosphate as the phosphoryl donor. Endogenous phosphorylation of isocitrate dehydrogenase, p68 RNA helicase, and single-stranded DNA-binding proteins (SSBs) are a few examples which clearly show that recombinant proteins expressed in E. coli can be phosphorylated at multiple residues by endogenous protein kinases [27, 39, 40].

Protein glycosylation in E. coli

Glycosylation is a common but complex form of post-translational modification, which leads to formation of cellular glycans often attached to the proteins and lipids. Glycosyltransferases and glycosidases are responsible for glycosylation of many proteins. These enzymes significantly differ in carrying out dimerization, proteolytic digestion, secretion, and auto-glycosylation processes. Glycoproteins, which are commonly distributed in eukaryotic cells, appear to be rare in prokaryotic organisms, as cellular organelles required for glycosylation are missing [41].

Until recently, non-enzymatic glycosylation (glycation) was thought to affect only the eukaryotic proteins. Glycoprotein synthesis has been observed in prokaryotes with certain structural variations from their eukaryote counterparts and additional data on glycation in E. coli has now become available. Occurrence of non-enzymatic glycosylation associated with some post-translational modifications results in advanced glycation end (AGE) product formation during the normal growth cycle of E. coli. In addition, recombinant human interferon-γ (rhIFN-γ) expressed in E. coli has also been shown to be associated with early glycation products [28, 42].

Apart from natural glycosylation processing in E. coli, synthesis of selectively glycosylated proteins can also be achieved by co-translating the genetically encoded modified amino acids. This approach has been utilized for large-scale production of myoglobin containing the position specific β-N-acetylglucosamine (β-GlcNAc)-serine in E. coli [29].

Recombinant proteins: trans-membrane transport

Trans-membrane transport is normally mediated by N-terminal signal peptides, which target the expressed protein to a specific transporter complex in the membrane. A considerable reduction in the amount of protein impurities can be achieved by targeting protein production to the host’s periplasm. Other benefits of trans-membrane transport include a much higher probability of obtaining target protein with a correct N-terminus, decreased proteolysis owing to fewer contaminants, and simplified protein release by routine osmotic shock procedures [43, 44].

Periplasmic leader sequences frequently used for potential export of the recombinant proteins are derived from ompT, ompA, pelB, phoA, malE, lamB, and β-lactamase [15, 45–52]. A few proteins are exported across the inner membrane to the periplasm by utilizing the well-known Sec translocase apparatus [53, 54].

Recombinant protein folding modulators in the periplasm

One of the features that differentiates the periplasm from the cytoplasm is its oxidizing environment. In E. coli, stable disulfide bonds form only in the cell envelope, in a reaction catalyzed by thiol-disulfide oxidoreductases known as the Dsb proteins, as discussed earlier in this review. Periplasmic export and subsequent enhanced disulfide bond formation of target proteins can be achieved via fusion to DsbA or DsbC [15, 55, 56]. Furthermore, the E. coli periplasmic chaperone Skp, a holdase, supports un-folded proteins as they emerge from the Sec translocase apparatus, assisting the folding and membrane insertion of outer membrane proteins. Other periplasmic folding modulators include the PPIases (SurA, FkpA, PpiA, PpiD, etc.); among these, FkpA has the most desirable folding activity, combining PPIase and chaperone functions [57–62].

Another soluble periplasmic protein, DsbA, with an active site embedded in a thioredoxin-like fold, promotes disulfide transfer to substrate proteins by the formation of mixed disulfide species. Here, DsbC may also come to the rescue of incorrect disulfide bond formation by seizing folding intermediates and catalyzing isomerization in a process involving mixed disulfide intermediates [63–65].

Recombinant proteins: secretory mode of expression

Most E. coli strains lack well-organized pathways for protein translocation across the outer membrane. However, some proteins carrying periplasmic leader sequences for export may leak into the extra-cellular medium. The signal recognition particle (SRP)-dependent pathway has been utilized for exporting a subset of proteins [66, 67]. Bacterial SRP, a GTPase, can bind either to the signal sequence of certain secretory proteins (provided that it is highly hydrophobic) or to trans-membrane segments of inner membrane proteins as they emerge from the ribosome. The SRP-bound ribosome nascent chain complex (RNC) is then targeted to the membrane [68–70]. In other studies, the ompA signal sequence has been used to translocate recombinant peptides to the periplasm for probable secretion into the growth medium [71–73]. In addition, co-expression of two secretion factors (secE and secY) has been shown to further increase the secretion of certain proteins [74–76].

Strategies for the production of active proteins through a tight control of the E. coli cellular milieu—Group-I

Induction of recombinant protein expression in E. coli requires the host’s transcriptional and translational machinery to work in synergy. As the expressed protein sometimes accounts for up to ≈30% of the total cell protein, the metabolic load on the E. coli expression machinery is tremendous. Constant improvements in protein solubilization and refolding procedures have become feasible through innovative R&D on the structure, function, and regulation of recombinant protein aggregation in E. coli. Some proteins adversely affect the host through their catalytic properties, and over-production of a protein may itself be toxic to the host. Therefore, the host cell’s capability to express proteins in the soluble fraction also depends on the total metabolic load, and it can be regulated by a number of factors that lead to the formation of soluble protein [77, 78].

These factors include:

Protein expression at lower temperatures

Protein expression in E. coli grown at low temperature has proven successful in improving the solubility of a number of difficult proteins. Generally, expression at low temperature leads to increased stability and correct folding, because the hydrophobic interactions that drive inclusion body formation are temperature dependent. In addition, any expression-associated toxic phenotype observed at 37°C is suppressed at lower temperatures [79–81]. The increased expression and activity at lower growth temperatures has also been associated with increased expression of a number of chaperones in E. coli [82]. It has been observed that heat shock proteases induced during over-expression are poorly active at lower temperatures. Therefore, growth in the temperature range of 15–23°C leads to a significant reduction in degradation of the expressed protein [83, 84].

Lower growth temperatures strongly reduce the efficiency of the traditional promoters used in routine vectors for recombinant protein expression in E. coli [85]. To overcome this shortcoming, a cspA promoter-based E. coli expression vector has been generated for stronger expression of recombinant proteins at lower temperatures. Proteins with membrane-spanning domains, or otherwise unstable gene products, have been successfully produced with this versatile expression tool [85–87].

Protein expression data from our lab indicate that inducing expression of human phosphodiesterases (PDE-3A, PDE-5A) and p38-α MAP kinase at 22 and 18°C, respectively, increased the production of functionally active enzymes compared to growth temperatures of 30–37°C (unpublished data). Growth at low temperatures also has disadvantages, which might include reduced transcription and translation, ultimately leading to a lower rate of recombinant protein production.

Genetically modified E. coli strains for achieving improved protein solubility

The BL21 E. coli strain (Novagen, USA) is an ideal organism for routine protein expression. BL21 and its derivatives are deficient in the lon and OmpT proteases, a genetic modification that is responsible for increased protein stability [88]. BLR, the recA derivative of BL21, has been shown to stabilize target plasmids that contain repetitive sequences or whose products may lead to the loss of the DE3 prophage [89–91]. The lacZY deletion mutant of BL21 (Tuner™) allows uniform and adjustable protein expression in all the cells. Furthermore, a strain carrying a lac permease (lacY) mutation has been created, which offers uniform entry of IPTG into all the cells in the population [92]; as a result, homogeneous and concentration-dependent levels of IPTG induction can be tightly regulated.

The relatively new Origami™ 2 (Novagen) host strains harbor both the trxB and gor mutations, which greatly enhance disulfide bond formation in the cytoplasm. Rosetta-gami strains, in addition to the above-mentioned features, also carry a plasmid that over-expresses a set of rare tRNAs to overcome codon bias-associated problems, as discussed earlier in this review. Origami-B host strains are derived from a lacZY mutant of BL21 to enable precise control of expression levels by fine-tuning the concentration of IPTG, as in the case of the lacY mutation. In addition to the trxB and gor mutations, these strains include the lon and OmpT deficiencies of BL21, thereby enhancing the solubility of the recombinant proteins [10, 93].

Modification in media composition to obtain soluble protein

The level of intracellular accumulation of a recombinant protein in E. coli depends on its specific activity and growth factor requirements, as well as on the final cell density [11, 94–96]. Production of recombinant protein in batch culture requires that all nutrients for growth be supplied from the very beginning, since there is limited control over the growth parameters. This process often leads to changes in pH and dissolved oxygen concentration, substrate depletion, and accumulation of inhibitory intermediates from various metabolic pathways. These changes are detrimental to the production of soluble, correctly folded, active protein. Depending on the nature of the expressed protein, proper and efficient folding might also require specific cofactors in the growth medium, for example, metal ions or iron-sulfur clusters, and cofactors such as flavin mononucleotide (FMN). Addition of these factors to the batch culture considerably increases the yield as well as the folding rate of the soluble proteins [97, 98].

Developing the best expression media formulation involves experimentation with various permutations and combinations of the factors outlined above. Reliable statistical techniques for experimental design are now also in use to identify the best possible media compositions. A precise and comprehensive research scheme is needed for the successful prediction of an optimal medium, which can then lead to the production of an active recombinant protein [99, 100].

The composition of the growth medium affects the relative levels of soluble protein production. We have carried out a comparative study to standardize the expression of soluble and active protein using LB, SOC, Terrific broth, Super broth and M9 media. The media were modified to contain glycerol (1–2%) and glucose (0.8–1.0%), respectively, to achieve better biological activities of human PDE-3A and PDE-5A (unpublished data) and p38-α MAP kinase [101] expressed in the BL21 E. coli strain. These modifications reduced the expression time, increased the soluble fraction yield and significantly enhanced the biological activity of these enzymes (Table 3).

Table 3 Modifications of various bacterial growth media compositions for enhancing solubility of the expressed proteins
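To make the idea of screening permutations and combinations of media factors concrete, the sketch below enumerates a full factorial design in Python. A full factorial layout is only one of several possible statistical designs and is an assumption here; the source does not state which design was used. The media names, the glycerol range and the induction temperatures are taken from this review, whereas treating them as three crossed factors is illustrative.

```python
from itertools import product

# Factor levels drawn from values mentioned in this review; crossing them
# as three independent factors is an illustrative assumption.
FACTORS = {
    "medium": ["LB", "SOC", "Terrific broth", "Super broth", "M9"],
    "glycerol_percent": [1.0, 2.0],
    "induction_temp_c": [18, 22, 37],
}


def full_factorial(factors):
    """Return one dict per experimental run, covering every level combination."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


runs = full_factorial(FACTORS)

# Print the first few runs of the screening plan.
for run_id, run in enumerate(runs[:3], start=1):
    print(run_id, run)
```

The number of runs grows multiplicatively with each added factor, which is the usual motivation for switching to fractional or other reduced designs once more than a handful of factors are involved.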

Co-expression of molecular chaperones

Molecular chaperones are proteins adapted to assist de novo protein folding and to help the expressed polypeptide attain its proper conformation and/or cellular localization, without becoming a part of the final structure. A molecular chaperone co-expression strategy has been followed in many instances to prevent inclusion body formation, thereby improving the solubility of the recombinant protein. Chaperones such as trigger factor help in recombinant protein refolding, where the polypeptides continue to fold into the native state even after their release from the protein-chaperone complex [103, 104]. Many other chaperones have also been shown to be involved in preventing protein aggregation [105–109]. Molecular chaperones have been divided into three functional sub-classes based on their mechanism of action:

  1. Folding chaperones (e.g., DnaK and GroEL) mediate the net refolding/un-folding of their substrates through an ATP-dependent conformational change, thereby preventing the inclusion body formation via reduction in aggregation and promoting the proteolysis of misfolded proteins [103, 104, 108, 109].

  2. Holding chaperones (e.g., IbpA, IbpB) can work in association with the folding chaperones to hold partially folded proteins on their surface and under certain circumstances, protect heat-denatured proteins from irreversible aggregation [110, 111].

  3. Disaggregating chaperones (ClpB, HtpG) promote the solubilization of stress-induced aggregated proteins [112–114].

Co-expression of various chaperone-encoding genes and recombinant target proteins has proven to be effective in getting the protein expressed in a soluble fraction [103]. As an example, GroEL–GroES and DnaK–DnaJ–GrpE chaperone system co-expression along with the trigger factor has been shown to further stimulate solubility of the expressed proteins [115, 116].

The major bottlenecks frequently encountered during the expression of eukaryotic proteins in bacterial systems include: (i) inherent properties of the recombinant proteins, which may interfere with the host’s metabolic pathways and thereby cause toxicity; (ii) expression at temperatures around 37°C, which leads to over-production of protein that invariably ends up in the insoluble form; and (iii) lack of an optimal nutrient medium during induction, which generally leads to the majority of the recombinant protein ending up in the insoluble fraction. In summary, the cell mass, growth factor requirements, and specific activity of the bacterially expressed protein determine its physical outcome, that is, whether the protein is expressed in a soluble or an insoluble form.

Strategies for the production of active proteins by inclusion body refolding—Group-II

Inclusion bodies are refractile, intracellular protein aggregates usually observed when the target gene is over-expressed in the cytoplasm of E. coli. This aggregation of proteins leads to a highly unfavorable protein-folding environment. Formation of inclusion bodies in recombinant expression systems occurs as a result of a disturbed equilibrium between in vivo protein aggregation and solubilization. Inclusion bodies accumulate in the cytoplasm or the periplasm depending on whether or not a recombinant protein has been engineered for periplasmic localization or secretion [113, 116, 117].

Re-solubilization and refolding of E. coli inclusion body proteins

Expression of a recombinant protein as inclusion bodies can be a blessing in disguise, since the protein can be easily purified from the E. coli homogenate. The high expression levels of inclusion bodies, together with their difference in compactness from other cellular impurities, lead to easier physical separation, e.g., through centrifugation. Apart from this, resistance to cellular proteases results in lower degradation and minimal impurities; fewer purification steps are therefore required, which drastically reduces the time and effort needed to obtain the final product. Inclusion bodies thus facilitate straightforward purification of the protein of interest, although the loss of bioactivity of the expressed protein can be a major bottleneck. Despite this drawback, recombinant proteins expressed as inclusion bodies in E. coli have been most widely used for the commercial production of therapeutic proteins, as the loss in protein recovery is greatly compensated by a very high level of expression [118–120].

So far, a major bioprocess challenge has been to convert these inactive and insoluble inclusion bodies efficiently into soluble and correctly folded product. The major drawbacks of refolding inclusion body proteins into bioactive components are (a) reduced recovery, (b) the requirement for rigorous optimization of refolding conditions for each target protein, and (c) the possibility that the re-solubilization procedure could affect the activity of the refolded protein. Therefore, purification of a highly expressed soluble protein is at times a less expensive and less time-consuming option than refolding and purification from inclusion bodies. Producing recombinant proteins in a soluble form thus remains a preferable alternative to in vitro refolding procedures.

Isolation and solubilization of inclusion bodies

A high degree of purification of the recombinant protein is achieved simply by isolating the inclusion bodies. To facilitate cell disruption, the cells are treated with lysozyme along with EDTA before homogenization. Inclusion bodies are recovered by low-speed centrifugation of bacterial cells that have been mechanically disrupted by ultrasonication for small-scale, French press for medium-scale, or high-pressure homogenization for large-scale cultures. Components of the bacterial cell envelope, including outer membrane proteins, which co-precipitate with the insoluble cellular fraction during culture processing, form the major fraction of the crude inclusion body impurities [102, 121, 122]. These contaminants can be easily removed by adding detergents such as Triton X-100 and/or low concentrations of chaotropic compounds, either prior to the mechanical disruption of the bacterial cells or during the crude inclusion body washing steps [123–127].

After isolation and removal of the above-mentioned impurities, inclusion bodies are commonly solubilized with various concentrations of chaotropic agents such as guanidinium hydrochloride or urea, or with other agents such as arginine, which is an aggregation suppressor. Although expensive, guanidinium hydrochloride is generally favored due to its better chaotropic properties [128–133]. In comparison, inclusion body solubilization by urea is pH dependent, so determination of the optimum pH conditions for each protein becomes a prerequisite. In addition, urea solutions have been shown to produce cyanate, which can carbamylate the amino groups of the expressed proteins, thereby negatively affecting their activity [134–136]. Inclusion body proteins solubilized under mild denaturing conditions have been shown to possess a native secondary structure and to retain their biological activity to a certain extent [137–140]. It has been postulated that milder solubilization conditions lead to superior refolding yields compared to solubilization with high concentrations of guanidinium hydrochloride or urea [141].

There are also reports suggesting that extreme pH conditions, in the presence or absence of low concentrations of denaturants, frequently lead to the solubilization of inclusion bodies [119, 142]. However, extreme pH treatment often results in irreversible modifications such as deamidation and alkaline desulfuration of cysteine residues, which often alter the bioactivity of the resultant protein [143–145]; it is therefore not a preferred method among protein scientists.

In addition to the above-mentioned solubilizing agents, low molecular weight thiol reagents, viz. DTT or 2-mercaptoethanol, are often used during inclusion body solubilization to reduce non-native inter- and intra-molecular disulfide bonds [136, 146–148]. Appropriate redox conditions are required for proteins that have disulfide bonds in their native state. Mild alkaline pH offers optimum conditions for the disruption of existing disulfide bonds. Before the refolding procedure is started, residual reducing substances, which negatively affect the refolding process, are removed (e.g., by dialysis). As an alternative, immobilized reducing agents (e.g., VectraPrime-Fluorochem Limited, UK; Immobilized DTT-DSH-Biovectra, USA) are also in vogue, as they can be removed easily by a centrifugation step once solubilization is complete. To avoid non-specific disulfide bond formation, the pH of the solution containing the solubilized protein should be lowered before the reducing agents are removed [136].

Aggregation and precise refolding of solubilized and un-folded proteins

The methods used for inclusion body solubilization usually leave the expressed protein in a non-native conformation. To attain its native form, the target protein needs to be refolded at low denaturant concentrations. Protein refolding follows first-order kinetics and has been shown to involve intra-molecular interactions. Protein aggregation, on the other hand, involves non-native inter-molecular interactions between protein folding intermediates and is favored at high protein concentrations. A higher concentration of the un-folded protein therefore often leads to decreased refolding yields, irrespective of the refolding method applied [149–151]. Conversely, because the first-order refolding reaction does not depend on encounters between these aggregation-prone intermediates, folding is favored at low protein concentrations. It is therefore desirable to keep the initial un-folded protein concentration to a minimum to achieve higher yields of correctly refolded protein. Apart from this, renaturation at high protein concentrations can be achieved successfully if hydrophobic inter-molecular interactions are avoided during the first few steps of refolding.
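The competition between first-order refolding and concentration-dependent aggregation can be illustrated with a minimal kinetic sketch. It assumes, purely for illustration, that aggregation is second order in un-folded protein, so that dU/dt = -k_fold·U - k_agg·U²; under that assumption the final native fraction has the closed form ln(1 + r)/r with r = k_agg·U0/k_fold. The rate constants and concentrations below are hypothetical and carry arbitrary but mutually consistent units.

```python
import math


def refolding_yield(u0, k_fold, k_agg):
    """Final fraction of protein that reaches the native state.

    Assumed model (not taken from the cited studies):
        dU/dt = -k_fold * U - k_agg * U**2,   dN/dt = k_fold * U
    Integrating dN/dU from U0 down to 0 gives ln(1 + r) / r,
    where r = k_agg * u0 / k_fold.
    """
    if u0 <= 0 or k_fold <= 0 or k_agg < 0:
        raise ValueError("u0 and k_fold must be positive; k_agg must be >= 0")
    if k_agg == 0:
        return 1.0  # no aggregation pathway: everything refolds
    r = k_agg * u0 / k_fold
    return math.log1p(r) / r


# Hypothetical rate constants; only the trend with concentration matters here.
yields = {
    u0: refolding_yield(u0, k_fold=0.01, k_agg=0.05)
    for u0 in (0.01, 0.1, 1.0, 10.0)
}
```

Under these assumptions the yield approaches 1 as the initial concentration approaches zero and falls steadily as it rises, in line with the recommendation to keep the initial un-folded protein concentration low.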

Techniques for protein refolding and activation

Dilution methods

The native structure of the protein can be recovered by diluting the protein-denaturant solution into a refolding buffer, which favors the refolding pathway over the aggregation pathway. However, this technique has considerable limitations at scale: the large dilution volumes call for huge industrial-grade refolding vessels, which generally makes the process expensive and cumbersome.
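The scale-up limitation follows from simple dilution arithmetic, sketched below in Python. The calculation assumes that the refolding buffer itself contains no denaturant and that volumes are additive; the function name and the example figures (1 L of solubilized protein in 6 M denaturant, diluted to 0.5 M) are hypothetical.

```python
def dilution_plan(sample_volume_l, denaturant_m, protein_mg_ml, target_denaturant_m):
    """Volumes and concentrations for a one-step dilution into refolding buffer.

    Assumes denaturant-free refolding buffer and additive volumes.
    """
    if not 0 < target_denaturant_m < denaturant_m:
        raise ValueError("target must be positive and below the starting concentration")
    fold = denaturant_m / target_denaturant_m
    final_volume_l = sample_volume_l * fold
    return {
        "dilution_fold": fold,
        "buffer_volume_l": final_volume_l - sample_volume_l,
        "final_volume_l": final_volume_l,
        "final_protein_mg_ml": protein_mg_ml / fold,
    }


# Hypothetical example: 1 L at 6 M denaturant and 10 mg/mL protein, diluted to 0.5 M.
plan = dilution_plan(
    sample_volume_l=1.0,
    denaturant_m=6.0,
    protein_mg_ml=10.0,
    target_denaturant_m=0.5,
)
```

Because the final volume scales with the ratio of starting to target denaturant concentration, each litre of solubilized inclusion bodies can require many litres of refolding buffer, and the refolded protein ends up correspondingly dilute.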

Protein refolding technology received a tremendous boost when the importance of adding the solubilized, denatured protein into the refolding buffer at very low concentration and at a slow rate was recognized [152–155]. Under these circumstances, an appropriate knowledge of the folding kinetics of the target protein is a prerequisite. It has been shown, for example in the case of bone morphogenetic protein-2 and lysozyme, that the addition of concentrated protein-denaturant solution at a slower rate avoids the accumulation of aggregation-prone folding intermediates [152, 154–156]. Furthermore, for refolding through pulse addition protocols, it has been suggested that maximum efficiency is achieved when 80% of the refolding yield is reached before the next pulse is added. Other factors that should be given due importance while following this strategy include the increase in denaturant concentration with each cycle and the amount of protein added per pulse. These protocols should be standardized in batch cultures to minimize aggregation [136, 157].
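The pulse addition logic can be sketched as a simple schedule. The sketch assumes first-order folding with a known rate constant, so the waiting time to reach a given fraction of the refolding yield is -ln(1 - fraction)/k. It also assumes that pulses stop once the accumulated denaturant would exceed a chosen limit. The function name, parameters and values are hypothetical.

```python
import math

def pulse_schedule(k_fold_per_min, vessel_volume_l, pulse_volume_l,
                   stock_denaturant_m, max_denaturant_m, completion=0.80):
    """Return a list of (pulse_number, start_time_min, denaturant_m).

    Assumptions (illustrative only): first-order folding with rate
    constant k_fold_per_min, a refolding buffer that starts without
    denaturant, and a hard upper limit on residual denaturant.
    """
    if not 0.0 < completion < 1.0:
        raise ValueError("completion must lie between 0 and 1")
    if pulse_volume_l <= 0 or max_denaturant_m >= stock_denaturant_m:
        raise ValueError("need a positive pulse volume and a limit below the stock")
    # Time for a first-order process to reach the requested fraction.
    wait_min = -math.log(1.0 - completion) / k_fold_per_min
    schedule, volume, denaturant_mol, t, n = [], vessel_volume_l, 0.0, 0.0, 0
    while True:
        new_volume = volume + pulse_volume_l
        new_mol = denaturant_mol + pulse_volume_l * stock_denaturant_m
        conc = new_mol / new_volume
        if conc > max_denaturant_m:
            break  # the next pulse would push denaturant past the limit
        n += 1
        volume, denaturant_mol = new_volume, new_mol
        schedule.append((n, t, conc))
        t += wait_min
    return schedule

# Hypothetical run: 10 L buffer, 0.1 L pulses of 6 M stock, limit 0.5 M.
for number, start, conc in pulse_schedule(0.1, 10.0, 0.1, 6.0, 0.5):
    print(f"pulse {number}: t = {start:6.1f} min, denaturant = {conc:.3f} M")
```

The schedule makes the trade-off in the text explicit: each pulse raises the residual denaturant concentration, so the pulse volume and the denaturant limit together bound the total protein that can be added.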

Dialysis methods

Dialysis and diafiltration techniques, which involve a gradual change from denaturing to native buffer conditions, convert solubilized and un-folded protein to its native structure [157, 158]. In most cases, these methods lead to more aggregation during refolding than the direct dilution method, because during dialysis the protein has to pass through a range of denaturant concentrations. In addition, non-specific adsorption of the refolding protein to the dialysis membrane may also negatively affect the refolding yields. However, when the denaturant removal process is matched to the requirements of the target protein, it generally leads to increased refolding yields even at high protein concentrations, as seen in the refolding of recombinant carbonic anhydrase and of a single-chain fragment variable (scFv) fusion protein [159–161].

Chromatographic methods

Size exclusion chromatography (SEC)

Buffer exchange for denaturant removal can also be carried out by performing SEC. Here, the denaturant-protein solution is injected onto a column pre-equilibrated with refolding buffer, and the refolded protein emerges in the eluate fraction at a considerably higher concentration than with the simple dilution technique, as seen during the refolding of platelet derived growth factor, avian erythroblastosis virus E2 oncogene homolog 1 protein (ETS-1), bovine ribonuclease A and E. coli integration host factor [162–165]. Depending upon the kinetics, refolding may be completed in the column or, particularly for proteins with slow folding kinetics, may occur in the eluate fraction.

Aggregate formation can be reduced by physical partitioning of the aggregation-prone folding intermediates in the absorbent gel [164]. Apart from this, re-solubilization of already existing aggregates can be achieved by running the denaturant at slower rates [136]. For proteins such as lysozyme, which exhibit better refolding yields during gradual denaturant removal, elution is preferably performed using a decreasing denaturant gradient [161, 166–168]. An extra advantage of the SEC method is that simultaneous purification of the target protein can be achieved during the refolding process [162]. Furthermore, some recent applications have shown the feasibility of using SEC for continuous protein refolding processes (for example, using bovine alpha-lactalbumin as a model protein). SEC has further been improved by coupling it to ultrafiltration and recycling units, which take care of the carry-over of re-solubilized aggregates formed during the refolding process [169].

Solid-support assisted protein re-folding

To avoid unwanted inter-molecular interactions between aggregation-prone folding intermediates, the solubilized and un-folded protein is attached to a solid support prior to changing conditions from denaturing to native buffer composition. Attachment requires the formation of a stable protein-matrix complex that can withstand the presence of chaotropic agents; conversely, the protein should detach easily after the change to native buffer conditions. Several binding matrices have been used in various combinations to bind the un-folded protein and, owing to their selective binding properties, to combine renaturation of the target protein with its purification [170–172].

Proteins with a naturally occurring charged patch in the un-folded chain can be refolded by first binding to an ion exchange resin followed by the employment of refolding parameters described by Li et al. [168, 173]. Protein refolding has also been achieved by incorporating N- or C-terminal peptide tags, such as an anion exchange matrix binding denaturant-resistant glutathione-S-transferase [172]; metal ion binding hexa-histidine [168, 171, 174]; poly-anionic support binding hexa-arginine [175]; or the cellulose matrix binding domain [176]. Once the protein is associated with the matrix, techniques such as dilution, dialysis, or chromatographic buffer exchange can be used for refolding. The resultant protein can be separated from the matrix by using buffers with high ionic strength, EDTA or imidazole, depending on the type of protein-matrix interaction involved [168, 170–172, 175–177].

Hydrophobic interaction mediated refolding

Hydrophobic Interaction Chromatography (HIC) involves attachment of the protein’s hydrophobic region to the HIC matrix, leading to the formation of micro domains and thereby allowing the native structure to form in its vicinity. At high salt concentrations, un-folded proteins attach to the hydrophobic matrix, which facilitates refolding. Attainment of the native conformation of the un-folded protein is regulated by salt concentration and hydrophobicity of the intermediate(s), through several steps of adsorption and desorption during its migration in the column. Protein impurities are removed simultaneously while renaturation proceeds during HIC [178–180].

Strategies for production of active proteins through fusion—Group-III

Target recombinant proteins cannot always be obtained in a soluble form by the strategies outlined above. For proteins with a tendency to go into insoluble fractions, it has been shown that metabolic engineering through fusion protein technology usually leads to their soluble expression.

Fusion protein production

In order to simplify the expression, solubilization and purification of recombinant proteins, a wide range of protein fusion partners have been developed. These fusion proteins or chimeric proteins usually include a partner or “tag” which may or may not be linked to the target protein through a recognition site for a site-specific protease. Most fusion partners are utilized for the purpose of specific affinity purification. These fusion tags offer additional advantages: they protect the partner protein from intracellular proteolysis [181, 182], enhance solubility [183–185], or can be used as specific expression reporters, for example green-fluorescent protein (GFP), independently or in combination with blue-fluorescent protein (BFP), for performing fluorescence resonance energy transfer (FRET) assays [186, 187]. High expression levels often transfer from an N-terminal fusion tag to a poorly expressing partner, most likely due to mRNA stabilization [188].

Common affinity tags are the poly-histidine tag (His-tag), which is compatible with immobilized metal affinity chromatography (IMAC), and the glutathione S-transferase (GST) tag for purification through glutathione-based resins, as discussed earlier. Several other affinity tags exist and have been extensively studied [189–191]. Fusion partners of particular interest with regard to optimization of recombinant expression include the E. coli maltose binding protein (MBP) and E. coli N-utilizing substance-A (NusA). MBP (40 kDa) and NusA (54.8 kDa) are specifically utilized to increase the solubility of proteins that tend to form inclusion bodies. MBP is considered to be much better than GST and thioredoxin with respect to its solubility-enhancing properties. By comparison, NusA provides high expression levels as well as good solubility characteristics, properties that are considered major advantages for target protein expression [192–195].

Recently, a highly soluble N-terminal fragment of translation initiation factor (IF2N, 17.4 kDa) has been utilized as a good solubility-enhancing partner [185]. Utilizing such smaller fusion tags often reduces the amount of energy required to obtain a certain level of expression by reducing steric hindrance. However, fusion to a solubility tag does not necessarily prevent inclusion-body formation, since this is largely a target protein-specific phenomenon.

Role of site-specific proteases

The purified recombinant protein is separated from its fusion partners, such as affinity tags, solubility enhancers or expression reporters, by the use of site-specific proteases. These enzymes cleave at recognition sequences present in various expression vectors. Selection of a specific protease and its optimal cleavage conditions mostly depends upon the amino acid sequence of the target protein. Therefore, fusion tags with protease cleavage sequences similar to sequences present in the recombinant protein should be avoided.

Two serine proteases, namely factor Xa and thrombin, are widely utilized for site-specific fusion protein cleavage. Thrombin’s recognition sequence is LVPR/G, and factor Xa recognizes the amino acid sequence IEGR/X for cleavage (X denotes any amino acid except arginine or proline). Enterokinase, which recognizes DDDDK/X (X can be any amino acid except proline); PreScission Protease (Amersham Biosciences), which cleaves at LEVLFQ/GP; and the Tobacco Etch Virus protease (TEV), which cleaves at the sequence ENLYFQ/G, are a few examples of more specific proteases used for the intracellular processing of fusion proteins [196–199].
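Checking a target sequence for internal copies of these recognition sites, as recommended above, can be sketched with regular expressions. The patterns below encode only the sites quoted in this section, with "/" read as the scissile bond. The dictionary, the function name and the example sequences are hypothetical, and real proteases tolerate variations that this literal matching does not capture.

```python
import re

# Recognition sites as quoted in the text. The lookahead (?=...) encodes
# the residue(s) required after the cleavage point without consuming them.
PROTEASE_SITES = {
    "thrombin": r"LVPR(?=G)",
    "factor Xa": r"IEGR(?=[^RP])",      # X: any residue except R or P
    "enterokinase": r"DDDDK(?=[^P])",   # X: any residue except P
    "PreScission": r"LEVLFQ(?=GP)",
    "TEV": r"ENLYFQ(?=G)",
}

def find_cleavage_sites(sequence):
    """Map protease name -> list of cut positions in a one-letter sequence.

    A position is the number of residues on the N-terminal side of the
    predicted cut. Sites at the very C-terminus, with no residue after
    the cleavage point, are not reported.
    """
    seq = sequence.upper()
    hits = {}
    for name, pattern in PROTEASE_SITES.items():
        positions = [m.end() for m in re.finditer(pattern, seq)]
        if positions:
            hits[name] = positions
    return hits

# Hypothetical tagged construct with a TEV site upstream of the target.
print(find_cleavage_sites("MKTENLYFQGSHM"))
```

A target protein for which this scan returns a hit for the intended protease would be a candidate for switching to a different cleavage site in the fusion construct.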

Perspectives and prospective

Recombinant proteins expressed as inclusion bodies in E. coli have been widely used for the production of therapeutic proteins in an industrial setting. New technologies have become available that allow the production of biologically active proteins through a diverse array of in vitro refolding procedures. Despite these advances, production of recombinant proteins in a soluble form remains the method of choice for a protein biochemist.

How the biotechnology strategies outlined in this review can be fully optimized during the scale-up of therapeutically important proteins remains to be seen. Industrial production of enzymatic proteins, antibodies, hormones, and other pharmaceutical biotech products in various expression systems still remains “an art.” Modern technologies and know-how now seem to have unraveled at least some of these mysteries with respect to the production of biologically active proteins. The vast accumulation of knowledge of upstream expression and downstream purification protocols has certainly helped to improve the quality as well as the yield of the target protein. Hopefully, in the near future, we will be able to tailor the overexpression of a recombinant protein in a prokaryotic expression system, and also scale up its production from the lab to the industrial level with minimal troubleshooting.