Introduction

Approximately 50% of all new medicines are classified as biopharmaceuticals; the demand for innovative protein-based treatments continues to soar because proteins of the human immune system have shown better efficacy in the fight against disease, compared with chemical-based drugs (Bellott 2010). At the same time, the dawn of the post-genomic era has led to an avalanche of new amino acid sequences waiting to be translated and defined in terms of biological structure–function (Eisenberg et al. 2000; Zhou et al. 2010). Consequently, both industrial and academic research laboratories face increasing pressure to produce pure proteins in sufficiently large quantities at reasonable cost. The recombinant DNA technology platform has been pivotal in allowing proteins to be produced at significant amounts for clinical and structural studies (Shuler and Kargı 2002; Leong and Chen 2008). Escherichia coli is one of the most commercially viable host organisms for protein expression because it offers the advantage of speed, where high growth rates are possible on inexpensive and simple media (Demain and Vaishnav 2009). However, over-expression of heterologous proteins in E. coli often results in the formation of insoluble proteins, more commonly known as inclusion bodies (IBs), due to changes in kinetic competition between folding and aggregation caused by the higher rate of protein synthesis and insufficient supply of chaperones to support correct protein refolding (Vonrhein et al. 1999; Georgiou and Valax 1996). As a matter of fact, only 13% of human-derived proteins expressed in E. coli were found in the soluble form in a previous study (Braun et al. 2002). Therefore, in vitro refolding of these insoluble proteins is necessary to restore biological activity. Although in vitro refolding techniques are well-established, refolding yields are still often sub-optimal due to off-pathway aggregation that competes favourably with the desired unimolecular refolding reaction. 
Designing a high-yielding refolding-based bioprocess is therefore not a trivial task and can be time-consuming when the search for an optimised refolding environment hinges on trial and error. Rational approaches to protein refolding that can be robustly applied across a wide range of protein candidates are therefore critically needed. Since aggregate formation during refolding depends on the physicochemical environment of the additive-modified refolding buffer, analytical methods that monitor refolding yield in situ as the buffer composition is varied would facilitate rapid and rational optimisation of that composition. Until now, reliance on trial and error methods to develop refolding recipes has prolonged method development for in vitro refolding.

In this article, recent efforts to minimise inclusion body formation are first reviewed, followed by a summary of the design rationale of existing protein refolding technologies to increase refolding yield. In the final sections, we discuss how light scattering methodologies can be used for developing rational process designs and recipes for protein refolding. Although light scattering methods are frequently employed by structural biochemists to determine conditions conducive to protein crystallisation, their true potential in protein refolding is yet to be firmly established. Compared to other excellent reviews on protein refolding, which have mainly focused on refolding technologies and/or mechanistic studies of buffer additives on protein behaviour, this review presents a systematic investigation of the applicability of light scattering technology in rationally designing refolding buffer compositions and controlling large-scale refolding processes.

Inclusion body formation: can it be avoided?

E. coli is one of the most highly utilized microbial hosts for over-expression of foreign proteins because its well-characterised genome eases targeted genetic manipulation to give high protein expression yields. However, despite a three-decade history of use for protein production, a large fraction of proteins still cannot be produced in E. coli in soluble, and hence functional, form. There are many hypothesized reasons for insoluble expression, and concerted research efforts to understand protein translation and folding in E. coli have been deployed to address protein insolubility problems. Firstly, over-expression of foreign proteins is believed to impose a high metabolic burden on the bacteria (Hoffmann and Rinas 2004), leading to an insufficient supply of chaperones to guide correct folding of the polypeptides. Efforts to co-express endogenous E. coli chaperones such as the DnaK-DnaJ-GrpE chaperones and the GroEL-GroES system during protein expression (Nishihara et al. 1998; Kirschner et al. 2007; de Marco et al. 2007) showed varied levels of success in terms of solubility enhancement for different protein types. The use of chaperones, however, cannot eliminate intracellular proteolytic cleavage, soluble aggregate formation and cellular growth inhibition, all of which are highly ‘protein sequence’ dependent. Therefore, chaperone co-expression alone cannot guarantee protein solubility (Martínez-Alonso et al. 2010). Secondly, the high proportion of E. coli rare codons in foreign genes can also be problematic. Extensive efforts to optimise rare codon usage in E. coli have helped to improve soluble protein expression (Kleber-Janke and Becker 2000; Burgess-Brown et al. 2008; Chuan et al. 2008b; Zhang et al. 2009), but the success of the codon optimisation strategy is again protein-specific because the translational pauses necessary for efficient protein folding in vivo vary with rare codon frequency and composition (Purvis et al. 1987; Krasheninnikov et al. 1988; Marin 2008; Komar 2009). Thirdly, the reducing environment of the bacterial cytoplasm does not promote formation of the disulphide bonds necessary for correct folding of the protein. To overcome this problem, new bacterial strains such as FA113 and DR473 have been engineered to lack thioredoxin and glutathione reductase activities and can now accumulate proteins in the oxidised form (Mansell et al. 2008; Masip et al. 2004).

Manipulation of protein translation rate using low temperature fermentation conditions has been extensively reported, and success has again been highly sequence dependent (Vasina and Baneyx 1997; Fang et al. 1999; Phadtare and Severinov 2005). This outcome is expected considering that protein translation rate decreases in concert with the metabolic rate of the bacteria at reduced temperature, thereby allowing sufficient time for the protein to fold within the bacterial cytoplasm. Fusion protein technology is also commonly used to overcome protein solubility problems and has a higher predictability rate for soluble protein expression than temperature tweaking, especially with the use of fusion tags such as maltose binding protein and thioredoxin A (Bryksa et al. 2006; Pazgier and Lubkowski 2006; Xu et al. 2007; Huang et al. 2008; Huang et al. 2009). However, from a bioprocessing perspective, increased cellular metabolic burden of fusion protein co-expression coupled with inefficient downstream cleavage yields can significantly decrease product yield and increase overall process cost, thus nullifying the effects of increased upstream expression.

Despite the concerted and intensified efforts to enhance soluble protein expression in E. coli, soluble protein expression still cannot be “guaranteed”. Until now, none of the abovementioned strategies has offered a universal solution for increasing protein solubility (Fig. 1). This failure leads naturally to the question of whether tackling the protein insolubility problem in vitro could be the more productive way forward. IB formation can be advantageous in many respects. Apart from high product yields, resistance towards proteolytic cleavage, minimised product toxicity towards the host cell (Greenshields et al. 2008) and the relative ease of purifying the target protein from other contaminants are strong bioprocessing advantages (Fahnert et al. 2004). Considering that IB formation will continue to feature as an indispensable part of recombinant protein synthesis, the development of rational methods to design protein refolding processes is of critical importance, and this forms the basis of this mini-review.

Fig. 1 Common strategies to avoid expression of proteins as insoluble IBs and the disadvantages associated with them

Design rationale of refolding methods

To complete the journey from the solubilised (denatured-reduced) state to its native conformation, a protein molecule needs to move down the funnel-shaped, rugged energy landscape to the lowermost point (Dobson 2003; Radford 2000). At each point within the energy landscape, protein refolding intermediates remain prone to off-pathway reactions leading to second- or higher-order aggregation. Hence, the chief objective of any refolding process is to create an environment that directs the refolding reactions towards the first-order refolding pathway to form the native product. Unfortunately, since the same physicochemical forces (viz. electrostatic interactions, hydrogen bonding, hydrophobic interactions and disulphide bridging) that are needed to refold a protein are also involved in unproductive inter- and intra-molecular protein interactions, protein refolding remains challenging. Intuitively, aggregation can be controlled by manipulating protein concentration during the refolding step and choosing an optimum refolding buffer composition to promote native protein formation (Tsumoto et al. 2003). The design of current refolding strategies has been effective in tackling the concentration problem. Since excellent reviews of the design and operation of different refolding strategies have been published (Middelberg 2002; Jungbauer and Kaar 2007), we will only briefly summarise the rationale of conventional protein refolding strategies in optimising protein concentration for refolding.

Dilution

Refolding by dilution lowers refolding protein concentration to minimise the propensity of inter-molecular interactions of protein refolding intermediates. Despite the low protein concentration, refolding buffer composition still plays an important role in influencing protein refolding kinetics (Leong and Middelberg 2007).
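The concentration argument behind dilution refolding can be illustrated with a minimal kinetic sketch. It assumes, purely for illustration, that first-order refolding (rate constant `k_fold`) competes with strictly second-order aggregation (rate constant `k_agg`); the function name, the rate constants and the concentrations are hypothetical and are not taken from the studies cited here. Under that assumption the final native fraction has a closed form that falls as the initial concentration of unfolded protein rises.

```python
import math


def refolding_yield(u0, k_fold, k_agg):
    """Final native fraction for first-order folding vs. second-order aggregation.

    Assumed model (illustrative only):
        dU/dt = -k_fold*U - k_agg*U**2,    dN/dt = k_fold*U
    Integrating dN/dU = -k_fold / (k_fold + k_agg*U) from U0 down to 0 gives
        yield = N_final / U0 = ln(1 + x) / x,  with  x = k_agg*U0 / k_fold
    """
    if u0 <= 0 or k_fold <= 0 or k_agg < 0:
        raise ValueError("u0 and k_fold must be positive; k_agg must be non-negative")
    if k_agg == 0:
        return 1.0  # no aggregation pathway: everything refolds
    x = k_agg * u0 / k_fold
    return math.log1p(x) / x


# Hypothetical rate constants (arbitrary, mutually consistent units),
# chosen only to show the trend with initial concentration.
for u0 in (0.01, 0.1, 1.0, 10.0):
    y = refolding_yield(u0, k_fold=1.0, k_agg=1.0)
    print(f"U0 = {u0:5.2f}  ->  predicted yield = {y:.3f}")
```

In this sketch the yield depends only on the dimensionless group `k_agg*U0/k_fold`, which is one way to see why lowering the protein concentration and changing the buffer (which alters the rate constants) are complementary levers.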

Dialysis

Dialysis is employed in favour of dilution when (a) the starting concentration of the denatured-reduced protein is too low for further dilution into the refolding buffer and/or (b) a complete buffer exchange is required (i.e. no carry-over of denaturants or reducing agents in the refolding buffer). However, dialysis refolding suffers from the disadvantage of protein loss to dialysis membranes, and slow buffer exchange kinetics often can induce aggregation of the refolding intermediates (Leong and Middelberg 2006). The use of multiple step changes in buffer exchange to increase the ‘steepness’ of the folding energy landscape has been successful in improving refolding yields (Tsumoto et al. 2010).

Chromatography

Chromatography-based refolding was developed to improve refolding productivity without compromising refolding yield. Immobilization of protein molecules on chromatographic matrices helps to spatially isolate the proteins from each other (Schmoeger et al. 2010), thereby reducing intermolecular interactions of the protein folding intermediates. Non-native intra-molecular interactions, however, can still promote aggregation of the immobilised protein molecules, but optimal buffer composition can help in maximising the refolding yield (Chen and Leong 2009; Langenhof et al. 2005). The development of chromatographic materials and methods for bioprocess intensification has also been extensively studied (Li et al. 2004; Jungbauer et al. 2004; Kaar et al. 2009; Chen and Leong 2010; Li and Leong 2011; Liu et al. 2009).

High hydrostatic pressure technology

Another way to promote native refolding reactions is to create a refolding environment that can destabilise off-pathway aggregation. High hydrostatic pressure technology was developed with this strategy in mind, where hydrostatic pressure is applied to ‘ensure’ that only proteins refolded to the native form and hence having the lowest specific volumes are stabilised (Malavasi et al. 2011; Seefeldt et al. 2009). Application of pressure can also help to solubilise the IBs directly and refold the target protein spontaneously. However, modulation of hydrogen bonds and disulphide bridges needed for solubilising the IBs and obtaining the correctly refolded product can only be achieved by an optimised buffer system (Qoronfleh et al. 2007).

As evident from Table 1, the design of these refolding methods is focused mainly on controlling refolding concentration to obtain high refolding yields, but optimisation of the physicochemical environment still needs to be independently performed for each of these methods. In recent years, high throughput screening of refolding buffers for protein refolding based on protein solubility has been reported (Vincentelli et al. 2004). Some of these screening kits have been commercialized; for example, Refold Master™ developed by Novexin Ltd (Cambridge, UK) comprises a matrix of different buffer compositions which allows screening of protein foldability in 96-well plate formats. These screening kits, however, do not allow in situ monitoring of the refolding reaction, which can provide important insights into the influence of different refolding additives on refolding yields. The major hurdle for such monitoring is that different proteins require different detection strategies (Table 2), so a generic screening platform is not practically possible. It would, therefore, be considerably more advantageous if new screening methods could be developed to provide quantitative information on aggregation kinetics under different refolding environments at the same time, thus providing a better correlation between physicochemical parameters and refolding yield for a given protein. In the following sections, we investigate how light scattering methods can be used not only as valuable tools for measuring such data but also to develop rational process designs and recipes for protein refolding.

Table 1 A brief overview of existing refolding technologies
Table 2 Detection methodologies used for screening protein refolding conditions

Light scattering methods: a rational approach to designing protein refolding recipes

The random movement of protein molecules during refolding is influenced by the inter-particle and hydrodynamic interactions amongst them. This random motion causes the intensity of the light scattered from the protein molecules in solution to fluctuate when they are irradiated by an incident light wave. The use of static light scattering to study the behaviour of protein molecules in solution was first reported in the 1990s, where it was concluded that successful crystallisation takes place only when the second virial coefficient (SVC) of the solution lies within a specific range (George and Wilson 1994). The ability to quantitatively predict environments conducive to protein crystallisation indicates that the light scattering platform can also be instrumental for studying protein–protein interaction within a refolding system by providing thermodynamic information on protein phase behaviour. Before discussing the applications of light scattering in the rational design of refolding recipes, we will first briefly review the principles of static and dynamic light scattering to set the context of our discussion.

Static light scattering

Static light scattering (SLS) is primarily used to measure the SVC of protein solutions. SVC is a thermodynamic parameter which originates from the virial expansion of solution osmotic pressure (Neal et al. 1998). According to statistical mechanics, the SVC ($B_2$) can be expressed as an integral involving the potential of mean force, $w_2(r)$, between two molecules (Eq. 1):

$$ {B_2} = - \frac{{2\pi {N_A}}}{{M_w^2}}\int\limits_0^{\infty } {\left( {{e^{ - {w_2}(r)/kT}} - 1} \right){r^2}\,{\rm d} r} $$
(1)

where $N_A$ is Avogadro’s constant, $M_w$ is the molar mass, $k$ is the Boltzmann constant, $T$ is the absolute temperature and $r$ is the centre-to-centre distance between two molecules (Pan and Glatz 2003). Therefore, SVC takes into account all the interactions that regulate protein phase behaviour such as electrostatic, van der Waals, excluded volume, hydration forces and hydrophobic interactions (Valente et al. 2005). SVC values of protein solutions are determined using a Debye plot according to Eq. 2 (Stockmayer 1950):

$$ \frac{{Kc}}{{R\left( \theta \right)}} = \frac{1}{{{M_w}}} + 2{B_2}c $$
(2)

where R(θ) is the excess Rayleigh scattering of the protein solution, K is the light-scattering optical constant and c is the protein mass concentration. K depends on the solution’s scattering properties and is defined as:

$$ K = \frac{{4{\pi^2}{{\left[ {{n_s}\left( {{{{{\rm d} n}} \left/ {{{\rm d} c}} \right.}} \right)} \right]}^2}}}{{{N_A}{\lambda^4}}} $$
(3)

where λ is the wavelength of the incident light, $n$ is the refractive index of the protein solution (so that ${\rm d}n/{\rm d}c$ is the refractive index increment) and $n_s$ is the refractive index of the solvent. Qualitatively, SVC measures two-body interactions in dilute solution conditions (i.e. protein–protein interaction in the case of protein refolding). A positive value indicates a repulsive interaction amongst the protein molecules, implying that protein–solvent interactions are favoured over protein–protein interactions, while a negative value indicates the reverse (George and Wilson 1994). Experimentally determined SVC values have thus been established to correlate well with protein solubility (Guo et al. 1999; Ruppert et al. 2001). Since refolding yield is dependent on the extent of aggregation that occurs during refolding and the latter is a function of the physicochemical parameters of the refolding buffer, optimisation of the refolding buffer based on SVC measurements can help in developing rational strategies for designing the optimum refolding buffer recipes.
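As a minimal sketch of how Eqs. 2 and 3 might be applied in practice, the snippet below computes the optical constant K and fits a straight line to a Debye plot to recover the molar mass and the SVC. The function names, the unit choices and the numerical values (a lysozyme-like molar mass, a typical protein refractive index increment, a 633 nm laser) are assumptions made for illustration, and the "measurements" are synthetic points generated from Eq. 2 rather than experimental data.

```python
import math

import numpy as np

AVOGADRO = 6.02214076e23  # 1/mol


def optical_constant(n_solvent, dn_dc, wavelength_cm):
    """Eq. 3: K = 4*pi^2*(n_s * dn/dc)^2 / (N_A * lambda^4).

    With dn/dc in mL/g and the wavelength in cm, K is in mol cm^2 / g^2.
    """
    return 4.0 * math.pi**2 * (n_solvent * dn_dc) ** 2 / (AVOGADRO * wavelength_cm**4)


def debye_fit(conc, kc_over_r):
    """Least-squares line through a Debye plot (Eq. 2).

    conc       : protein mass concentrations (g/mL)
    kc_over_r  : K*c/R(theta) at each concentration (mol/g)
    Returns (M_w in g/mol, B_2 in mol mL / g^2): the intercept is 1/M_w
    and the slope is 2*B_2.
    """
    slope, intercept = np.polyfit(conc, kc_over_r, 1)
    return 1.0 / intercept, slope / 2.0


# Assumed optical parameters: aqueous buffer, typical protein dn/dc, 633 nm light.
K = optical_constant(n_solvent=1.333, dn_dc=0.185, wavelength_cm=633e-7)
print(f"K = {K:.3e} mol cm^2 / g^2")

# Synthetic Debye-plot data for an assumed protein under attractive conditions.
true_mw, true_b2 = 14300.0, -2.0e-4
conc = np.array([0.002, 0.004, 0.006, 0.008, 0.010])  # g/mL
kc_over_r = 1.0 / true_mw + 2.0 * true_b2 * conc

mw, b2 = debye_fit(conc, kc_over_r)
print(f"M_w = {mw:.0f} g/mol, B_2 = {b2:.2e} mol mL / g^2")
```

Repeating the fit for each candidate buffer gives one SVC per composition, which is the quantity the text proposes to use when ranking refolding environments.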

Dynamic light scattering

Whilst inter-particle interactions can be determined from SVC measurements, hydrodynamic interactions can be measured by dynamic light scattering (DLS). Hydrodynamic interactions arise because a moving molecule induces solvent flow and hence exerts viscous forces on diffusing protein molecules nearby (Liu et al. 2005). Thus, when a second protein molecule enters the same flow field, it experiences attraction or repulsion due to the hydrodynamic interactions. The net interacting forces amongst the protein molecules can be quantitated by DLS using Eq. 4, which indicates that the mutual diffusion coefficient ($D_m$) of solute molecules under a given solvent condition is dependent on protein concentration (Meechai et al. 1999):

$$ {D_m} = {{{\left( {{{{{\rm d} \rho }} \left/ {{{\rm d} c}} \right.}} \right)}} \left/ {{{f_m}}} \right.} $$
(4)

where dρ/dc is the ‘protein concentration’-dependent increase of the solution osmotic pressure, indicating the thermodynamic driving force for diffusion, while $f_m$ is the concentration-dependent friction factor derived from the resistance to collective translation of particles due to the potential and hydrodynamic interactions amongst them. Expressing the protein solution concentration in terms of volume fraction ϕ, Eq. 4 takes the form shown below (Liu et al. 2005):

$$ {D_m} = {D_0}\left( {1 + {k_m}\varphi } \right) $$
(5)

where $D_0$ is the infinite dilution diffusion coefficient while $k_m$ is the interaction parameter between the solute particles, obtained from the slope of the normalised diffusivities ($D_m/D_0$) versus volume fraction plots (Grigsby et al. 2000; Mirarefi and Zukoski 2004; Rubin et al. 2010; Liu et al. 2005). Just like the SVC values, attractive or repulsive inter-particle interactions are reflected by negative or positive $k_m$ values, respectively. This interaction term is given by Eq. 6:

$$ {k_m} = {k_d} + {k_h} $$
(6)

where $k_d$ and $k_h$ represent the interaction potential and hydrodynamic interactions between particles, respectively. A detailed interpretation of $k_d$ and $k_h$ is presented elsewhere (Muschol and Rosenberger 1995), but for the purpose of this review, it is reasonable to consider that $k_d$ can be obtained from the following equation (Liu et al. 2005):

$$ {k_d} = \frac{{2{B_2} \times {M_w}}}{v} $$
(7)

where $v$ is the partial specific volume of the protein (i.e. the ratio of the volume fraction to the protein concentration). Diffusivity values obtained from the DLS data can also provide information on the hydrodynamic diameter of the refolding protein molecules using the Stokes–Einstein equation (Liu et al. 2005), which gives an indication of the extent of aggregation at a given time.
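The Stokes–Einstein relation mentioned above can be sketched as follows. The snippet assumes spherical, non-interacting particles and uses the viscosity of water at 25 °C as a default; the function name and the example diffusivity are hypothetical values chosen for illustration, not measurements from the cited work.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def hydrodynamic_diameter(diff_coeff, temperature=298.15, viscosity=8.9e-4):
    """Stokes-Einstein: d_H = k*T / (3*pi*eta*D).

    diff_coeff  : translational diffusion coefficient (m^2/s)
    temperature : absolute temperature (K)
    viscosity   : solvent dynamic viscosity (Pa s); default is an assumed
                  value for water at 25 degrees C
    Returns the hydrodynamic diameter in metres (sphere assumption).
    """
    if diff_coeff <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return BOLTZMANN * temperature / (3.0 * math.pi * viscosity * diff_coeff)


# Assumed diffusivities: a small monomeric protein and a slower-diffusing species.
d_monomer = hydrodynamic_diameter(1.1e-10)
d_slow = hydrodynamic_diameter(0.55e-10)
print(f"monomer-like species: {d_monomer * 1e9:.2f} nm")
print(f"slower species:       {d_slow * 1e9:.2f} nm")
```

Because the diameter scales inversely with diffusivity, a drop in the measured diffusion coefficient over time is read as growth in apparent size, which is how DLS traces are commonly interpreted as a sign of aggregation.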

The combination of static and dynamic light scattering data is therefore highly effective in identifying the type of interactions prevalent during protein refolding reactions. As summarised in Fig. 2, SLS data can provide SVC values, which in turn can be used to determine $k_d$ (using Eq. 7) for any given system at any given point of time (Fig. 2a). Similarly, DLS data can be used to determine the normalised diffusivities ($D_m/D_0$) to obtain $k_m$ values using Eq. 5 (Fig. 2b). Thus, knowing the values of $k_d$ and $k_m$ at a given time point, one can calculate the corresponding $k_h$ value using Eq. 6 (Fig. 2c). The combination of DLS and SLS data can thus provide a vivid picture of the types of interaction encountered by protein refolding intermediates at any given time point, supplying the information needed to guide the control of a protein refolding process. In the next section, we discuss how light scattering experiments can be used to design smarter protein refolding processes in the future.
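The workflow of Eqs. 5 to 7 can be sketched in a few lines. The function names are hypothetical, the partial specific volume of 0.73 mL/g is an assumed typical value for globular proteins, and the diffusivities are synthetic points generated from Eq. 5 rather than DLS measurements; the sketch only shows how the three parameters relate to one another.

```python
import numpy as np


def interaction_parameter(phi, d_m):
    """Fit D_m = D_0 * (1 + k_m * phi) (Eq. 5); return (D_0, k_m).

    The intercept of D_m versus phi is D_0 and the slope is D_0 * k_m.
    """
    slope, intercept = np.polyfit(phi, d_m, 1)
    return intercept, slope / intercept


def thermodynamic_term(b2, m_w, v_bar):
    """Eq. 7: k_d = 2 * B_2 * M_w / v (dimensionless for B_2 in mol mL/g^2,
    M_w in g/mol and v in mL/g)."""
    return 2.0 * b2 * m_w / v_bar


def hydrodynamic_term(k_m, k_d):
    """Eq. 6 rearranged: k_h = k_m - k_d."""
    return k_m - k_d


# Synthetic DLS data for an assumed system (D_0 = 1.1e-10 m^2/s, k_m = 5).
phi = np.array([0.001, 0.005, 0.010, 0.015, 0.020])
d_m = 1.1e-10 * (1.0 + 5.0 * phi)

d0, k_m = interaction_parameter(phi, d_m)
# Assumed SLS result and protein properties for the same buffer condition.
k_d = thermodynamic_term(b2=3.0e-4, m_w=14300.0, v_bar=0.73)
k_h = hydrodynamic_term(k_m, k_d)
print(f"D_0 = {d0:.3e} m^2/s, k_m = {k_m:.2f}, k_d = {k_d:.2f}, k_h = {k_h:.2f}")
```

Running the same three steps at each time point or buffer condition would give the time-resolved split between thermodynamic and hydrodynamic contributions that Fig. 2 describes.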

Fig. 2 Strategy for quantifying the type of interacting forces amongst the protein refolding intermediates using light scattering data: a slopes from the Debye plots obtained from the SLS experiments, under different environmental conditions, can be used to determine the SVC for the protein solutions (from Velev et al. 1998); b slopes from the diffusivity plots, obtained from DLS experiments, can be used to obtain the $k_m$ values for the different protein intermediates at different conditions; c combination of DLS and SLS data helps to determine the hydrodynamic and interaction-potential of the interacting moieties at any given point of time within the protein refolding system (from Liu et al. 2005)

Future directions: using light scattering tools for rational design of refolding processes

Over the past decade, SVC and protein diffusivities have been increasingly used to obtain better insights into protein behaviour under different environmental conditions. Experimentally determined SVC values can shed light on methods to manipulate protein thermodynamic properties and phase behaviour (Curtis and Lue 2006; Rosenbaum and Zukoski 1996) and can even provide partial structural information (Neal et al. 1998). SVC values have thus been used as an unbiased estimator for protein-specific behavioural patterns to guide determination of optimal crystallisation conditions for proteins (Tessier and Lenhoff 2003; Velev et al. 1998). For protein refolding applications, SVC values can accurately capture the relationship between refolding additives and protein interactions (Kulkarni et al. 2000; Kulkarni and Zukoski 2001; Curtis et al. 2002; Liu et al. 2004) and have been exploited to predict (a) the propensity for aggregation during protein refolding (Ho et al. 2003) and (b) potential refolding yields of inclusion body proteins (Ho and Middelberg 2004). Therefore, SVC measurements of refolding mixtures carried out at statistically determined design points can rapidly provide quantitative correlations between the choice and concentration of additives and the interactive behaviour of proteins (Valente et al. 2005; Basu et al. 2011). In addition to SLS, other methods have also been used to determine SVC (Tessier and Lenhoff 2003), amongst which self-interaction chromatography has gained considerable popularity both in conventional chromatography (Valente et al. 2005) and in microfluidic device-based platforms (Deshpande et al. 2009; Martin and Lenhoff 2011). However, with the advent of the avalanche photodiode for recording scattered intensity signals, modern light scattering instruments allow fast determination of accurate SLS and DLS data within a single setup (Yadav et al. 2011). 
By monitoring protein interactions under varying pH, temperature and ionic strengths through diffusivity measurements (Grigsby et al. 2000), it is possible to determine equilibrium association constants (Kuehner et al. 1997), which can be used to predict protein stability within a given refolding environment. Therefore, modern light scattering instruments can be exploited to obtain combined DLS and SLS data to monitor protein inter-particle and hydrodynamic interactions in order to pinpoint the type of forces involved amongst interacting protein molecules during refolding (Prinsen and Odijk 2007).

Therefore, it is reasonable to say that the stage is set to exploit light scattering technologies for better monitoring and control of refolding processes. Although there are a few studies on monitoring protein refolding reactions using SLS or DLS data (Crisman and Randolph 2009; Gast et al. 1997), an analytical tool that controls refolding processes using this strategy is yet to be realised. Since DLS has been successfully used to develop titration-based strategies for crystallisation of a target protein (Mirarefi and Zukoski 2004), there remains little doubt about the potential of light scattering methods for controlling protein refolding processes as well. The schematic shown in Fig. 3 provides a glimpse of a potential solution for designing a rational protein refolding process, where data from an online light-scattering detector can be fed into a controller that keeps the refolding process under a desired environment (i.e. maintains critical thermodynamic parameters such as the SVC and $k_m$ within a specified range). The controller in turn can be loaded with information regarding changes in protein behaviour under different environmental conditions, quantified off-line by monitoring the vital thermodynamic parameters at statistically determined design points. For better quantitative interpretation of light scattering data, it is essential to segregate the different aggregating species within the analysed samples (George and Wilson 1994), and therefore, a suitable separating system placed before the light-scattering detector can prove beneficial. In this regard, field-flow techniques such as asymmetric flow field-flow fractionation (AF4), which can segregate molecules based on their diffusivities (Roda et al. 2009), can be relevant. The applicability of AF4 over a wide size range up to 500 nm (Cao et al. 2009) has already made it a potential fractionating methodology for protein bioprocessing (Chuan et al. 2008a; Luo et al. 2006).
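The supervisory logic implied by this scheme can be sketched as a simple window check. This is only an illustration of the idea of holding the SVC and the interaction parameter within specified ranges: the class and function names, the parameter labels and the window limits are all hypothetical, and the corrective action a real controller would take (for example dosing an additive or diluting) is deliberately left out because it would depend on off-line characterisation of the specific protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Acceptable range for one monitored thermodynamic parameter."""
    low: float
    high: float


def find_deviations(measured, windows):
    """Compare on-line readings with their target windows.

    measured : dict mapping parameter name -> latest value
    windows  : dict mapping parameter name -> Window
    Returns a dict of out-of-range parameters mapped to 'below' or 'above';
    an empty dict means the process can be held at current conditions.
    """
    deviations = {}
    for name, window in windows.items():
        value = measured[name]
        if value < window.low:
            deviations[name] = "below"
        elif value > window.high:
            deviations[name] = "above"
    return deviations


# Hypothetical target windows, as might be set from off-line design-point studies.
targets = {"B2": Window(0.0, 5.0e-4), "k_m": Window(-2.0, 10.0)}

# Hypothetical detector readings at two sampling times.
print(find_deviations({"B2": 2.0e-4, "k_m": 3.0}, targets))   # within both windows
print(find_deviations({"B2": -1.0e-4, "k_m": 3.0}, targets))  # SVC has turned attractive
```

Keeping the decision step separate from the actuation step is a design choice made here so that the same check could sit behind whichever corrective strategy is eventually validated.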

Fig. 3 A schematic representation of a process design which can allow better control of a protein refolding process. Samples from the protein refolding reactor can be fed directly into a light-scattering detector via a suitable separating unit. The data from the detector can be fed directly into a suitable controller that can help to maintain the desired thermodynamic parameters of the ongoing reaction within a specific range

In summary, the process design schematically depicted in Fig. 3 can help to enhance our understanding and control of any refolding process, thereby qualifying it as a potential candidate within the arsenal of process analytical technology (PAT) tools used for protein refolding processes (Read et al. 2010). Therefore, in addition to dissolved oxygen measurement, which is currently used as a PAT tool for this process (Pizarro et al. 2009), SVC and diffusivity data can be highly beneficial for rational design of protein refolding recipes and platforms.

Concluding remarks

To maximise refolding yields, two factors must be optimised: (a) protein concentration and (b) physicochemical environment of the refolding buffer. The design rationale for existing refolding platforms has been mainly targeted at optimising the former factor, while little progress has been made in tool development to rapidly optimise the latter. Huge amounts of information on the impact of different types of additives on the refolding reactions are now available from many earlier studies, from which different refolding buffer systems can be designed. However, the lack of quantitative data on aggregation kinetics to complement solubility data which is often used to infer protein foldability in existing refolding screening methods increases the difficulty in design and scale-up of refolding processes. The use of ‘light scattering’-based analyticals in a high throughput format has the advantage of qualitative screening for solubility as well as quantitative determination of aggregation behaviour based on protein–protein interaction, leading to reduced time to design optimum refolding recipes. The successful development of such tools will underpin protein bioprocessing by significantly reducing the time and cost for developing high-yielding refolding processes and expedite delivery of proteins for a wide range of applications.