Introduction

Proteins have been considered important therapeutic entities since the early 1900s, when the major sources were plants and animals. With the advent of recombinant DNA technology in the 1970s, however, it was discovered that recombinant protein therapeutics could be produced by E. coli in a robust and economical manner. In the early 1980s, the FDA approved the first E. coli-produced recombinant insulin for diabetes treatment, opening the door and creating a model for the development of other recombinant therapeutics. Since then, in addition to E. coli, different expression hosts such as yeast, filamentous fungi, insect cells and mammalian cells have become available for producing different or more complex recombinant therapeutics such as monoclonal antibodies (mAbs).

Today, more than 151 unique recombinant therapeutics have been approved by the FDA and/or by the European Medicines Agency for different clinical indications. One-third of these approved protein therapeutics are produced in E. coli, indicating that it is a major workhorse for recombinant therapeutic production (Table 1) [15, 25, 29]. Compared to other recombinant microorganisms, E. coli remains the most attractive because of its well-characterized genetics, versatile cloning tools and expression systems, and the fact that it has been successfully used to express vastly different proteins. It is also advantageous to use E. coli for industrial scale production because of its rapid growth, low-cost media, ease of scale-up and capability to produce therapeutics with high yield and quality.

Table 1 Recent approved protein therapeutics using E. coli as an expression host

However, there are also some limitations to using E. coli as an expression host. These include the inability to perform certain posttranslational modifications (such as glycosylation) and insufficiencies in proteolytic protein maturation and disulfide bond formation [48]. Such drawbacks prohibit E. coli from expressing some complex and important therapeutics, such as mAbs, where both correct folding and glycosylation play crucial roles in their biological activities. Nevertheless, E. coli continues to be used to produce other important recombinant therapeutics. In this review, we will discuss important considerations for producing protein therapeutics in E. coli, with an emphasis on commercial production. Recent advancements in protein expression in E. coli, such as complex protein production, bacterial N-linked glycosylation, novel strain engineering and creation of E. coli cell-free systems, will also be covered.

Expression considerations for recombinant therapeutics in E. coli

Host

Escherichia coli K12 and its derivatives are the main strains used in recombinant therapeutic production in the biotech industry. A major advantage of E. coli K12 is that the National Institutes of Health made this strain the standard and provided guidelines for its safe use. In addition, large-scale industrial production with E. coli requires approval by the local Biosafety Authority, which may be reluctant to approve other E. coli strains without the same safety level as E. coli K12. Common K12 derivatives used in the biotech industry include E. coli RV308 and W3110 [14, 28].

To further improve recombinant protein production, several studies have genetically modified K12 strains to reduce acetate accumulation during cell growth. Acetate is an undesirable by-product formed during aerobic fermentation, and its accumulation at a high concentration has negative effects on cell growth and recombinant protein production [19]. Reducing glucose consumption and redirecting carbon flow away from the acetate formation pathway are the two major strategies used to engineer low acetate production strains. For example, by deletion of the ptsHI operon in E. coli GJT 001, Wong and coworkers increased protein production yield by 25-fold in a batch bioreactor. This high productivity was attributed to low acetate accumulation, although cell growth was compromised [124]. The E. coli B strain was first designed and engineered by Studier and Moffatt, and has been a model strain for studying phage sensitivity, restriction-modification systems, bacterial evolution and recombinant protein expression in laboratories as well as in the biotech industry [100]. Since its original use, the B strain has been modified and engineered to create additional strains, such as BL21, C41 and C43, that have different capabilities to express varying types of proteins. The B strain and its derivatives have several advantages, such as low acetate accumulation under high glucose concentrations, specific protease deficiencies and high outer membrane permeability, making them desirable hosts for expression of protein therapeutics [82, 107, 112].

Some E. coli strains have also been tailored for special needs. Recently, Caparon and coworkers showed that deletion of four genes, dppA, oppA, malE and ompT, in the E. coli strain BC50 can significantly reduce the level of host cell contaminants in recombinant apolipoprotein A-1 Milano production, with no adverse effects on fermentation productivity [10]. To improve the secretion of periplasmic recombinant proteins, an E. coli strain with a compromised outer membrane structure has been engineered to increase the extracellular secretion of antibody fragments [72]. Examples of novel strain engineering for improved therapeutic production will be discussed in detail in later sections of this review.

Vector design

Optimal gene transcription is normally a function of both gene dosage (plasmid) and promoter functionality. The productivity of recombinant protein is known to be affected by plasmid copy number and by the structural and segregational stability of the plasmid. Choosing the optimal plasmid copy number is critical. Too low a copy number will result in a low mRNA pool, as well as low protein productivity. A high copy number generally leads to high productivity; however, it also tends to impose metabolic burdens on cells. The plasmid copy number depends largely on the origin of replication, which dictates either flexible or rigid control over a plasmid. Both high copy number plasmids (e.g., pUC, 500–700 copies) and medium copy number plasmids (e.g., pBR322, 15–20 copies) have been used for therapeutic production in E. coli [14, 38].

For large-scale therapeutic production, the use of high copy number plasmids is not desirable. First, in order to maintain high copy number plasmids in cells, selection markers such as antibiotics are usually required and included in the growth media. However, the FDA discourages the use of antibiotics in clinical manufacturing, as they cause allergic reactions in some patients and raise concerns over the development of drug-resistant pathogens. Second, high copy number plasmids have been shown to possess higher segregational instability, especially in the absence of antibiotics [7, 27]. This leads to the overgrowth of plasmid-free cells, resulting in a significant loss of protein productivity. Lastly, production and maintenance of high copy number plasmids in cells require tremendous energy, which will inevitably reduce cell growth and protein synthesis. Therefore, high copy number plasmids are normally used for the expression of recombinant therapeutic genes for gene therapy [11].
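
The overgrowth of plasmid-free cells described above can be illustrated with a minimal discrete-generation sketch. It is not taken from the cited studies: the function name, the per-division loss probability and the growth advantage of plasmid-free cells are all hypothetical parameters chosen only to show the qualitative trend.

```python
def plasmid_bearing_fraction(generations, loss_prob, growth_advantage=1.0):
    """Fraction of plasmid-bearing cells after each generation (illustrative model).

    generations      : number of doublings of the plasmid-bearing population
    loss_prob        : assumed probability that a division of a plasmid-bearing
                       cell yields one plasmid-free daughter
    growth_advantage : assumed ratio of plasmid-free to plasmid-bearing
                       specific growth rate (>= 1 when the plasmid is a burden)
    """
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must be between 0 and 1")
    bearing, free = 1.0, 0.0          # start from a pure plasmid-bearing culture
    fractions = [bearing]
    for _ in range(generations):
        # Plasmid-free cells multiply by 2**growth_advantage while the
        # plasmid-bearing cells double once; segregants are added on top.
        new_free = free * 2.0 ** growth_advantage + bearing * loss_prob
        # Each plasmid-bearing cell leaves (2 - loss_prob) plasmid-bearing daughters.
        new_bearing = bearing * (2.0 - loss_prob)
        total = new_bearing + new_free
        bearing, free = new_bearing / total, new_free / total  # renormalize
        fractions.append(bearing)
    return fractions


# Hypothetical comparison: a stable and a less stable plasmid over 50 generations.
stable = plasmid_bearing_fraction(50, loss_prob=0.001, growth_advantage=1.2)
unstable = plasmid_bearing_fraction(50, loss_prob=0.05, growth_advantage=1.2)
print(round(stable[-1], 3), round(unstable[-1], 3))
```

In this sketch, both a higher loss probability and a larger growth advantage for plasmid-free cells accelerate the decline of the productive population, which is the qualitative behavior the paragraph attributes to high copy number plasmids grown without antibiotic selection.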

An ideal expression system is critical for high-level therapeutic production in E. coli to allow tightly regulated and efficient transcription. Choosing an appropriate vector system is largely dictated by the strength and the control of its promoter. To ensure high-level expression, a promoter with certain characteristics must be incorporated into the plasmid [38]. For example, the promoter has to be strong to allow the recombinant protein production to account for 10–30% or more of total cellular protein. It should also be tightly regulated with limited basal expression in the non-induced state. Leaky expression can cause metabolic burdens on the cells during the growth period by diverting the carbon and energy source to premature protein formation. This situation can be detrimental when the expressed protein is highly toxic. In addition, some promoters must be used within specific E. coli strains to achieve optimal protein expression. Other important considerations are that the induction method should be simple and cost-effective, and, in most cases, the induction must be independent of the media components.

Multiple promoters have been successfully used to produce different recombinant proteins in the past (Table 2). Among them, lac and its derivatives, tac and trc, are commonly used both in basic research and industry [29, 38]. The synthetic tac and trc promoters are stronger than lac, and all of them are induced by isopropyl-β-D-thiogalactopyranoside (IPTG). Since IPTG is used to derepress the lac repressor, the expression level of recombinant protein is titratable by varying the IPTG concentration in the media. IPTG, however, is expensive and toxic to some E. coli strains. To circumvent these drawbacks, a thermosensitive lac repressor mutant is available to induce protein expression by shifting temperature instead of using IPTG [3].

Table 2 Commonly used promoters for protein therapeutics production in E. coli [38, 85, 111, 112]

The T7 promoter from T7 bacteriophage is another powerful system that is widely used for recombinant protein expression. It is specifically recognized by T7 bacteriophage RNA polymerase, which can elongate RNA chains five times faster than E. coli RNA polymerase. Typically, the chromosome of the host strain contains a prophage (λDE3) encoding T7 bacteriophage RNA polymerase that is under the control of a lac promoter derivative, L8-UV5. This version of the lac promoter is less dependent on the cyclic AMP level and is less sensitive to glucose repression. Therefore, the T7 system is generally considered a stronger protein expression system than the lac system. It can lead to very high recombinant protein expression, and production of up to 50% of the total cell protein has been reported, mostly in the form of inclusion bodies [7, 94]. Nevertheless, basal expression is a common problem in the T7 system. Although co-expression of T7 lysozyme using pLysS or pLysE plasmids can improve the control over the T7 promoter, expression of T7 lysozyme can also increase cell stress and reduce recombinant protein yield [107]. Although the T7 system can lead to a very high level of protein expression, it is not commonly used in the industry.

A thermo-regulated promoter system, such as bacteriophage lambda pL and/or pR promoters, has also been used for production of many therapeutic proteins. Inclusion of the temperature-sensitive cI857 repressor cassette in the plasmid allows control of the pL and pR promoter by changing the growth temperature. The cI857 repressor blocks transcription at 28–37°C and is inactivated at 42°C. Therefore, recombinant gene expression is induced when temperature is shifted to 42°C. Notably, many human therapeutics, such as interferon-γ, insulin, tumor necrosis factor and granulocyte colony-stimulating factor, have been successfully produced by this system with high yield [111]. This system is especially suitable for large-scale industrial production because it is highly productive, easy to operate and scale up, and, importantly, it minimizes culture contamination. On the other hand, high temperature induction may have other drawbacks, such as reduced cell growth and increased cell stresses (heat shock response and SOS response) in addition to the stress caused by the recombinant protein expression. Together, these stresses may compromise protein yield and quality. The lambda pL and/or pR promoter can also become constitutive at lower temperatures, making this system ideal for production of soluble or proteolytically susceptible proteins [70].

Nutritionally inducible promoters, such as phoA and trp, are also commercially available and used in recombinant therapeutic production. The phoA promoter has been used to successfully express full-length antibodies and antibody fragments in E. coli [97]. In these cases, media composition should be carefully designed in both batch media and the feed to ensure phosphate depletion and protein induction. Normally, the induction will occur over a broader range of time compared to other promoters [83]. Another robust and tightly regulated promoter is the lux system. By taking the quorum sensing elements from Vibrio fischeri, gene expression is controlled by the luxI promoter, and transcription is activated by addition of the autoinducer, acyl homoserine lactone (AHL). AHL is typically used at 1/1,000 of the concentration of IPTG in other systems, making it a very cost-effective system at an industrial scale [108].

Protein translation

Protein translation in E. coli can be divided into four phases: initiation, elongation, termination and ribosome cycling. In most cases, translation initiation is the rate-limiting step for protein biosynthesis. Its efficiency is determined by the sequence and structure of the translation initiation region (TIR) at the 5′ end of each mRNA [87, 90]. The TIR is composed of four different sequences: (1) the Shine-Dalgarno (SD) sequence, (2) the start codon, (3) the spacer between the SD and the start codon, and (4) translational enhancers located upstream of the SD and/or downstream of the start codon. Modulation of TIR sequences has been shown to improve or control recombinant protein expression in E. coli [1, 98, 123]. Vimberg et al. [119] demonstrated that a six-nucleotide SD (AGGAGG) sequence is more efficient than other shorter or longer SD sequences in expression of recombinant green fluorescent protein (rGFP). In this study, incorporation of an A/U-rich upstream enhancer further improved rGFP expression by 13-fold. Modifying the TIR to avoid possible mRNA secondary structures that reduce the accessibility of the SD and/or the start codon is also critical to achieve optimal protein expression. For example, a single base mutation in the SD region could decrease the expression of RNA bacteriophage MS2 coat protein by 500-fold [103]. Exposing the AUG start codon from a base-paired mRNA structure significantly improved the translation efficiency of IL-10 in E. coli and resulted in a 10-fold increase in IL-10 production compared to the wild-type genes [133]. Recently, a biophysical model of translation initiation was developed that can design synthetic ribosome binding sites targeting different translation initiation rates (http://salis.psu.edu/software/). This technology enables rational control and fine-tuning of recombinant protein expression [87]. This precision is especially important for recombinant protein secretion, as a high translation rate may overwhelm the secretory apparatus, leading to low production yields. In fact, for enhanced recombinant protein secretion in E. coli, optimizing, as opposed to maximizing, the translation level usually leads to high-level secretion of recombinant therapeutics [97, 98].
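
As a rough illustration of the TIR elements listed above, the following sketch locates a Shine-Dalgarno motif, the nearest downstream start codon and the spacer between them. It is only a string search, not the biophysical model of reference [87]; the function name, the maximum spacer length and the example sequence are hypothetical.

```python
def find_tir_elements(mrna, sd="AGGAGG", start="AUG", max_spacer=15):
    """Locate SD motifs and the first start codon within max_spacer nt downstream.

    Accepts DNA or RNA letters; the sequence is converted to RNA internally.
    Returns a list of dicts with 0-based positions and the spacer sequence.
    max_spacer is an assumed search window, not a value from the review.
    """
    mrna = mrna.upper().replace("T", "U")
    hits = []
    pos = mrna.find(sd)
    while pos != -1:
        sd_end = pos + len(sd)
        # Restrict the start-codon search to the window right after the SD.
        window_end = min(len(mrna), sd_end + max_spacer + len(start))
        start_pos = mrna.find(start, sd_end, window_end)
        if start_pos != -1:
            hits.append({
                "sd_start": pos,
                "start_codon": start_pos,
                "spacer": mrna[sd_end:start_pos],
            })
        pos = mrna.find(sd, pos + 1)
    return hits


# Hypothetical 5' region: an SD motif, an A/U-rich spacer, then AUG.
example = "TTTAATAGGAGGAATAACATATGAAA"
for hit in find_tir_elements(example):
    print(hit["sd_start"], hit["start_codon"], hit["spacer"], len(hit["spacer"]))
```

A scan like this only reports where the elements sit; whether a given spacer or flanking sequence supports efficient initiation also depends on mRNA secondary structure, as the text notes.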

The stability (half-life) of mRNA also affects the protein expression rate in E. coli. Degradation of mRNA in E. coli is mediated by several different RNases, including endonucleases (RNase E, K and III) and 3′ exonucleases (RNase II and polynucleotide phosphorylase). The enzymatic activities of these RNases also depend upon the growth conditions [12, 38]. For high-level recombinant protein expression, two common strategies have been used to improve mRNA stability: the introduction of protective elements at both ends of the mRNA and the inhibition of RNase activities by strain engineering and by manipulating growth conditions [6]. It has been shown that adding the 5′ UTR of ompA can prolong the half-life of a number of heterologous mRNAs in E. coli by incorporating a stem-loop structure at or near the 5′ terminus of the mRNA [67]. Fusing the transcription terminator of the penP gene from Bacillus thuringiensis to the cDNA of human IL-2 significantly enhanced the mRNA stability and improved IL-2 production in E. coli [67]. Lopez and colleagues showed that the C-terminal region of the rne gene, which encodes RNase E, is required for mRNA degradation. They found that mutation of E. coli rne can increase mRNA stability and improve recombinant protein yields [63]. Strains containing this mutation are currently commercially available.

In addition to the TIR sequence and mRNA stability, species-specific variation in codon usage (codon bias) between E. coli and other organisms can also affect protein translation. Expression of heterologous genes containing rare codons can lead to growth arrest, premature translational termination and increased frameshifts, deletions and misincorporations in the recombinant proteins, especially if the rare codons are clustered together [30, 31]. One method to overcome codon bias is to synthetically optimize the gene sequence with codons preferred by E. coli. Several design algorithms are available to optimize protein expression in E. coli and other hosts [79, 118]. For example, the expression of a soluble scFv (against hepatitis B antigen) in the periplasm was increased more than 100-fold with a codon-optimized sequence compared to the original gene [109]. Supplementing cognate tRNAs for rare codons is another effective strategy to remedy codon bias. The E. coli Rosetta (DE3) strain contains a plasmid (pRARE) encoding tRNAs for codons that are commonly used by eukaryotes but rarely used in E. coli. Using this strain to express different human recombinant proteins, Tegel et al. [106] showed that protein yields were increased for 35 of the 68 tested proteins. This strain is especially efficient at expressing proteins that are difficult to express in E. coli BL21 (DE3). The use of the pRARE plasmid is a practical method to improve the yield of heterologous proteins in E. coli and has been demonstrated in several studies [30, 33, 37].
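
Because clustered rare codons are singled out as the main risk, a first-pass check of a coding sequence can be sketched as below. The rare-codon set is an assumption made for illustration (a commonly quoted list of codons that are infrequent in E. coli), not a list given in this review, and the function name and example sequence are hypothetical.

```python
# Assumed, illustrative set of codons that are rarely used in E. coli
# (DNA alphabet); replace with a usage table appropriate for the host strain.
RARE_CODONS = frozenset({"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"})


def rare_codon_report(cds, rare=RARE_CODONS):
    """Summarize rare codons in an in-frame coding sequence.

    Returns the codon count, 0-based indices of rare codons, their fraction
    and the length of the longest run of consecutive rare codons.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    positions = [i for i, codon in enumerate(codons) if codon in rare]
    longest = run = 0
    for codon in codons:
        run = run + 1 if codon in rare else 0   # extend or reset the current run
        longest = max(longest, run)
    return {
        "n_codons": len(codons),
        "rare_positions": positions,
        "rare_fraction": len(positions) / len(codons) if codons else 0.0,
        "longest_rare_run": longest,
    }


# Hypothetical short open reading frame with two adjacent rare arginine codons.
print(rare_codon_report("ATGAGGAGACTGCCCTAA"))
```

A report of this kind can help decide between the two remedies discussed above: re-synthesizing the gene with preferred codons, or expressing it in a strain that supplies the corresponding tRNAs.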

Cytoplasmic expression

Inclusion bodies

Production of functional proteins in E. coli requires a delicate balance between DNA transcription, protein translation and protein folding. High-level expression in E. coli can drive the recombinant protein to over 30% of total cellular protein, a level at which folding chaperones and modulators are titrated out. In addition, correct folding of many proteins requires disulfide bond formation and/or glycosylation, which are absent in the E. coli cytoplasm. Therefore, expression of many human therapeutic proteins in E. coli generates high amounts of unfolded and misfolded proteins in the cytoplasm, where they tend to aggregate and form inclusion bodies (IBs).

Although IBs are biologically inactive, many commercial and developmental therapeutics, such as interferons, interleukins and Fc-fusion proteins, are produced as IBs because of the multiple advantages of these protein aggregates [29]. High-yield production and versatility to express different proteins are two advantages associated with IB formation [65]. IBs are also stable protein aggregates and are resistant to protease activities in vivo. In addition, proteomic analysis showed that IBs are relatively homogeneous in composition, and, in some cases, the recombinant protein can account for more than 90% of the total embedded polypeptides [115]. During downstream processing, IBs can be easily isolated after cell disruption, and the resultant IB paste can be stored frozen for several months, providing manufacturing flexibility. Together, these characteristics allow IBs to be produced at high yield, as well as isolated and purified with minimal effort. However, this method also has its downside. In particular, the refolding of IBs to active protein represents a challenge, because efficient and high-yield refolding requires considerable optimization for each target protein. Resolubilization of IBs using chaotropic agents may also affect the integrity of the refolded proteins [86]. Even so, acceptable recovery usually can be achieved at large industrial scale by using established strategies [18, 60].

Soluble proteins

Production of soluble and bioactive protein without the requirement of refolding can also be achieved in the E. coli cytoplasm. Two strategies, reduced growth temperature and improved folding environment, are commonly used to enhance the solubility of recombinant protein without changing its protein sequence. Lowering the growth temperature decreases the rate of protein synthesis and prevents accumulation of folding intermediates in the cytoplasm. It also decreases protein aggregation by reducing both inter- and intra-molecular hydrophobic interactions that facilitate IB formation [21]. This method has been shown to be effective in improving the solubility of a number of different proteins including interferon α-2, Fab fragments and human growth hormone (GH) [39, 114].

Cytoplasmic folding modulators, such as folding chaperones (e.g., DnaK and GroEL), holding chaperones (e.g., IbpA and B) and the disaggregating chaperone (ClpB), play important but distinct roles in maintaining correct protein folding. Coexpression of these chaperones has been reported to improve the solubility of different recombinant proteins [53]. For example, with coexpression of GroEL/GroES, 65% of total produced anti-B-type natriuretic peptide single chain antibody (scFv) was soluble, which is 2.4-fold higher than in the strain without coexpressed chaperones [66]. However, chaperone coexpression does not guarantee improved protein solubility. It could also impose an extra metabolic burden on the cells, resulting in low protein productivity. Different E. coli strains have also been developed to improve protein folding and solubility in the cytoplasm. Disruption of two genes, trxB (thioredoxin reductase) and gor (glutathione reductase), in E. coli improved the folding of recombinant tissue plasminogen activator by facilitating disulfide bond formation and isomerization in the cytoplasm. Overexpression of the periplasmic disulfide bond isomerase (DsbC) in the cytoplasm of the same strain further enhanced disulfide bond formation [8]. Despite all of the progress made, soluble expression of recombinant therapeutics in the cytoplasm still can lead to protein misfolding, relatively low protein yield, laborious downstream processing and susceptibility to proteolytic activities [29]. Therefore, if a soluble protein is desired, soluble periplasmic expression is the preferred route in industry.

Periplasmic and extracellular secretion

Periplasmic secretion

Secretion of recombinant proteins into the periplasm of E. coli offers an attractive approach to produce complex and/or large protein therapeutics. Disulfide bonds are introduced by the Dsb protein family in the periplasm, creating an oxidizing environment to facilitate proper folding of recombinant proteins. The cleavage of the signal peptide following translocation is more likely to yield an authentic N-terminus of the expressed protein. In addition, compared to the cytoplasm, the periplasm has a lower concentration of host cell proteins and lower proteolytic activities, so purification can be easily implemented [67]. Therefore, a variety of different recombinant proteins are produced by periplasmic secretion, including commercialized Fab fragments (Lucentis® and Cimzia®) and some clinical therapeutics, such as full-length aglycosylated antibodies and scFvs [45, 74].

Recombinant proteins are typically exported to the periplasm via one of three pathways that utilize the type II secretion system of E. coli: Sec-dependent, SRP-dependent and twin-arginine translocation (TAT) [71]. The targeting of recombinant protein to each pathway is normally determined by the secretion signal fused to the N-terminus of the protein. Different Sec-dependent secretion signals have been successfully used for recombinant therapeutic secretion, including OmpA, LamB, PhoA, STII, endoxylanase from Bacillus sp. and PelB from Erwinia carotovora [16]. For example, high-level periplasmic expression of granulocyte colony-stimulating factor (G-CSF) at 4.2 g/l has been reported by using the endoxylanase signal peptide, and the secreted insulin-like growth factor I (IGF-1) reached 4.3 g/l when utilizing the LamB signal peptide [43, 129]. In addition to the signal peptide, the efficiency of protein secretion also depends on the host strain, promoter strength, cultivation temperature and type of protein to be secreted. Therefore, in most cases, a trial-and-error approach is required to optimize protein secretion.

Despite many successful studies, high-level secretion usually overwhelms the translocation machinery, as well as folding capacity in the periplasm, leading to IB formation and misfolding of recombinant proteins. In these cases, coexpression of periplasmic chaperones such as disulfide bond oxidase (DsbA), isomerase (DsbC) and peptidylprolyl isomerase (Skp, FkpA and SurA) could improve the accumulation of the functional proteins [53]. Overexpressing DsbA and DsbC has been shown to increase the assembly efficiency of the light chain and the heavy chain of a full-length antibody in the periplasm from 15 to 95%, as well as to improve the production titer from 0.1 to 1.05 g/l [83]. Improved yield coupled with enhanced antigen binding affinity of anti-CD20 scFv was achieved when it was coexpressed with Skp [69]. Another feasible strategy is to enhance the solubility of a recombinant protein before its transport. This can be accomplished by coexpression of cytoplasmic folding chaperones (DnaK and DnaJ) to prevent aggregation, or export through the TAT system, which only recognizes folded proteins as substrates [17, 78].

Extracellular secretion

Extracellular secretion of recombinant proteins not only has the advantages inherited from periplasmic expression, but also displays additional attractive features in bioprocessing. Low native protein concentration, decreased protease activity and far less endotoxin in the extracellular space ensure easy purification and better protein quality. Secretion of recombinant proteins out of the periplasm also improves the production capacity and alleviates some stresses derived from overexpression of recombinant proteins. Several approaches have been explored to facilitate recombinant protein secretion in E. coli, including the use of secretory fusion partners from the E. coli type I and V secretion systems. Hemolysin toxin (HlyA) can be directly secreted from the cytoplasm to the extracellular milieu using the hemolysin transport system, which forms a protein channel across the inner and outer membranes [71]. By fusing the HlyA secretion signal at the C-terminus, recombinant IL-6 and scFv were successfully secreted into the culture medium at concentrations of 2 and 7 mg/l, respectively [16, 24]. Autotransporters, such as AIDA-I, have also been used for the secretion of recombinant proteins used in vaccines, such as cholera toxin and OspA, an outer membrane lipoprotein A of Borrelia burgdorferi [44]. To identify more secretory fusion partners, Qian and colleagues analyzed the secretome of E. coli BL21 and found 22 potential proteins, of which OsmY, an osmotically inducible protein, was the most promising. In a fermentor culture, OsmY-Leptin was excreted into the culture medium at a concentration of 0.25 g/l, which was the highest titer among the screened proteins [80]. This approach, however, requires removal of the fusion partner later in the purification process.

Another strategy is to modify the outer membrane structure to promote non-specific release of periplasmic proteins. Deletion of lpp, which encodes an outer membrane lipoprotein, significantly increased the outer membrane permeability, while showing limited growth defects [75]. It has been demonstrated recently that an lpp mutant can efficiently secrete antibody fragments at yields over 2.2 g/l in 10-l fermentors [72, 130]. The reduced guanosine tetraphosphate formation in relA and spoT mutants also affected membrane flexibility, and increased secretion of different recombinant proteins in these strains has been reported [96]. In addition to strain engineering, other parameters such as medium composition and cell growth rate also have impacts on protein secretion. For example, supplementation of 2% glycine and 1% Triton X-100 was suggested to change the morphology and integrity of the cell membrane, and this addition increased the excretion of the scFv-TNF-α fusion protein from 0.3 to 50 mg/l [125]. Growth rate was also found to influence the degree of periplasmic leakage due to changes in fatty acid composition in the phospholipids or decreases in the amount of outer membrane proteins [95]. In one study, the partitioning of a Fab fragment between the periplasm and supernatant was dramatically affected by the glycerol feeding rate, and the concentration of secreted Fab could reach over 1 g/l [5]. Over the past two decades, significant progress has been made in understanding extracellular protein secretion in E. coli and its applications in therapeutics production. While it represents an attractive opportunity for the industry, efficient extracellular secretion of recombinant proteins across the E. coli outer membrane remains a challenge and often produces relatively low yields compared to IB formation and periplasmic secretion.

Fermentation processes

To produce recombinant proteins in large quantities, fermentation technology is generally applied to increase cell density and protein productivity. Fermentation provides control over key chemical, physical and biological parameters that affect cell growth, as well as recombinant therapeutic protein production. These include, but are not limited to, temperature, dissolved oxygen (DO) level, pH and nutrient supply. A robust industrial fermentation process would also need to consider the composition and cost of the media, feeding strategies and scale-up process. If possible, the processes should also be designed to meet FDA guidelines, such as implementation of Process Analytical Technology (PAT) and Quality by Design (QbD), for process characterization and validation. In this section, important components for industrial fermentation are discussed.

Media

The media used to cultivate E. coli usually require several essential components, such as a carbon source, nitrogen source, essential salts, minerals and some growth factors, in order to reach an optimal cell density [93]. In general, three types of media, chemically defined (CD), semi-defined and complex medium, are used to support bacterial growth. CD medium is composed of chemicals of known identities and concentrations, while complex medium contains ingredients of natural origin, such as yeast extract and protein hydrolysate, whose composition is not completely known. Semi-defined medium consists mostly of defined components with very few complex components. To achieve high cell growth as well as high recombinant protein production, each of these three types of media must be carefully formulated not only to contain all necessary components, but also to provide them at optimal concentrations that avoid growth inhibition [57]. Currently, semi-defined and complex media are widely used in industry because they offer flexibility and enable both high cell density and protein yields in most production processes.

A plethora of different types of therapeutics, including growth factors, antibody fragments and Fc-fusion proteins, have been produced using either semi-defined or complex media [14, 28, 126]. The use of protein hydrolysate and yeast extract in the semi-defined and complex media can also significantly reduce the cost of raw material when compared to CD medium. In some studies, these components also help cells to utilize acetic acid during carbon limitation and enhance recombinant protein production, particularly during high cell density fermentation [110]. However, fermentation using complex ingredients can show inconsistent performance that affects product yield and quality because of the lot-to-lot variation associated with poorly defined components. This issue is highly undesirable for protein therapeutics because processes are considered an integral part of product definition. In this case, using CD medium for commercial fermentation becomes a practical alternative. Despite CD medium being generally known for slow growth and/or low productivity, recent studies have shown that growth in CD medium can reach a similar growth profile and protein titer as its complex medium counterpart (unpublished data). Zhang and Greasham have also illustrated several advantages of using CD medium for commercial therapeutics manufacturing. These benefits include enhanced process consistency, improved process control and simplified protein purification, which will ultimately provide a high degree of assurance of process reproducibility and product quality [132].

Optimization of medium components for enhanced protein therapeutic production is also a common practice in industry. One simple way to accomplish this goal is to modify published media recipes for which high cell growth and protein expression have been demonstrated. An alternative strategy, which is intensively used in industry, is to use statistical analysis, including design of experiments (DOE), to evaluate effects of different components and their interactions on important responses, such as cell growth, protein titer and quality. Since traditional medium optimization is labor-intensive, application of DOE can significantly reduce the number of experiments, efficiently evaluate the component effects and, in some cases, predict optimal medium composition [122]. Sometimes, identification of essential medium components can be carried out in an E. coli chemostat culture by addition of different components at varied concentrations. Based on the corresponding cellular responses, the importance of each component can be experimentally determined [128].
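
As a minimal sketch of how a DOE screen evaluates component effects and interactions, the following Python code enumerates a two-level full factorial design and estimates main and two-factor interaction effects. The component names, the coded levels and the response values are hypothetical placeholders chosen for illustration; they are not taken from the cited studies, and industrial screens often use fractional or response-surface designs instead.

```python
from itertools import product

# Hypothetical medium components, each tested at a coded low (-1) and high (+1) level.
FACTORS = ["glucose", "yeast_extract", "magnesium"]


def full_factorial(n_factors):
    """Return all 2**n_factors runs of a two-level design as tuples of -1/+1."""
    return list(product((-1, 1), repeat=n_factors))


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    half = len(design) / 2
    effects = []
    for j in range(len(design[0])):
        contrast = sum(run[j] * y for run, y in zip(design, responses))
        effects.append(contrast / half)
    return effects


def interaction_effect(design, responses, i, j):
    """Two-factor interaction effect between factors i and j."""
    contrast = sum(run[i] * run[j] * y for run, y in zip(design, responses))
    return contrast / (len(design) / 2)


if __name__ == "__main__":
    design = full_factorial(len(FACTORS))
    # Made-up titers (g/l) generated from a simple model, for illustration only.
    titers = [10 + 2 * a - 1 * b + 0.5 * a * b for a, b, _ in design]
    for name, effect in zip(FACTORS, main_effects(design, titers)):
        print(f"main effect of {name}: {effect:+.2f} g/l")
    print("glucose x yeast_extract:", interaction_effect(design, titers, 0, 1))
```

With three components the design needs eight runs, and every run contributes to every effect estimate, which is where the saving over one-component-at-a-time experiments comes from.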

Batch fermentation

Batch fermentation is an easy way to culture cells to reach high cell density in a very short time. However, due to its low productivity compared to fed-batch culture, it is usually used to acquire a small amount of protein for the purposes of protein characterization or toxicology study. If a therapeutic protein is to be used in a low-dose therapy or is an orphan drug, a batch process is a good choice to ensure efficient and reliable production [126]. In most cases, however, it is not a commonly used process for industrial therapeutic manufacturing.

Fed-batch fermentation and feeding strategies

Unlike batch fermentation, a fed-batch protocol can achieve high cell density culture (HCDC) by continuously providing required nutrients to sustain and control cell growth to reach high protein productivity. After the batch phase, which is usually indicated by sharp DO and pH increases, highly concentrated nutrients are fed into the bioreactor. In most cases, feeds are split into two categories, the nitrogen source and the carbon source, which usually contains other necessary minerals and trace elements. Because E. coli generates several by-products, such as acetate, that have negative effects on cell growth and protein production, an optimal feeding procedure must be applied to control the concentration of these by-products and maintain proper cell growth and protein production [19]. Therefore, several different feeding strategies have been developed for HCDC, which include pH–stat control, DO-stat control, and feeding based on a specific growth rate and on the limitation of an essential substrate such as glucose.

In pH–stat control, a feed is activated when the pH of the medium increases, which signals depletion of the carbon source. Kim and coworkers have used pH–stat control to produce human glucagon-like peptide 1, resulting in a yield of 11.3 g/l [50]. It has also been used for the production of cancer vaccines, such as NY-ESO-1, Melan-A and SSX2, and showed high productivities [33, 64]. This strategy, however, reflects the starvation of the cells instead of cell growth. It tends to be a conservative feeding method and results in a growth rate below the threshold for acetate accumulation [19]. In DO-stat control, the feeding rate is controlled by the DO level in the medium. This strategy provides flexibility in manipulating the cell growth rate of HCDC by increasing aeration and oxygen supply to the bioreactor. Production of a human antibody fragment (Fab′) in E. coli using DO-stat control has been reported to reach over 60 mg/l in a 300-l pilot scale bioreactor [84]. Chen and coworkers also used a similar feeding strategy to produce a bivalent antibody fragment in E. coli. Interestingly, in this study, lowering the DO level to near zero during the fermentation allowed the production titer to reach over 2 g/l [14].
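
The trigger logic shared by pH-stat and DO-stat feeding can be sketched as a simple threshold rule: a feed pulse is added whenever the monitored signal rises above its setpoint, which is taken here as the sign that the carbon source is exhausted. This is an illustrative simplification, not the control scheme of any cited study; the function name, the deadband and the pulse volume are assumptions, and real controllers typically add filtering, rate limits and interlocks.

```python
def run_stat_controller(readings, setpoint, deadband, pulse_ml):
    """Return the cumulative feed volume (ml) after each sensor reading.

    Works for either signal: pass pH values for a pH-stat or DO values
    (% air saturation) for a DO-stat. A pulse of `pulse_ml` is added each
    time a reading exceeds `setpoint + deadband` (hypothetical rule).
    """
    total = 0.0
    history = []
    for value in readings:
        if value > setpoint + deadband:
            total += pulse_ml
        history.append(total)
    return history


if __name__ == "__main__":
    # Invented pH trace: the culture drifts upward as substrate runs out.
    ph_trace = [6.95, 7.00, 7.10, 7.02, 7.20]
    print(run_stat_controller(ph_trace, setpoint=7.0, deadband=0.05, pulse_ml=10.0))
```

Because feed is only added after the signal reports depletion, this kind of rule keeps the culture close to substrate limitation, which is consistent with the conservative behaviour described above.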

Exponential feeding by controlling the cell specific growth rate (μ) can also provide desirable metabolic regulation and promote high protein production in HCDC. To keep μ at a pre-determined level, a feed-forward exponential feeding strategy is normally used, where the amount of nutrient required for cells to reach the desired μ is calculated in advance, and the feed is added accordingly during the process [54, 127]. Automatic control of μ in fed-batch fermentation has also been developed, where on-line data such as oxygen utilization rate and culture weight are used to calculate the feeding rate [58]. Ideally, high μ translates into high productivity and less process time. By controlling μ at a maximum attainable value of 0.55 h−1 at the beginning of the fed-batch phase and 0.4 h−1 during the induction phase, Babaeipour and colleagues showed that the productivity of interferon-γ was about eight-fold higher than when μ was controlled at a fixed value of 0.12 h−1 [4]. However, for a sustainable HCDC, μ should be maintained below the threshold growth rate at which accumulation of glucose and acetate becomes inhibitory for cell growth and protein production. Direct feedback feeding is also possible by measuring the concentration of growth-limiting substrates such as glucose or the metabolic state of the cells to automatically adjust the feed rate [42, 51]. However, these techniques are rarely applied in industry.
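
The feed-forward calculation can be illustrated with the textbook substrate balance for a substrate-limited fed-batch culture, F(t) = μ·X0·V0·exp(μ·t) / (Yx/s·Sf). The sketch below assumes this simplified form, which neglects maintenance demand and residual substrate; the starting biomass, volume, yield coefficient and feed concentration are hypothetical values, and only the setpoint of 0.12 h−1 is borrowed from the text.

```python
import math


def exponential_feed_rate(t_h, mu_set, x0_g_per_l, v0_l, yield_xs, feed_conc_g_per_l):
    """Feed rate (l/h) at t_h hours after feed start for a target growth rate.

    mu_set            target specific growth rate (1/h)
    x0_g_per_l, v0_l  biomass concentration and volume at feed start
    yield_xs          biomass yield on substrate (g/g), assumed constant
    feed_conc_g_per_l substrate concentration in the feed
    """
    biomass_at_start_g = x0_g_per_l * v0_l
    substrate_demand_g_per_h = mu_set * biomass_at_start_g * math.exp(mu_set * t_h) / yield_xs
    return substrate_demand_g_per_h / feed_conc_g_per_l


if __name__ == "__main__":
    # Hypothetical process values; mu_set must stay below the acetate threshold.
    for t in (0, 2, 4, 6):
        rate = exponential_feed_rate(t, 0.12, 10.0, 5.0, 0.5, 500.0)
        print(f"t = {t} h: feed {rate * 1000:.1f} ml/h")
```

The pre-computed profile doubles every ln(2)/μ hours, so any error in the assumed yield or starting biomass propagates through the whole feed phase, which is one motivation for the automatic and feedback variants mentioned above.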

Fermentation scale-up

For commercial protein therapeutics production, the fermentation usually starts in a laboratory-scale bioreactor (e.g., 5–30 l) to identify suitable growth and protein expression conditions. The process then transfers to pilot level (e.g., 200–600 l) to establish optimal operating parameters and finally to manufacturing scale (e.g., over 2,000 l) to reach high productivity. The scale-up process for any therapeutic protein should aim for high productivity with consistency in the protein quality and specific yield. However, as the scale increases, important biological, chemical and physical parameters affecting cell growth, as well as protein expression, will also change. This makes the scale-up process a challenging task. The common problems associated with scale-up originate from poor mixing, which increases circulation time and creates stagnant regions. This leads to imbalanced and zonal distribution of oxygen, nutrients, pH, heat and metabolites inside the bioreactor [20]. For example, in a normal large vessel, a vertical DO gradient (from bottom to top) is usually observed if the bulk mixing rate is slower than the mass transfer rate, and a gradient can also occur with the feeding substrate, such as glucose (from top to bottom). In this case, cells at the top of the bioreactor are simultaneously exposed to high glucose concentration and oxygen limitation, resulting in metabolic overflow and mixed acid fermentation. Since mixing of the cells is still in progress, the same cells may shift to different locations of the bioreactor with different environmental stresses. Passing through the different stress zones induces metabolic shifts in cells, which will ultimately reduce cell growth and have negative impacts on both productivity and quality [89]. Therefore, several strategies have been used as principles to scale up E. coli fermentation to minimize the differences between scales by keeping one or more parameters constant from laboratory to manufacturing bioreactors. These parameters include power input per liquid volume (P/V), oxygen transfer rate (OTR), oxygen mass transfer coefficient (kLa), impeller tip speed, mixing time and impeller Reynolds number (NRe) [47].

Traditionally, constant P/V has been shown to be a successful scale-up criterion for industrial fungal and mammalian cell fermentation, but it may be of limited use for recombinant E. coli culture because of its high energy requirement [89, 113]. Eslam and coworkers have suggested that scaling up based on kLa is the most appropriate approach for microorganisms, such as E. coli growing under aerobic conditions, and other studies have shown that it is indeed the most applied physical scale-up variable [36, 89]. The kLa is directly related to a bioreactor’s configuration and can be modulated by manipulating the bioreactor’s agitation speed, impeller design and air/O2 flow rate. In a large-scale and high-performance E. coli fermentation where power input and mixing are not an issue, a similar OTR and heat transfer rate will normally be used to ensure high productivity [126]. Scale-up based on a constant mixing time or tip speed usually has a higher success rate when the scale-up factor is small (4–40 l, scale-up factor 10) [13, 89].
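
To show why different scale-up criteria cannot all be held constant at once, the sketch below compares the impeller speed required at the larger scale under constant P/V and under constant tip speed. It relies on standard textbook relationships that are assumptions here rather than statements from the cited studies: geometrically similar vessels, fully turbulent flow with a constant power number (so P/V scales with N³D²), tip speed equal to πND, and NRe = ρND²/μ with water-like fluid properties. The vessel dimensions are hypothetical.

```python
import math


def diameter_ratio(volume_ratio):
    """D2/D1 for geometrically similar vessels, where volume scales with D**3."""
    return volume_ratio ** (1.0 / 3.0)


def speed_constant_p_per_v(n1, d1, d2):
    """Impeller speed at scale 2 that keeps P/V constant (P/V ~ N**3 * D**2)."""
    return n1 * (d1 / d2) ** (2.0 / 3.0)


def speed_constant_tip_speed(n1, d1, d2):
    """Impeller speed at scale 2 that keeps tip speed (pi * N * D) constant."""
    return n1 * d1 / d2


def tip_speed(n_per_s, d_m):
    """Impeller tip speed in m/s for speed in 1/s and diameter in m."""
    return math.pi * n_per_s * d_m


def impeller_reynolds(n_per_s, d_m, density=1000.0, viscosity=1.0e-3):
    """Impeller Reynolds number; water-like density and viscosity are assumed."""
    return density * n_per_s * d_m ** 2 / viscosity


if __name__ == "__main__":
    n1, d1 = 10.0, 0.10                      # hypothetical lab impeller: 10 1/s, 0.10 m
    d2 = d1 * diameter_ratio(10.0)           # tenfold volume increase, e.g. 4 l to 40 l
    for label, n2 in (("constant P/V", speed_constant_p_per_v(n1, d1, d2)),
                      ("constant tip speed", speed_constant_tip_speed(n1, d1, d2))):
        print(f"{label}: N2 = {n2:.2f} 1/s, tip speed = {tip_speed(n2, d2):.2f} m/s, "
              f"N_Re = {impeller_reynolds(n2, d2):.0f}")
```

Under these assumptions, holding P/V constant raises the tip speed at the larger scale, while holding tip speed constant lowers P/V, so the criterion has to be chosen according to which parameter most affects the product.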

In practice, however, the important scale-up parameters for each product may be different because of process differences and the operation limitations imposed by manufacturing facilities. Therefore, a detailed and comprehensive process characterization must be carried out in advance to identify critical parameters influencing protein yield and quality. Those parameters then should be kept constant during the scale-up processes. Several approaches have been utilized to characterize fermentation processes. These include real time on-line measurement of cellular responses, such as biomass and viability, and the concentrations of different substrates and metabolites. This information allows scientists and engineers to pinpoint complex interactions between cells and their environment, as well as their impacts on protein productivity and quality [89]. Another strategy is to create a laboratory-scale bioreactor that mimics the performance of its large scale counterpart, in terms of oxygen transfer, mixing and medium sterilization, and to use this scale-down model to evaluate important process parameters on a specific product [81]. Recently, computational fluid dynamics techniques have also been used to simulate and predict the mixing quality and the substrate gradient zones in a large-scale bioreactor [22]. The knowledge gained is then used to optimize the fermentation processes or to design a better bioreactor to improve the robustness of the process in a scale-up practice. Other challenges one might encounter in large-scale upstream manufacturing include availability of appropriately sized seed train bioreactors and feed tanks, medium sterilization and condensation, poor heat transfer capacity, variability caused by evaporation and holding stability of feed media solution [126].

Recent advancements in recombinant therapeutics production

Antibody fragments and full-length aglycosylated antibodies

Several recent technological advances have made E. coli a more appealing host to produce recombinant therapeutics (Table 3). These include the ability to produce different antibody fragments in high yields. It is estimated that at least 54 antibody fragments have entered clinical studies, with most of them being Fabs and scFvs [74]. Like full-length glycosylated antibodies, antibody fragments can also exhibit prominent clinical benefits for different oncological and immunological diseases. Although these fragments have disadvantages, such as short half-life in serum and inability to induce antibody-dependent cell-mediated cytotoxicity (ADCC) and complement-dependent cytotoxicity (CDC) when compared to full-length glycosylated antibodies, they show similar binding specificity and can penetrate tissues and tumors with higher efficiency. In addition, production of these small and aglycosylated fragments can be implemented in microbial systems rather than a costly mammalian cell culture [34, 121]. In fact, two recently approved Fab fragments, Lucentis® and Cimzia®, are produced in E. coli (Table 1), indicating that E. coli has emerged as an ideal and robust host to express antibody fragments.

Table 3 Recent advancements in E. coli and their significance for industrial applications

Expression of functional antibody fragments is normally achieved by targeting both light chains and heavy chains or scFvs into the periplasm, where proper protein folding and disulfide bond formation can occur. This idea was first implemented by Skerra and Plückthun, but a very low titer was achieved [32]. Since then, different approaches have been adopted to improve Fab fragment production in E. coli. These methods include optimization of the expression ratio of the heavy chain and light chain, co-expression of different folding chaperones and fermentation optimization [2]. Chen and coworkers have shown that an anti-CD18 F(ab′)2 can be produced at 2 g/l by increasing the stability of the light chain in the periplasm [14]. Recently, a yield of over 10 g/l has also been reported for different scFvs [59]. Cytoplasmic expression has also been explored as an alternative for high-level antibody fragment expression in E. coli. Although both the light chain and the heavy chain can be expressed separately as IBs, the refolding process was neither efficient nor economically competitive [32]. Expression of antibody fragments in an E. coli strain with an oxidizing cytoplasm also showed some promising results; however, this approach has not yet been adopted by the industry [116]. Periplasmic secretion is still the major route used by the industry for production of functional antibody fragments.

In addition to antibody fragments, therapeutic applications of aglycosylated full-length antibodies have gained much attention in the past decade. Unlike glycosylated antibodies, which can activate the innate immune system for targeted cell death (ADCC/CDC), aglycosylated antibodies can be used in other clinical applications, such as antigen blocking, and receptor agonist and antagonist roles, where engagement of the immune system is not required and may even cause unwanted side effects [56]. In this case, E. coli is an ideal expression host for aglycosylated antibodies, as it possesses several bioprocessing advantages and enables the production of fully aglycosylated antibodies. Simmons and coworkers reported the first efficient production method for an aglycosylated antibody, anti-tissue factor IgG1, in the E. coli periplasm. They obtained a yield of approximately 0.15 g/l using a bicistronic plasmid expressing heavy chains and light chains in a favorable ratio [97]. The produced antibody retained a similar half-life to its glycosylated counterpart and exhibited specific binding to the antigen and neonatal receptor (FcRn), but not to the FcγRI and C1q effectors. Coexpression of DsbA and DsbC further improved the yield to 1 g/l in an optimized HCDC culture [83]. Currently, several aglycosylated antibodies are being evaluated in clinical trials for different indications. One of them is MetMAb, a one-armed anti-cMet antibody containing a single Fab and aglycosylated Fc, which is the first therapeutic aglycosylated antibody manufactured in E. coli.

Recently, aglycosylated antibodies have also been engineered to have similar functionalities to glycosylated antibodies, especially the ability to bind different Fcγ receptors. Using the yeast surface display system, Sazinsky and coworkers identified an aglycosylated Fc variant (S298G/T299A) showing comparable binding affinities to FcγRIIa and FcγRIIb as glycosylated antibodies [88]. The binding affinity to FcγRI was also partially restored in this selected variant. A similar idea but different approach also led to the isolation of the E382V/M428I mutant in the CH3 region, which conferred binding affinity to FcγRI at a nearly identical level as that observed for glycosylated antibodies [46]. Remarkably, when Jung and coworkers introduced these double mutations into the anti-Her2 antibody trastuzumab, the mutated variant (trastuzumab-Fc5, which is produced in E. coli), elicited a significantly higher dendritic cell-mediated ADCC response than the glycosylated trastuzumab. Based on the differences observed in protein structures between aglycosylated and glycosylated antibodies, this group also hypothesized that higher conformational flexibility found in aglycosylated antibodies provides more degrees of freedom to identify variants with selective binding affinity towards different Fc receptors that may show novel therapeutic efficacies, as demonstrated in trastuzumab-Fc5. Therefore, recent breakthroughs in high-yield production of aglycosylated antibodies in E. coli and in engineering them for increased binding affinity with Fc receptors indicate that aglycosylated antibodies have become an important family of protein therapeutics produced in E. coli.

Strain engineering for improved product yield and quality

It is estimated that more than 3% of the enzymatic activities in E. coli are proteolytic at any given time [68]. Recombinant proteins produced in E. coli, both as IBs and soluble proteins, are susceptible to different levels of proteolysis, which is unfavorable for industrial production. Degradation creates protein fragments that can decrease product yield, affect protein quality and increase the production cost. One of the strategies used to improve recombinant protein stability has been deletion of crucial cytoplasmic or periplasmic proteases of the expressing strains. Lon and ClpP are two major ATP-dependent proteases located in the cytoplasm, and deletion of both genes with ClpYQ has been shown to improve the stability of recombinant human prourokinase by five-fold [49]. IB formation is known to protect recombinant protein from degradation. However, a recent study indicated that protein aggregates can also undergo direct proteolytic attack. In this report, the absence of Lon and ClpP also reduced IB disintegration up to 40% [117]. Even though the protein stability improved significantly in these studies, the ClpP-deficient strain exhibited a reduced growth rate and higher cell lysis in HCDC [85]. E. coli B and its derivatives are also deficient in Lon and OmpT, an outer membrane protease, and improved protein stability has also been observed in B strains compared with the K12 strains. In the periplasm, DegP, Prc and protease III (ptr) are three major proteases. Chen and colleagues demonstrated that production of anti-CD18 F(ab′)2 in a protease double mutant (DegP and Prc) could significantly decrease the generation of truncated light-chain species compared to the wild-type strain. When this double mutant was characterized in the HCDC, a dramatic increase in cell lysis was also observed, and most of the produced F(ab′)2 was located in the supernatant with low yield. Later, a third mutation in spr, a Prc suppressor, was included to compensate for the growth defects of the Prc-deficient strain. This triple mutant increased the protein titer to 2.4 g/l, which is sevenfold higher than the double mutant [14]. Together, these studies indicate that protease inactivation has profound effects on recombinant protein stability, and therefore deletion of these proteases can significantly improve protein productivity.

Escherichia coli strains can also be designed specifically for each protein therapeutic to guarantee process robustness, high product yield and quality. One example was demonstrated by Caparon and coworkers who used proteomic techniques to identify several difficult-to-remove host cell proteins (HCPs), OppA, DppA and MalE, in the bioprocess producing ApoA-1M, a potential therapeutic for coronary heart disease. Genetic deletion of these three problematic HCPs and OmpT protease created a quadruple knockout strain, GB004, that exhibited similar growth characteristics to the parental strain. The new strain enabled conventional process optimization to be carried out, increased the ApoA-1M titer from 3.2 to 5 g/l and, most importantly, significantly decreased the HCP levels in the final purified protein [10].

Recently, with the advent of synthetic biology, E. coli strains with genome size reductions of up to 14% were created by deleting large unnecessary DNA segments of the E. coli K12 genome. These strains were evaluated for different beneficial properties, including recombinant protein production (chloramphenicol acetyltransferase) during HCDC, and similar growth and protein yield were observed when compared to the parental strain [92]. By using this technology, a Clean Genome® E. coli strain was engineered as a novel biological factory providing enhanced genetic stability and high protein yield. Using a similar design concept, rational and systematic manipulation of the E. coli genome is expected to create a new method for developing more robust E. coli strains for recombinant therapeutic production. Therefore, strain engineering has continued to be a practical and effective method to enhance the production yield and quality of recombinant therapeutics.

E. coli N-linked glycosylation

The first prokaryotic N-linked protein glycosylation was uncovered in Campylobacter jejuni, which contains a gene cluster, pgl, involved in the biosynthesis of a number of different immunogenic glycoproteins [105]. By transferring the pgl pathway into E. coli, Wacker and colleagues successfully produced glycosylated proteins in these cells, opening the door for using E. coli to produce complex and glycosylated human therapeutics [120]. Compared to eukaryotes, this E. coli glycosylation system possesses similar but distinctive characteristics in its glycosylation machinery. For example, the oligosaccharyltransferase in C. jejuni, PglB, is the major enzyme to carry out glycan transfer; however, for eukaryotes, a multimeric protein complex is required. PglB also displays flexible substrate specificity, for both native and non-native proteins, and is promiscuous with regard to glycan structures that are transferred to proteins [23, 26]. The primary consensus sequence for N-glycosylation by PglB is extended to D/E-Y-N-X-S/T (Y, X ≠ P), and the glycosylation efficiency increases when this domain is in the flexible and solvent-exposed region of a folded protein [55]. In addition, the pgl operon synthesizes a heptasaccharide glycan, which is completely different from eukaryotic N-glycans.
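
The extended consensus sequence can be expressed as a pattern search. The sketch below scans a protein sequence for D/E-Y-N-X-S/T with Y and X not proline and, for comparison, for the shorter N-X-S/T sequon generally associated with eukaryotic N-glycosylation. The function names and the test sequence are invented for illustration, and a sequence match says nothing about whether the site is flexible or solvent-exposed, which the text identifies as important for efficiency.

```python
import re

# Lookahead patterns so that overlapping candidate sites are all reported.
BACTERIAL_SEQUON = re.compile(r"(?=[DE][^P]N[^P][ST])")   # D/E-Y-N-X-S/T, Y and X not Pro
EUKARYOTIC_SEQUON = re.compile(r"(?=N[^P][ST])")           # N-X-S/T, X not Pro


def bacterial_sites(sequence):
    """1-based positions of the acceptor Asn in bacterial (PglB-type) sequons."""
    return [m.start() + 3 for m in BACTERIAL_SEQUON.finditer(sequence.upper())]


def eukaryotic_sites(sequence):
    """1-based positions of the acceptor Asn in N-X-S/T sequons."""
    return [m.start() + 1 for m in EUKARYOTIC_SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    demo = "MKDQNATAAEPNGSDANPT"   # invented sequence, not a real protein
    print("bacterial sequon Asn positions:", bacterial_sites(demo))
    print("N-X-S/T sequon Asn positions:", eukaryotic_sites(demo))
```

In the invented sequence only the first site carries the acidic residue at the −2 position without a proline, so a protein with eukaryotic-type sequons may need engineered sites before PglB can modify it.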

Since there are several intrinsic differences in N-glycosylation between this E. coli system and humans, overcoming these disparities is important for the development of the first humanized E. coli glycoprotein. One step forward in this direction was demonstrated by Lizak and coworkers, who introduced bacterial glycosylation sites into the linker region of an scFv, and the expressed scFv was found to be glycosylated with C. jejuni N-glycans. The glycosylated scFv showed improved protein stability and solubility [62]. A two-step method to produce eukaryotic N-glycans in E. coli was also developed. In this study, E. coli was glycoengineered by deleting the genes responsible for the synthesis and transfer of bacillosamine, an immunogenic non-human saccharide. Upon coexpressing WecA, this strain was able to transfer GlcNAc-1-phosphate to UND-P to produce the (GalNAc)5GlcNAc2 glycan and form the GlcNAc-Asn (human-like) linkage. After subsequent in vitro glycan trimming and enzymatic transglycosylation, a eukaryotic-like (Man3GlcNAc2) glycoprotein was produced [91]. In this study, the Man3GlcNAc2 structure was successfully introduced to the bacterial protein AcrA as well as two eukaryotic proteins, human IgG-Fc (CH2 domain) and scFv F8, a potential therapeutic antibody fragment. However, the in vivo glycosylation efficiency was low and varied (5–40%) among these tested acceptor proteins. This could be attributed to the undesirable metabolic state in E. coli that generates an insufficient amount of N-glycans for protein glycosylation. To improve the in vivo glycosylation efficiency of AcrA, Pandhal and coworkers further developed a comprehensive approach to forward engineer E. coli to enhance AcrA glycosylation efficiency by threefold through overexpression of isocitrate lyase [77].

Recently, expression of active human sialyltransferase was also achieved in E. coli, making possible the addition of sialic acid to E. coli glycoproteins [99]. Therefore, in the past few years, significant achievements toward the first humanized glycoprotein in E. coli have been demonstrated. However, a critical step to put the above-mentioned components together in one single E. coli is still missing. At this point, production of therapeutic glycoprotein in E. coli has shown promising results, and it is possible that E. coli may be used for industrial production of therapeutic glycoproteins in the near future.

Protein modification: fusion proteins and PEGylation

In general, small therapeutic proteins and peptides can be manufactured in high yield using microbial systems. However, these proteins tend to have short in vivo half-lives and fast renal clearance, resulting in the need for frequent dosing to maintain the desired drug level. To improve the pharmacokinetics of these proteins, efforts have focused on genetic engineering of the protein sequences and on increasing the size of the proteins by adding large molecules such as fusion proteins and polyethylene glycol (PEG). The latter strategy (increasing protein size) presents a great opportunity to expand E. coli-derived therapeutics, as illustrated by recently approved Cimzia® and Nplate®.

The Fc antibody fragment has been used in several FDA-approved protein therapeutics including Enbrel® (TNFR2-Fc), Orencia® (CTLA4-Fc) and Nplate® (TPO-R binding peptide-Fc). Nplate is a hybrid protein consisting of a small peptide moiety and Fc fragment. It is the only commercial Fc fusion therapeutic produced in E. coli that is expressed as IBs and refolded to regain its biological function. The therapeutic design and production platform of Fc fusion proteins are promising, especially when the engagement of glycosylated Fc with Fc receptors seems unnecessary in some Fc fusion therapeutics [41]. Another clinically used fusion construct is human serum albumin (HSA), a widely distributed inert serum protein with a long half-life. Several HSA-fused protein therapeutics, such as albINF-γ, alb-insulin and alb-GH, exhibit prolonged serum half-life with unaltered biological activities [101]. Most of the HSA fusion therapeutics are produced as recombinant secreted proteins in S. cerevisiae and Pichia pastoris. E. coli is generally considered an unsuitable host to express HSA because of the protein’s complex molecular structure (17 disulfide linkages) [35].

PEGylation is the most successful and commonly used chemical modification approach to improve the pharmacokinetics of protein therapeutics. Although PEGylation also increases the production cost and the heterogeneity of the final product, this technology has become increasingly popular in recent years. Several PEG-modified proteins, including PEGylated G-CSF, human growth hormone, interferon-α and anti-TNFα Fab fragments, are available on the market, and most are produced in E. coli [104]. Therefore, newly developed fusion proteins and protein modification technologies have offset some of the drawbacks of E. coli-derived therapeutics and expanded the use of this host for more diversified therapeutic applications.

Cell-free systems for protein therapeutic synthesis

Cell-free systems are attractive alternatives to producing recombinant therapeutics in vivo. Compared to traditional protein production in living cells, cell-free systems offer several advantages for the development and production of therapeutics. Due to the simplicity and easy manipulation of this system, in vitro protein synthesis and its functional analysis can be carried out in a few hours, making it a powerful tool for high-throughput protein screening and genome-wide protein production [73]. Because there is no need to maintain cell growth, the whole system can be optimized for the production of a single protein. The optimization includes transcription and translation modulation and incorporation of non-natural amino acids into the protein. In addition, reduced amounts of endotoxin and cell debris make subsequent protein purification processes easy to implement. Two different types of cell-free systems are now commercially available. One is the PURE system, which is composed of individually purified components of the E. coli translational machinery. The other is a cell-extract based system, where different cell extracts, including E. coli, wheat germ, rabbit reticulocyte and insect cells, are used to supply the protein translational apparatus [76].

In the past decade, cell-free systems using E. coli extract have made several breakthroughs, creating a reliable platform for industrial therapeutic production. In 2004, Jewett and Swartz engineered the “Cytomin” system, which activates the bacterial oxidative phosphorylation pathway and uses pyruvate as the energy source, making energy generation inside the system cost-effective [40]. This system also prolonged the protein synthesis reaction by reducing by-product accumulation and stabilizing the pH of the reaction. The quality of E. coli extracts has also been improved in several ways, including optimization of extract preparation methods and engineering of an E. coli KC6 strain that maintains stable pools of amino acids during protein synthesis [9, 61]. The KC6 strain was then further engineered to favor synthesis of disulfide bonded proteins by creating an optimal thiol redox potential [52]. At this point, the system has demonstrated a variety of applications, including production of protein therapeutics such as murine granulocyte macrophage colony stimulating factor (GM-CSF), scFvs and IGF-I. For commercial therapeutic production, however, scaling up a cell-free system was a more significant challenge [102]. Recently, Zawada and coworkers showed that efficient, predictable and scalable protein production can be achieved in this system by optimization of important parameters, including extract incubation conditions, DNA sequence and redox environment for disulfide bond formation. The process used in this study produced recombinant GM-CSF at a concentration of 700 mg/l within 10 h in a 100-l standard bioreactor [131]. This result is a major breakthrough for the cell-free system since it confirms linear scalability from 200 μl to 100 l with similar titers. It also demonstrates that this technology is approaching utility for industrial production.

Conclusion

Almost 30 years after the first recombinant insulin was approved by the FDA, E. coli is still widely used by the industry for recombinant therapeutics production. Versatile genetic tools are now available for tightly regulated and high-level protein expression in E. coli, where protein induction remains simple and cost-effective. While production of recombinant proteins in IBs and their refolding have become common practices in industry, periplasmic secretion provides advantages, such as proper folding and yield of proteins with an authentic N-terminus. Two recently approved antibody fragments were produced in the periplasm, indicating that this technology is now commercially competitive. In addition, improved extracellular protein secretion can now be achieved in leaky strains. These developments allow the industry to leverage the benefits of each E. coli expression method and provide flexibility in protein expression. High cell density fermentation in E. coli can further enhance the overall productivity of recombinant therapeutics. Various tools, including different feeding strategies and scale-up protocols, have been established to ensure high-yield protein production at a manufacturing scale. However, challenges remain for the continued use of E. coli for protein production, because of its assumed inability to express some complex and glycosylated proteins. Recently, several advancements in E. coli protein production have challenged these assumptions and made E. coli an even more robust expression host. A full-length aglycosylated antibody can now be expressed in E. coli with high yield (~1 g/l), and production of different Fab fragments was also achieved in high titers in protease-deficient E. coli mutants. These studies demonstrate that E. coli can express complex proteins at high titers. The N-glycosylation pathway has been engineered in E. coli, and the progress towards the first humanized glycoprotein produced from E. coli looks promising. Moreover, addition of fusion partners to improve the pharmacokinetics of E. coli-produced therapeutic proteins, as well as the use of cell-free systems to generate therapeutics, further expands the capability of E. coli to express different protein therapeutics. With all these advancements and advancements yet to come, E. coli will definitely remain a workhorse for recombinant therapeutic production in the industry.