Abstract
Escherichia coli is a valuable commercial host for the production of heterologous proteins. We used elementary mode analysis to identify all possible genetically independent pathways for the production of three specific recombinant proteins, green fluorescent protein, savinase and an artificial protein consisting of repeating units of a five-amino-acid cassette. Analysis of these pathways led to the identification of the most efficient pathways for the production of each of these proteins. The results indicate that the amino acid composition of expressed proteins has a profound effect on the number and identity of possible pathways for the production of these proteins. We show that several groups of elementary modes produce the same ratio of biomass and recombinant protein. The pattern of occurrence of these modes is dependent on the amino acid composition of the specific foreign protein produced. These pathways are formed as systemic combinations of other pathways that produce biomass or foreign protein alone after the elimination of fluxes in specific internal reversible reactions or the reversible carbon dioxide exchange reaction. Since these modes represent pathway options that enable the cell to produce biomass and protein without utilizing these reactions, removal of these reactions would constrain the cells to utilize these modes for producing biomass and foreign protein at constant ratios.
Introduction
Escherichia coli is used extensively for producing a wide variety of heterologous proteins (Anderson and Krummen 2002). It is one of the best characterized prokaryotic organisms and years of research have produced detailed knowledge of its genetics, molecular biology and biochemistry (Ingraham et al. 1983). Extensive investigations have also resulted in the development of sophisticated techniques for achieving high-level expression of foreign proteins (Khosla et al. 1990; Swartz 2001) and very high cell densities in bioreactors (Nakagawa et al. 1995; Lee 1996).
In addition to these experimental studies, E. coli is a standard for the development of theoretical tools for studying metabolism (Stephanopoulos et al. 1998). One recent method for analyzing the metabolic capabilities of biochemical networks is the rigorous pathway analysis technique known as elementary mode analysis (Schuster et al. 1999, 2000). This method identifies the complete set of genetically independent pathways for a biochemical reaction network and has been used, for instance, to find the most efficient E. coli pathways for ATP and biomass production (Carlson and Srienc 2004a, b). These studies indicated that cells likely use sets of defined pathways to grow efficiently under varying levels of culture-stress. Since E. coli is extensively used for foreign protein production, it is of significant practical interest to identify the most efficient pathway options for the synthesis of these macromolecules. Metabolic networks for the production of amino acids were recently analyzed using linear optimization-based approaches (See et al. 1996; Pharkya et al. 2003). However, little work has been dedicated so far towards the analysis of metabolic flux distributions during the synthesis of heterologous proteins.
This study examines the production, in E. coli, of three example proteins, poly(glycine-valine-glycine-isoleucine-proline) (GVGIP), green fluorescent protein (Gfp) and savinase. GVGIP is a protein from the class of elastomeric polypeptides poly(GVG-X-P), where X stands for any amino acid. These proteins are of commercial interest since they are biocompatible, biodegradable polymers that have a range of other properties which make them suitable for drug-delivery vehicles, surgical scaffolds and other applications (Urry 1988, 1999). Gfp is widely utilized as a reporter protein to test expression systems and to develop protein production strategies because of its convenient auto-fluorescence (Chalfie et al. 1994; Natarajan et al. 1998; Subramanian and Srienc 1996). Finally, savinase is a protease used in detergents (Gupta et al. 2002).
We describe here an analytical technique for identifying and analyzing the metabolic flux distributions for optimum heterologous protein production. Different amino acid compositions of the foreign protein result in different optimal flux patterns, suggesting the potential for tailoring recombinant hosts to efficiently produce specific recombinant proteins. Furthermore, we identify subclasses of elementary flux modes that are formed as a systemic combination of other modes. We show that these modes could be useful in certain situations for identifying genetic modifications for efficient protein production by forcing the metabolite fluxes toward these pathways. Moreover, the presented methodology is a further example of the rational analysis of metabolic networks, which should be generally useful for optimizing the production of any foreign protein or metabolite in a recombinant host, provided its metabolic pathway structure is known. Information about pathway structure can be obtained, in principle, from genome sequences, which are available for an increasing number of organisms.
Materials and methods
Pathway analysis
The publicly available program METATOOL (ver. 352_double; http://mudshark.brookes.ac.uk/sware.html) was used for the elementary mode analysis. The output from the METATOOL program was analyzed using a Microsoft Excel spreadsheet. The modes were sorted based on various criteria, such as the biomass yield and the protein yield on glucose. The yield of a mode was defined as the ratio of the number of carbon atoms in the product formed to the number of carbon atoms in the glucose consumed. A separate simulation was used for each protein analyzed.
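The yield definition above can be written as a short calculation. This is a minimal sketch, not the spreadsheet used in the study; the function name, the argument names and the numbers in the example are assumptions chosen for illustration.

```python
GLUCOSE_CARBONS = 6  # carbon atoms per glucose molecule


def carbon_yield(product_flux, carbons_per_product, glucose_flux):
    """Carbon in the product formed divided by carbon in the glucose consumed."""
    if glucose_flux <= 0:
        raise ValueError("glucose uptake flux must be positive")
    product_carbon = product_flux * carbons_per_product
    glucose_carbon = glucose_flux * GLUCOSE_CARBONS
    return product_carbon / glucose_carbon


# Hypothetical mode: 100 units of glucose consumed, 2 units of a product
# that contains 150 carbon atoms per unit (numbers are illustrative only).
example_yield = carbon_yield(product_flux=2.0, carbons_per_product=150, glucose_flux=100.0)
print(example_yield)  # 0.5
```

Applying the same function once with the biomass term and once with the protein term of a mode gives the two coordinates used to sort and plot the modes.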
The simulations used the network described by Carlson and Srienc (2004a). Glucose and carbon dioxide were considered as the sole carbon sources. The metabolic requirements for biomass and specific recombinant protein synthesis were modeled using the theory developed by Ingraham et al. (1983) by calculating the corresponding metabolic drain from the central metabolic pathway. The results presented in this study involve simulations utilizing a biomass macromolecular composition corresponding to a 200-min doubling time. The results with other growth rates were qualitatively similar to the results presented in this work. Appropriate reaction equations were constructed for each considered protein by taking into account the metabolite drain from the central metabolism for each required amino acid and the energy requirements for the production of each peptide bond. Information about metabolite requirements for the production of each amino acid was obtained from the literature (Ingraham et al. 1983). The production of a protein with n residues requires the hydrolysis of approximately 4n high-energy phosphate bonds to form the polypeptide bonds (Mathews et al. 2000). Accordingly, the model required four ATP equivalents to form each peptide bond. Table 1 illustrates the method used for calculating the metabolite drain from intermediate metabolism for the synthesis of Gfp, taking into account its amino acid composition. It also lists the final metabolic requirements for the production of savinase and GVGIP. It is to be noted that the equation for biomass production does not account for the production of the specific recombinant protein in consideration. Hence, the term “biomass” as used in this work does not include the recombinant protein. Details about the construction of the biomass and the protein terms and assumptions involved are provided by Carlson and Srienc (2004a). 
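The construction of a protein synthesis equation can be sketched as follows: count the residues, sum the precursor drain for each residue and add four ATP equivalents per residue. This is an illustrative sketch only. The polymer length, the function names and the entries of `toy_requirements` are placeholders and are not the values from Ingraham et al. (1983) or Table 1.

```python
from collections import Counter

ATP_PER_PEPTIDE_BOND = 4  # ATP equivalents per peptide bond, as stated in the text


def polymerization_atp(sequence):
    """ATP equivalents for polymerization, using the ~4n approximation for n residues."""
    return ATP_PER_PEPTIDE_BOND * len(sequence)


def precursor_drain(sequence, requirements):
    """Sum the per-residue precursor requirements over the whole sequence."""
    drain = Counter()
    for residue, count in Counter(sequence).items():
        for metabolite, amount in requirements[residue].items():
            drain[metabolite] += count * amount
    return dict(drain)


# Hypothetical polymer length: 10 repeats of the GVGIP cassette.
protein = "GVGIP" * 10

# PLACEHOLDER table: metabolite names and amounts are invented for illustration.
# A real table would list the central-metabolism precursors and cofactors
# required for each amino acid.
toy_requirements = {
    "G": {"precursor_a": 1},
    "V": {"precursor_b": 2},
    "I": {"precursor_b": 1, "precursor_c": 1},
    "P": {"precursor_d": 1},
}

print(Counter(protein))                            # residue counts
print(precursor_drain(protein, toy_requirements))  # summed placeholder drain
print(polymerization_atp(protein))                 # 200
```

The summed drain and the ATP term together form the left-hand side of the protein synthesis reaction that is added to the network for each simulation.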
The systemic dependence between modes was analyzed using MATLAB code developed based on the algorithm detailed in the next section.
Algorithm
This algorithm was used for identifying complementary biomass and protein-producing modes that form constraint line modes.
(1) Choose a specific constraint line. Choose a mode on this constraint line: M_CL.
(2) Select the set of all modes with a yield of protein on glucose greater than that of M_CL. From this set, eliminate all modes that have a positive flux in any irreversible reaction that has a zero flux in M_CL. The rest of the modes comprise set I.
(3) Select the set of all modes with a yield of protein on glucose less than that of M_CL. From this set, eliminate all modes that have a positive flux in any irreversible reaction that has a zero flux in M_CL. These modes comprise set II.
(4) Select a reversible reaction (R_elim) in M_CL with a zero flux.
(5) Choose a single mode from set I: M_I.
(6) Choose a single mode from set II: M_II.
(7) Compute a new mode, M_Sys, by forming a systemic combination of M_I and M_II to eliminate flux through R_elim.
(8) Check that every reaction with a zero flux in M_CL also contains a zero flux in M_Sys. If not, go to step 5.
(9) For each reaction, calculate the ratio of the raw flux through M_CL and M_Sys. Check that this ratio is the same for each of the reactions. If yes, then M_CL is a systemic combination of M_I and M_II. If not, go to step 5.
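Steps 5 to 9 can be sketched in a few lines. The study used MATLAB code that is not reproduced in the text, so the Python below is only an illustrative re-implementation under stated assumptions: modes are plain lists of fluxes in a common reaction order, the function names are invented, and the four-reaction toy network is hypothetical.

```python
def systemic_combination(m_i, m_ii, r_elim):
    """Positive combination of two modes that cancels the flux in reaction r_elim.

    Returns None when the two fluxes in r_elim do not have opposite signs.
    """
    a, b = m_i[r_elim], m_ii[r_elim]
    if a * b >= 0:
        return None
    # Weights |b| and |a| are positive and give |b|*a + |a|*b = 0.
    return [abs(b) * x + abs(a) * y for x, y in zip(m_i, m_ii)]


def zeros_preserved(m_cl, m_sys, tol=1e-9):
    """Step 8: every zero flux in m_cl must also be zero in m_sys."""
    return all(abs(s) <= tol for c, s in zip(m_cl, m_sys) if abs(c) <= tol)


def is_proportional(m_cl, m_sys, tol=1e-9):
    """Step 9: the flux ratio m_cl / m_sys must be identical for all reactions."""
    ratio = None
    for c, s in zip(m_cl, m_sys):
        if abs(c) <= tol and abs(s) <= tol:
            continue
        if abs(c) <= tol or abs(s) <= tol:
            return False
        r = c / s
        if ratio is None:
            ratio = r
        elif abs(r - ratio) > tol * max(1.0, abs(ratio)):
            return False
    return ratio is not None and ratio > 0


def find_parents(m_cl, set_i, set_ii, r_elim):
    """Steps 5-9: return the first (M_I, M_II) pair that reproduces m_cl, or None."""
    for m_i in set_i:
        for m_ii in set_ii:
            m_sys = systemic_combination(m_i, m_ii, r_elim)
            if m_sys and zeros_preserved(m_cl, m_sys) and is_proportional(m_cl, m_sys):
                return m_i, m_ii
    return None


# Toy network (hypothetical): [glucose uptake, reversible reaction, biomass, protein]
protein_mode = [1.0, -2.0, 0.0, 1.0]
biomass_mode = [1.0, 1.0, 1.0, 0.0]
constraint_line_mode = [1.5, 0.0, 1.0, 0.5]

print(systemic_combination(protein_mode, biomass_mode, r_elim=1))  # [3.0, 0.0, 2.0, 1.0]
print(find_parents(constraint_line_mode, [protein_mode], [biomass_mode], r_elim=1))
```

In the toy case the combined mode is twice the constraint line mode in every reaction, so the proportionality check passes and the pair is reported as the parents of the constraint line mode.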
Results
Most efficient elementary modes
Since elementary mode analysis identifies every possible metabolic pathway for the production of heterologous proteins and/or biomass, it is possible to explicitly list and compare all pathways according to certain criteria. We compared the carbon yield of recombinant protein and biomass on glucose as a measure of the efficiency of different pathways. Elementary mode analysis revealed 370 GVGIP-producing modes for the considered E. coli network. In contrast, there are 916 modes which produce Gfp and 826 modes that produce savinase. Thus, the number of possible modes appears to be strongly dependent on the amino acid composition of the expressed protein.
The yield of biomass on glucose versus the yield of protein on glucose was plotted for each of the modes for the different proteins analyzed (Fig. 1). The modes on the abscissa represent modes that make only biomass without any recombinant protein. The modes on the ordinate represent pathways that make the foreign protein without any biomass. Experimentally, it has been shown that it is possible to produce recombinant protein without producing biomass (Flickinger and Rouse 1993). The other modes co-produce biomass and recombinant protein. Examination of these plots confirms the expected result that the most efficient protein-producing modes do not make biomass while the most efficient biomass-producing mode does not produce any recombinant protein.
Certain common features can be observed for the three proteins analyzed. The line connecting the most efficient protein-producing and biomass-producing modes (optimum line) represents flux possibilities that are based on a linear combination of the two most efficient modes. In all three cases, every elementary mode is located either under or on the optimum line. Therefore, a linear combination of these two most efficient modes results in the optimum usage of glucose for the efficient and simultaneous production of the considered foreign protein and biomass.
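The geometry of the optimum line can be made explicit with a small sketch. It assumes that yields are expressed per unit of glucose carbon, so that a combination of the two most efficient modes has yields weighted by the fraction of glucose each mode consumes. The function names and the maximum yields used in the example are hypothetical.

```python
def optimum_line_point(f_biomass, yx_max, yp_max):
    """Yields obtained when a fraction f_biomass of the glucose goes through the
    best biomass mode and the remainder through the best protein mode."""
    return (f_biomass * yx_max, (1.0 - f_biomass) * yp_max)


def on_or_under_optimum_line(yx, yp, yx_max, yp_max, tol=1e-9):
    """True if the point (biomass yield, protein yield) is on or under the line
    joining (yx_max, 0) and (0, yp_max)."""
    return yx / yx_max + yp / yp_max <= 1.0 + tol


# Hypothetical maximum yields, for illustration only.
YX_MAX, YP_MAX = 0.8, 0.6

yx, yp = optimum_line_point(0.5, YX_MAX, YP_MAX)
print(on_or_under_optimum_line(yx, yp, YX_MAX, YP_MAX))    # True: on the line
print(on_or_under_optimum_line(0.3, 0.2, YX_MAX, YP_MAX))  # True: under the line
print(on_or_under_optimum_line(0.7, 0.5, YX_MAX, YP_MAX))  # False: above the line
```

Running the second function over every elementary mode is one way to confirm the observation that no mode lies above the optimum line.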
The pathways resulting in the most efficient protein production for GVGIP, Gfp and savinase are shown in Fig. 2a, b and c, respectively. For the sake of comparison, the most efficient biomass-producing mode as described by Carlson and Srienc (2004a) is shown in Fig. 2d. The optimum modes for the three proteins analyzed use the complete tricarboxylic acid (TCA) cycle and do not produce any carbon-containing byproducts, apart from carbon dioxide. There are significant differences among the pathways, depending on the specific protein being analyzed. The most efficient GVGIP-producing mode does not utilize the pentose phosphate pathway (PPP), since the protein does not contain any amino acids originating from precursors of the PPP (see Table 2). In contrast, Gfp and savinase require reactions from the PPP to produce precursors for the appropriate amino acids. While the magnitude of the flux through each reaction differs between modes (Fig. 3), the direction of the flux in certain reactions is also reversed in the protein modes relative to the biomass mode. This reversal leads to specific cases in which the opposing fluxes cancel, as shown in Figs. 4 and 5.
For Gfp, the direction of flux through the transketolase-catalyzed (Fig. 2, R13r) and the transaldolase-catalyzed (R14r) sugar rearrangement reactions of the PPP and the reactions between PEP and pyruvate (R9, RR9) are reversed in comparison with the corresponding reactions for biomass synthesis. In the case of savinase, the fluxes through reactions R13r and R14r are reversed, but those through R9 and RR9 are not, unlike the case for Gfp.
Co-production of biomass and recombinant protein—constraint lines
Many modes co-producing biomass and recombinant protein align along straight lines that we call constraint lines (Fig. 1). The modes that occur on these lines produce foreign protein and biomass in the same ratio. Since constraint lines reflect inherent limitations imposed on the cell by the stoichiometry of the reaction network and the amino acid composition of the expressed proteins, the number and pattern of occurrence of these modes varies depending on the specific protein being produced.
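Because all modes on a constraint line share one protein-to-biomass ratio, candidate constraint lines can be found by grouping co-producing modes on that ratio. The sketch below is one simple way to do this and is not the procedure used in the study; the mode names, the yields and the rounding-based binning are assumptions.

```python
from collections import defaultdict


def group_by_ratio(modes, decimals=6, min_modes=2):
    """Group co-producing modes by their protein-to-biomass yield ratio.

    modes maps a mode name to (biomass yield, protein yield). Ratios are
    rounded to `decimals` places, which is a crude form of binning.
    """
    groups = defaultdict(list)
    for name, (yx, yp) in modes.items():
        if yx > 0 and yp > 0:  # skip biomass-only and protein-only modes
            groups[round(yp / yx, decimals)].append(name)
    return {ratio: sorted(names) for ratio, names in groups.items()
            if len(names) >= min_modes}


# Hypothetical yields, for illustration only.
toy_modes = {
    "m1": (0.2, 0.4),
    "m2": (0.1, 0.2),
    "m3": (0.3, 0.1),
    "m4": (0.5, 0.0),
}

print(group_by_ratio(toy_modes))  # {2.0: ['m1', 'm2']}
```

Modes m1 and m2 share a ratio of 2.0 and therefore fall on the same line through the origin of the yield plot, while m3 stands alone and m4 produces biomass only.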
All modes on the same constraint line are characterized by a conservation of the topology of reactions of the PPP. For Gfp, the modes on constraint line 1 do not include the sugar rearrangement reactions of the PPP (R13r, R14r). Flux into the PPP for all modes in constraint line 2 is exclusively through the oxidative branch of the PPP (R10) and these modes do not utilize one of the transketolase-catalyzed reactions (R15r). The most efficient modes on these constraint lines (representing the end-points of the constraint lines) are shown in Fig. 4. Savinase features two constraint lines, I and II. Similar to constraint line 1 for Gfp, the modes on constraint line I for savinase do not contain reactions R13r and R14r. Finally, GVGIP constraint lines ii, iii and iv have no flux through reactions R13r/R14r, R11r and R15r, respectively, while constraint line i has a non-zero flux in all of its PPP reactions (Table 3).
Some of these lines extend to the optimum line, while others do not (see Fig. 1). For instance, the plot for GVGIP shows many constraint lines that do not extend to the optimum line, while the plots for Gfp and savinase contain one constraint line each that extends to the optimum line. Constraint lines that do not reach the optimum line consist of modes with properties that make them inefficient. For instance, Gfp constraint line 1 intersects with the optimum line while constraint line 2 does not. Further examination shows that all modes on constraint line 2 utilize the oxidative PPP (R10). This reaction produces two moles of NADPH and one mole of carbon dioxide for every molar flux of glucose-6-phosphate into the PPP. Apart from the loss of carbon as carbon dioxide, partially oxidized by-products are also produced to maintain the redox balance. These factors reduce the yields of the modes on constraint line 2 and therefore make these modes less efficient.
Constraint line modes are systemic combinations of other modes
There is a common mechanism for the occurrence of the vast majority of these constraint lines: constraint line modes are systemic combinations of modes producing biomass and protein alone, i.e., they can be expressed as a non-trivial, non-negative linear combination of modes producing only biomass or only recombinant protein. The modes occurring on constraint lines result from the cancellation of certain reversible reactions when these specific biomass- and recombinant protein-producing modes are systemically combined. Examination of the modes shows that such cancellation can occur in six different reversible reactions of the network: the transketolase-catalyzed reactions (R13r, R15r), the transaldolase-catalyzed reaction (R14r), the phosphopentose epimerase-catalyzed reaction (R11r), the isomerase-catalyzed reaction (R2r) and the carbon dioxide exchange reaction (R97r).
The majority of the modes on constraint line 1 for Gfp result from the elimination of reactions R13r and R14r from systemic combinations of modes producing only biomass or recombinant protein. It was pointed out earlier that the optimum Gfp modes differ from the optimum biomass mode in the direction of the flux in the sugar rearrangement reactions of the PPP (R13r, R14r; see Fig. 3). When the cell is just making biomass without any recombinant protein, the flux is directed towards ribose phosphate in reaction R13r. Alternately, when the cell is making protein only, the flux in this reaction is directed towards glyceraldehyde phosphate. Similarly, the direction of the flux through R14r is also reversed as more and more protein is produced. The modes that occur on constraint line 1 correspond to those flux distributions that occur when the opposing fluxes in R13r and R14r due to biomass and Gfp production cancel each other, resulting in zero net flux. These modes lead to the production of biomass and Gfp at a constant ratio without utilizing reactions R13r and R14r.
Similar reasons lead to the creation of constraint line 2 for Gfp. The modes on this constraint line utilize the oxidative branch of the PPP and the sugar rearrangement reactions R13r and R14r. However, these modes do not utilize the transketolase-catalyzed reaction R15r. Thus, these modes utilize the oxidative PPP reaction R10 as the sole entry-point into the PPP. It can be shown that this constraint line arises due to a combination of two sub-optimal biomass- and Gfp-producing modes (Fig. 5). For instance, the most efficient mode on this constraint line occurs at the point where the opposing fluxes in reaction R15r due to suboptimal biomass and Gfp production cancel each other, resulting in a net flux of zero.
Similarly, it can be shown that the four modes comprising constraint line 4 for Gfp arise due to cancellation of the carbon dioxide exchange reaction, R97r. These modes are formed due to elimination of flux in reaction R97r when a biomass-producing mode consuming carbon dioxide is systemically combined with a protein-producing mode producing carbon dioxide.
Although most of the constraint line modes are systemic combinations of modes producing biomass or recombinant proteins alone, some constraint line modes are formed due to systemic combinations involving other constraint line modes. For example, eight modes on Gfp constraint line 1 are formed when suboptimal biomass modes are systemically combined with modes on Gfp constraint line 4 to eliminate reactions R13r and R14r. Nine modes on constraint line 1 and the two modes forming constraint line 7 are formed by a similar systemic combination of suboptimal Gfp-producing modes and modes on constraint line 8.
Similar reaction cancellations after systemic combinations of two modes are also responsible for the constraint line modes occurring for the production of savinase and GVGIP. Constraint line I for savinase is formed due to the elimination of reactions R13r and R14r when biomass- and savinase-producing modes are systemically combined. Constraint lines i, ii, iii and iv for GVGIP are formed by systemic combinations of biomass, GVGIP and other constraint line modes to eliminate reaction R97r. Elimination of reactions R13r, R14r, R15r and R11r is also involved in the formation of several constraint lines for GVGIP that are based on only a few modes.
Using the algorithm described in the Materials and methods, we tested whether each of the constraint line modes is a systemic combination of other modes. Table 4 provides an example of complementary biomass- and protein-producing modes that form a constraint line mode. A small number of constraint line modes were found to be systemically independent of other modes: all modes on Gfp constraint lines 3, 6 and 8 and GVGIP constraint lines viii, x and xix were systemically independent of other biomass, protein or constraint line modes (Table 3). While all constraint line modes for savinase are systemically dependent on other modes, four of the 89 modes on GVGIP constraint lines and nine of the 194 Gfp constraint line modes are systemically independent.
Gene-knockout targets for efficient protein production
The identification of all possible non-decomposable pathways using elementary mode analysis gives insight into the metabolic capabilities of a system (Schuster et al. 2000). It provides a means of identifying which enzymatic reactions are required for efficient protein production and which reactions are not. Accordingly, we identified specific mutations that eliminate less efficient pathways for protein production for GVGIP and savinase.
For GVGIP, the identified reactions to be eliminated involve the oxidative branch of the PPP (R10), the enzymatic activity of the malic enzymes (R41), the enzymatic activity associated with the action of NADH dehydrogenase II (R83), the enzymes involved in the production of lactate and succinate (R94, R95) and the transketolase enzyme (R15r). Such a strain would produce GVGIP most efficiently but would be unable to produce biomass. Since this strain would be unable to grow, an inducible genetic switch would have to be used to implement this strategy.
In the case of savinase, the removal of the transaldolase enzyme (R14r) along with reactions R10, R41, R83, R94 and R95 leads to the elimination of all modes except a few modes on constraint line I. These remaining modes force the cell to make savinase at a ratio of 2.8 moles of carbon in savinase per mole of carbon in biomass. This analysis indicates that the strain would still be capable of growth and could therefore be used to produce savinase and biomass at a constant ratio in a continuous reactor.
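The effect of a set of knockouts on the available modes can be checked by discarding every mode that carries flux through a deleted reaction. The sketch below illustrates this filtering step under stated assumptions: the data layout, the function name and the toy modes are hypothetical, and only the reaction identifiers are taken from the text.

```python
def surviving_modes(modes, knocked_out, tol=1e-9):
    """Keep only the modes with zero flux in every knocked-out reaction.

    modes maps a mode name to a dict of {reaction id: flux}; a reaction that
    is absent from the dict is treated as carrying zero flux.
    """
    return {
        name: fluxes
        for name, fluxes in modes.items()
        if all(abs(fluxes.get(reaction, 0.0)) <= tol for reaction in knocked_out)
    }


# Reaction identifiers from the text; the fluxes below are invented.
SAVINASE_KNOCKOUTS = ["R14r", "R10", "R41", "R83", "R94", "R95"]

toy_modes = {
    "mode_A": {"R10": 1.0, "R14r": 0.0, "R9": 2.0},  # uses the oxidative PPP
    "mode_B": {"R14r": -0.5, "R9": 1.0},             # uses transaldolase in reverse
    "mode_C": {"R9": 2.0, "R13r": 0.0},              # uses none of the deleted reactions
}

print(sorted(surviving_modes(toy_modes, SAVINASE_KNOCKOUTS)))  # ['mode_C']
```

Applied to the full set of elementary modes, the same filter would show which pathways remain available to a strain carrying the proposed deletions.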
It can be further shown that the proposed mutations have the effect of optimizing the production of savinase at all levels of oxygen availability. Figure 6 is an inverse yield plot (Carlson and Srienc 2004a) that illustrates the available modes on constraint line I, before and after the proposed mutations. In such a plot, modes that lie closest to the origin are the most efficient in the conversion of the carbon source and oxygen into the desired product. It can be seen that, for savinase, the available modes after the proposed mutations are on or very close to the optimum transition line from aerobic to anaerobic conditions. It can also be similarly shown that the proposed mutations in these cases also optimize the production of maintenance energy at all levels of oxygen availability. It should also be noted that, since all optimal modes utilized under oxygen limitation require the production of acetate, a single mutation eliminating the acetate-producing reaction would eliminate all modes except the most efficient mode. However, the optimal performance of such a strain would be maintained only under aerobic conditions.
Discussion
We used elementary mode analysis to study the co-production of biomass and recombinant proteins in E. coli. We identified special sets of elementary flux modes that co-produce biomass and recombinant protein at constant ratios. Understanding these pathway possibilities is important because it provides unique insight into the nature of the metabolic network, which may suggest specific strain-optimization strategies. These constraint lines are created by the cancellation of opposing fluxes in specific reversible reactions during the co-production of biomass and recombinant protein. Hence, these modes are systemically dependent on other modes. The property of systemic independence has been used in other metabolic network analysis tools, such as extreme pathway analysis (Schilling et al. 2000). In our simulations, many constraint line modes were formed by the elimination of the carbon dioxide exchange reaction when other modes were systemically combined. These constraint line modes in turn lead to other constraint line modes when they are combined systemically to eliminate other internal reversible reactions. It is important to note that tools such as extreme pathway analysis do not identify these modes, even though these modes are physiologically relevant (Klamt and Stelling 2003).
We further showed that some of these constraint line modes could be used to optimize the production of recombinant proteins. Since all modes on a constraint line produce foreign protein and biomass at a constant ratio, directing cellular fluxes through these modes could force the cell to produce protein and biomass at these ratios. Since constraint lines are formed by the elimination of fluxes through certain reversible reactions due to systemic combinations of biomass- and protein-producing modes, removal of these reversible reactions through genetic modifications could force the cell to utilize specific constraint line modes. Constraint line I for savinase is formed by the elimination of flux through reaction R14r. Hence, elimination of this reaction forces the cell to utilize modes on this constraint line and hence produce savinase and biomass at a constant molar ratio of 2.8 moles of carbon in savinase per mole of carbon in biomass. It was shown previously that the elimination of reactions R10, R41, R83, R94 and R95 has the effect of producing biomass for wild-type E. coli at the most efficient carbon yield (Carlson and Srienc 2004a). Elimination of these five reactions in addition to reaction R14r ensures that only the most efficient modes on constraint line I are available to the cells at all levels of oxygen stress.
References
Anderson DC, Krummen L (2002) Recombinant protein expression for therapeutic applications. Curr Opin Biotechnol 13:117–123
Carlson R, Srienc F (2004a) Fundamental Escherichia coli biochemical pathways for biomass and energy production: identification of reactions. Biotechnol Bioeng 85:1–19
Carlson R, Srienc F (2004b) Fundamental Escherichia coli biochemical pathways for biomass and energy production: creation of overall flux states. Biotechnol Bioeng 86:149–162
Chalfie M, Tu Y, Euskirchen G, Ward WW, Prasher DC (1994) Green fluorescent protein as a marker for gene expression. Science 263:802–805
Flickinger MC, Rouse MP (1993) Sustaining protein-synthesis in the absence of rapid cell-division—an investigation of plasmid-encoded protein expression in Escherichia coli during very slow growth. Biotechnol Prog 9:555–572
Gupta R, Beg QK, Lorenz P (2002) Bacterial alkaline proteases: molecular approaches and industrial applications. Appl Microbiol Biotechnol 59:15–32
Ingraham JL, Maaloe O, Neidhardt FC (1983) Growth of the bacterial cell. Sinauer Associates, Sunderland, Mass.
Khosla C, Curtis JE, DeModena J, Rinas U, Bailey JE (1990) Expression of intracellular hemoglobin improves protein synthesis in oxygen limited Escherichia coli. Biotechnology 8:849–853
Klamt S, Stelling J (2003) Two approaches for metabolic pathway analysis. Trends Biotechnol 21:64–69
Lee SY (1996) High cell density culture of Escherichia coli. Trends Biotechnol 14:98–105
Mathews CK, Holde KE van, Ahern KG (2000) Biochemistry. Addison Wesley Longman, San Francisco
Nakagawa S, Oda H, Anazawa H (1995) High cell density cultivation and high recombinant protein production of Escherichia coli strain expressing uricase. Biosci Biotechnol Biochem 59:2263–2267
Natarajan A, Subramanian S, Srienc F (1998) Comparison of mutant forms of the green fluorescent protein as expression markers in Chinese hamster ovary (CHO) and Saccharomyces cerevisiae cells. J Biotechnol 62:29–45
Pharkya P, Burgard AP, Maranas CD (2003) Exploring overproduction of amino acids using bilevel optimization framework OptKnock. Biotechnol Bioeng 84:887–899
Schilling CH, Edwards JS, Letscher D, Palsson BO (2000) Combining pathway analysis with flux balance analysis for comprehensive study of metabolic systems. Biotechnol Bioeng 71:286–306
Schuster S, Dandekar T, Fell D (1999) Detection of elementary flux modes in biochemical networks: a systemic organization and analysis of complex metabolic networks. Trends Biotechnol 17:53–60
Schuster S, Fell D, Dandekar T (2000) A general definition of metabolic pathways useful for systemic organization and analysis of complex metabolic networks. Nat Biotechnol 18:326–332
See SM, Dean JP, Dervakos G (1996) On the topological features of optimal metabolic pathway regimes. Appl Biochem Biotechnol 60:251–301
Stephanopoulos G, Aristidou A, Nielsen J (1998) Metabolic engineering: principles and methodologies. Academic, San Diego
Subramanian S, Srienc F (1996) Quantitative analysis of transient gene expression in mammalian cells using the green fluorescent protein. J Biotechnol 49:137–151
Swartz JR (2001) Advances in Escherichia coli production of therapeutic proteins. Curr Opin Biotechnol 12:195–201
Urry DW (1988) Entropic elastic processes in protein mechanisms. I. Elastic structure due to inverse temperature transition and elasticity due to internal chain dynamics. J Protein Chem 7:1–34
Urry DW (1999) Elastic molecular machines in metabolism and soft-tissue restoration. Trends Biotechnol 17:249–257
Acknowledgement
This work was funded by the National Science Foundation (BES-0109383).
Vijayasankaran, N., Carlson, R. & Srienc, F. Metabolic pathway structures for recombinant protein synthesis in Escherichia coli. Appl Microbiol Biotechnol 68, 737–746 (2005). https://doi.org/10.1007/s00253-005-1920-7
DOI: https://doi.org/10.1007/s00253-005-1920-7