Introduction

Living organisms depend on sets of chemical reactions, most of which are catalyzed by enzymes. In most cases, these reactions are organized sequentially into metabolic pathways. A metabolic pathway responsible for degrading a nutrient or synthesizing a given biomolecule is a functionally and structurally complete cellular unit. Pathways are not isolated; they interact with each other. The collection of interwoven pathways in a cell makes up its metabolic network. Maintaining life requires a finely concerted balance among these pathways within the integrated metabolic network.

A metabolic network is not static but dynamic and changing. Over long time scales, the network evolves toward ever greater optimization under the permanent pressure of natural selection; over short time scales, changes in the network enable a cell to respond efficiently and promptly to environmental change. For the organism itself, an optimized and ever-changing metabolic network avoids wasting limited resources on unnecessary biomacromolecules, such as enzymes that metabolize nutrients that are currently unavailable. As a result, under natural conditions, a wild-type bacterial cell synthesizes each cellular substance in an amount appropriate to support its survival.

To exploit microbial cells as workhorses for human purposes, we must modify their wild-type metabolic pathways: maximize the flux toward the desired products, and minimize the flux through other pathways that might drain precursor metabolites or energy (Fig. 1). These operations depend on knowledge of the pathway of interest. Advances in biochemistry, molecular biology, and molecular genetics have broadened our view of metabolic pathways and their complicated regulation. They offer not only more choices of tools and methodologies for metabolic engineering research, but also the components of a given pathway that can serve as candidate engineering targets. Many good reviews discuss the tools and methodology of metabolic engineering [1–3]. In this review, we provide a relatively comprehensive introduction to the components of a pathway and the main strategies by which these components can be modified from the viewpoint of metabolic engineering.

Fig. 1

The principle of metabolic engineering is to maximize the flux toward formation of the end product(s) and minimize the flux toward by-product(s). The domain of operations in metabolic engineering is the conversion of glucose (the most common and important substrate, which supports cell growth and provides the building blocks for end products) into different intermediate metabolites (IM1, IM2,…IMn) through a variable number of steps, each predominantly catalyzed by an individual enzyme (E1, E2,…En). Enhancing the efficiency of end-product formation by engineering these steps (including the addition of certain precursors) is the key activity in metabolic engineering.

The Targets for Metabolic Engineering

A metabolic pathway is a sequence of biochemical reactions catalyzed by enzymes encoded by different genes. Protein-encoding genes are therefore the key players in a metabolic pathway. In a broad sense, a protein-encoding gene includes the coding sequence (structural gene) and all of its regulatory elements. The coding sequence primarily determines the biochemical and biophysical properties of the polypeptide. Regulatory elements here refer to all other sequences belonging to the gene outside the coding region, including the promoter, ribosome binding site, terminator, etc. (Fig. 2).

Fig. 2

Elements in a typical protein-encoding gene are possible targets for metabolic engineering. A typical protein-encoding gene encompasses a promoter (A), an operator region for expression control (C), a ribosome binding site (D) that recruits the ribosome and initiates translation, a coding region (E) that is transcribed into the template for translation, and a terminator (F) that dictates where transcription stops. Adjacent genes are kept relatively independent by intergenic regions (G). A riboswitch (B), where present in the mRNA, acts as a sensor of the abundance of the product of a gene or pathway and directly regulates translation of the mRNA. When the systemic regulatory network is insufficiently understood, enhancing the efficiency of pathway steps by modifying the elements of a gene is the most straightforward and realistic practice.

The Catalyst Itself: Genetic Modification of the Encoding Sequences

In general, a gene functions when it is transcribed into RNA; the protein-encoding part is transcribed into mRNA. Despite the recent discovery in tumor biology that mRNA can act as a regulator by competitively binding microRNAs [4], the canonical role of mRNA is to deliver information for protein synthesis by the ribosome. An optimized metabolic pathway frequently requires more robust enzymes (with higher catalytic efficiency and insensitivity to feedback inhibition by the final product) to catalyze given steps. It is common knowledge in biochemistry that the properties of a metabolic enzyme are determined by its structure and ultimately by its amino acid sequence. Currently, there is no method to change the amino acid sequence of a protein once it has been produced by ribosomal translation. Therefore, any intended alteration of a protein sequence, and thus of the function of the polypeptide, can only be realized by changing its coding sequence (not every change in the coding sequence produces a mutant protein, owing to the degeneracy of the genetic code).
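The parenthetical point about degeneracy can be checked mechanically. The following minimal sketch translates a coding sequence with the standard genetic code (NCBI translation table 1, assumed here; other organisms and organelles use variant codes) and classifies a single-base substitution as synonymous (same protein) or nonsynonymous (mutant protein). The function names are illustrative and not taken from any particular library.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG x TCAG x TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (length must be a multiple of 3) codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def classify_point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return 'synonymous' if substituting new_base at position leaves the protein unchanged."""
    mutant = dna[:position] + new_base + dna[position + 1:]
    return "synonymous" if translate(mutant) == translate(dna) else "nonsynonymous"


if __name__ == "__main__":
    cds = "ATGCTG"                               # Met-Leu
    print(classify_point_mutation(cds, 5, "A"))  # CTG -> CTA, both Leu: synonymous
    print(classify_point_mutation(cds, 3, "A"))  # CTG -> ATG, Leu -> Met: nonsynonymous
```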

In general, two different strategies can be used to create improved enzymes for optimizing a pathway in metabolic engineering studies: directed evolution [5, 6] and rational design [7–9]. Both are generally regarded as part of protein engineering, which has been used to modify natural proteins (including enzymes) to meet the demands of various industrial applications. Likewise, protein engineering can play an important role in improving enzyme activity, altering substrate and product specificity, and modifying regulatory elements [10].

Directed Evolution

Directed evolution (or directed molecular evolution) is a set of artificially designed operations that imitate natural evolution at an accelerated pace [11]. It is particularly important for developing industrial enzymes or therapeutic proteins with improved properties in the laboratory. Using sophisticated protocols, a library of diverse mutant sequences can be created from one or more starting sequences. Smith's proposal provided a theoretically possible route for evolution: “functional proteins must form a continuous network which can be traversed by unit mutational steps without passing through nonfunctional intermediates” [12, 13]. In directed molecular evolution, however, exploring a wider region of sequence space inevitably produces a large percentage of inactive mutants. The library must therefore be subjected to activity-based screening, and mutant sequences encoding proteins with improved properties can then replace the wild-type sequence in metabolic engineering. To overproduce levopimaradiene, Leonard et al. [14] engineered Escherichia coli by introducing mutant versions of two important enzymes in the levopimaradiene synthesis pathway: geranylgeranyl diphosphate synthase from Taxus canadensis and levopimaradiene synthase from Ginkgo biloba. The activity of levopimaradiene synthase was improved by site-directed and site-saturation mutagenesis guided by bioinformatic analysis. The other rate-limiting enzyme, geranylgeranyl diphosphate synthase, was improved by error-prone polymerase chain reaction (PCR). As a result, a greatly enhanced overproducer was obtained (2,600-fold higher production).
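The mutate-then-screen cycle described above can be illustrated with a toy in silico loop. In this minimal sketch, `error_prone_copy` stands in for error-prone PCR by substituting each base with a fixed probability, and `screen` is a hypothetical placeholder for an activity assay (here simply similarity to an arbitrary target sequence). Real screening is an experimental measurement, and nothing here models the authors' actual protocol.

```python
import random

ALPHABET = "ACGT"


def error_prone_copy(seq: str, rate: float, rng: random.Random) -> str:
    """Copy seq, substituting each base with probability `rate` (toy error-prone PCR)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def evolve(parent, screen, rounds=10, library_size=200, rate=0.02, seed=0):
    """Iterate library creation and screening; keep the parent if no variant scores higher."""
    rng = random.Random(seed)
    history = [screen(parent)]
    for _ in range(rounds):
        library = [error_prone_copy(parent, rate, rng) for _ in range(library_size)]
        best = max(library, key=screen)  # most library members are discarded here
        if screen(best) > screen(parent):
            parent = best
        history.append(screen(parent))
    return parent, history


if __name__ == "__main__":
    # Hypothetical stand-in for an activity assay: positions matching an arbitrary target.
    target = "ATGGCTAAAGGTGAACTGTTCACCGGT"
    screen = lambda s: sum(a == b for a, b in zip(s, target))
    final, history = evolve("ATGGCAAAGGGAGAGCTTTTTACTGGA", screen)
    print(history)  # best score after each round; never decreases
```

Because the parent is replaced only by a strictly better variant, the score trace is monotone, which mirrors the role of screening in discarding the many inactive or weaker mutants.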

Rational Design

Unlike directed molecular evolution, rational protein design depends largely on knowledge of the structure–function relationship of the protein of interest. With this information, any desired mutant sequence can be created by PCR-based site-directed mutagenesis or by chemical synthesis of the whole gene. When a gene is chemically synthesized, the codon for each amino acid residue of the target protein must be chosen according to the codon usage preference of the host organism. This kind of optimization is done most often when efficient expression for higher protein yield is the main goal, whereas higher-level expression of metabolic enzymes is not the main objective of metabolic engineering (in some cases, it is even deleterious); nonetheless, optimizing codon usage for the host species is still beneficial. It is especially important when constructing a heterologous pathway from a higher organism in a lower model organism [15], and when the pathway to be constructed contains genes of different origins [16].
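Choosing codons by host preference can be sketched with the simplest heuristic: replace each codon with the host's most frequent synonymous codon. In this sketch, `HOST_USAGE` is a hypothetical table with illustrative numbers, not measured codon usage for any organism; amino acids absent from the table keep their original codons. The optional `first_n` argument restricts recoding to the 5′ end of the coding sequence. Other strategies (e.g., matching the overall usage distribution rather than always picking the top codon) exist and are not shown.

```python
from itertools import product

# Standard genetic code (TCAG order); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host codon-usage fractions (illustrative numbers, NOT measured data).
HOST_USAGE = {
    "CTG": 0.50, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04, "TTA": 0.13, "TTG": 0.13,
    "CGT": 0.38, "CGC": 0.40, "CGA": 0.06, "CGG": 0.10, "AGA": 0.04, "AGG": 0.02,
    "GGT": 0.34, "GGC": 0.40, "GGA": 0.11, "GGG": 0.15,
}


def preferred_codons(usage):
    """Map each amino acid in the usage table to its most frequent codon."""
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize_codons(cds, usage, first_n=None):
    """Recode cds with preferred codons, optionally only within the first `first_n` codons.

    The encoded protein is unchanged because only synonymous codons are swapped in.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    preferred = preferred_codons(usage)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    window = len(codons) if first_n is None else first_n
    out = []
    for index, codon in enumerate(codons):
        aa = CODON_TABLE[codon]
        out.append(preferred.get(aa, codon) if index < window else codon)
    return "".join(out)


if __name__ == "__main__":
    cds = "ATGCTAAGAGGAAGGTAA"                           # M L R G R stop
    print(optimize_codons(cds, HOST_USAGE))              # ATGCTGCGCGGCCGCTAA
    print(optimize_codons(cds, HOST_USAGE, first_n=2))   # ATGCTGAGAGGAAGGTAA
```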

Rational design of a protein can also be based simply on analysis of its homologous sequences. This is an effective strategy for improving properties that usually require simultaneous mutations at multiple sites (e.g., solubility and thermostability) [17]. When the unsatisfactory activity of an overexpressed metabolic enzyme results from a poor expression level or a low soluble fraction because it is heterologous to many widely used hosts, mutating sites identified by multiple alignment of homologous sequences with the “back-to-consensus” strategy might be the first choice. Following this route, γ-humulene synthase from Abies grandis was engineered by redistributing its glycine and proline residues. A combined mutant containing these beneficial mutations showed enhanced solubility, leading to an 80-fold increase in yield [18].
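The “back-to-consensus” idea can be made concrete with a minimal sketch: compute the majority residue in each column of a multiple alignment of homologs, then propose mutating the target wherever it deviates from a well-supported consensus. The alignment, the target, and the `min_support` cutoff below are hypothetical; a real workflow would start from a proper alignment tool and weigh structural context before accepting any proposal.

```python
from collections import Counter


def consensus(alignment):
    """Majority residue per column of equal-length aligned sequences ('-' = gap)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def back_to_consensus(target, homologs, min_support=0.6):
    """Propose target->consensus mutations (1-based, e.g. 'P3G') at well-supported columns."""
    cons = consensus(homologs)
    proposals = []
    for pos, (t, c) in enumerate(zip(target, cons)):
        support = sum(seq[pos] == c for seq in homologs) / len(homologs)
        if t != c and "-" not in (t, c) and support >= min_support:
            proposals.append(f"{t}{pos + 1}{c}")
    return proposals


if __name__ == "__main__":
    homologs = ["MKGLPA", "MKGLPA", "MRGLPA", "MKALPS", "MKGIPA"]  # hypothetical alignment
    print(consensus(homologs))                      # MKGLPA
    print(back_to_consensus("MKPLGA", homologs))    # ['P3G', 'G5P']
```

In this toy case the proposals swap a proline and a glycine back to their consensus positions, echoing the kind of glycine/proline redistribution mentioned above.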

Codon usage influences translation efficiency because each codon on the mRNA must be read by a corresponding transfer RNA (tRNA). The absence of a tRNA that recognizes a “rare codon” lowers translation efficiency or causes translation to fail altogether. Codon usage can also control ribosome speed, exerting fine-tuned control over the efficiency of protein synthesis [19–21]. When translating an mRNA, the ribosome moves toward the 3′ end at varying speed: over the first 50 codons it proceeds slowly (the ramp stage), and then it speeds up. These findings are consistent with other reports that the sequence at the beginning of a gene strongly influences translation, and that expression level is inversely correlated with predicted mRNA secondary structure [22, 23]. Depending on the particular gene, its beginning region controls initiation and/or early elongation. In addition, the tendency of a gene to reuse, at the next occurrence of an amino acid, the same codon used at its previous occurrence (a phenomenon termed “autocorrelation”) can benefit translation. These insights into the mechanism of translation may provide an effective way to speed up translation, although they have not yet been successfully applied to optimizing pathway enzymes in metabolic engineering.
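Codon autocorrelation can be quantified in a simple way: walk through the coding sequence, and each time an amino acid reappears, check whether it uses the same codon as its previous occurrence. The sketch below reports that reuse fraction. It is a simplified statistic of our own choosing, not the exact measure used in the cited studies. Single-codon amino acids (Met, Trp) and stop codons are skipped, since their "reuse" is trivially guaranteed.

```python
from itertools import product

# Standard genetic code (TCAG order); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SKIP = {"M", "W", "*"}  # amino acids/stops excluded from the statistic


def codon_reuse_fraction(cds):
    """Fraction of successive same-amino-acid occurrences that reuse the previous codon."""
    cds = cds.upper()
    last_codon = {}
    reused = total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):  # ignore a trailing partial codon
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa in SKIP:
            continue
        if aa in last_codon:
            total += 1
            reused += codon == last_codon[aa]
        last_codon[aa] = codon
    return reused / total if total else float("nan")


if __name__ == "__main__":
    # Leu: CTG, CTG (reused), CTA (switched); Lys: AAA, AAG (switched) -> 1 of 3
    print(codon_reuse_fraction("ATGCTGAAACTGAAGCTATGG"))
```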

Overproduction of L-tyrosine by E. coli involves many steps catalyzed by different enzymes. In constructing an L-tyrosine overproducer, Juminaga et al. [24] identified the bottlenecks limiting the yield through targeted proteomics and metabolite profiling. One bottleneck lay in the production of the intermediate shikimate, at the step catalyzed by dehydroquinate synthase (AroB), which was expressed constitutively at a low level. Optimizing the first 15 codons of the gene relieved the constraint imposed by its poor expression. This example supports the theory proposed by Cannarozzi et al. [19]. It probably also implies that codon optimization aimed at improving expression should focus on the 5′ end of the coding sequence (about 50 amino acid residues).

The Regulatory Components

The regulatory components control when, and to what extent, a gene is expressed. Strategic use of regulatory components can fine-tune gene expression with minimal intervention in the host's metabolism.

Engineering Ribosome Binding Site

A necessary step in protein synthesis is the binding of the ribosome to the ribosome binding site (RBS), a region of three to nine nucleotides upstream of the start codon on an mRNA [25, 26]. In bacteria, the RBS base-pairs with a sequence near the 3′ end of the 16S rRNA. The number of base pairs involved in this interaction and their location relative to the start codon dictate translation efficiency [27]. Later studies established that increased stability of the RBS region through secondary structure formation can reduce translation efficiency. By quantitatively analyzing how RBS secondary structure determines translation efficiency, researchers found that exposing only the RBS region or only the start codon was insufficient for preferential recognition, and that translation efficiency was strictly correlated with the fraction of mRNA molecules with an unfolded RBS. This is because the ribosome recognizes neither nucleotides outside the RBS region and the initiation codon nor a structured RBS region [28].
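The correlation with the fraction of unfolded RBS can be illustrated with a two-state equilibrium model, which is an assumption of this sketch: each mRNA's RBS is either folded or unfolded, and the folding free energy ΔG_fold = G(folded) − G(unfolded) sets the ratio between them. If translation efficiency is taken to be proportional to the unfolded fraction, comparing two hairpins of different stability gives their expected relative efficiency. The ΔG values would normally come from an RNA folding tool, which is outside this sketch; the numbers used below are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def unfolded_fraction(dg_fold_kcal: float, temp_c: float = 37.0) -> float:
    """Equilibrium fraction of mRNAs with an unfolded RBS in a two-state model.

    dg_fold_kcal = G(folded) - G(unfolded); negative values mean a stable structure.
    K = [folded]/[unfolded] = exp(-dG/RT), so the unfolded fraction is 1/(1 + K).
    """
    rt = R_KCAL * (temp_c + 273.15)
    return 1.0 / (1.0 + math.exp(-dg_fold_kcal / rt))


if __name__ == "__main__":
    weak, strong = unfolded_fraction(-2.0), unfolded_fraction(-6.0)  # hypothetical hairpins
    print(f"unfolded fraction: weak {weak:.4f}, strong {strong:.6f}")
    print(f"predicted relative efficiency (weak/strong): {weak / strong:.0f}")
```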

Biologists have developed methods to control protein expression through automated design of synthetic RBSs. Using a model based on the free energy changes of the molecular interactions involved in translation initiation, Salis et al. [29] developed a predictive method for designing synthetic RBSs that rationally control protein expression level. Having validated the model experimentally with more than 100 predictions in E. coli, they proposed that this accurate method was very useful for connecting a genetic sensor to a synthetic circuit by rationally optimizing protein expression. In contrast to Salis's methodology, Na et al. [30] focused on the dynamics of mRNA folding and ribosome binding to estimate translational efficiency, using the mRNA sequence as the sole input. With luxR mRNA derivatives as an example, this model achieved a high correlation (R² = 0.87) between experimental and estimated expression levels.
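A free-energy model of the kind described above can be used in two directions: predicting how much a new RBS changes expression relative to a reference, and choosing the free energy a new RBS should have to hit a desired fold change. This sketch assumes the exponential form r ∝ exp(−β·ΔG_total) and a value β ≈ 0.45 mol/kcal, which we take from our reading of the Salis et al. model and which should be treated as an assumption to verify against the original paper. Computing ΔG_total itself (mRNA–rRNA hybridization, start codon, spacing, mRNA unfolding) is not shown, and the function names are hypothetical.

```python
import math

BETA = 0.45  # mol/kcal; assumed Boltzmann-like factor, verify against the original model


def relative_initiation_rate(dg_total_kcal, dg_reference_kcal, beta=BETA):
    """Initiation rate of a design relative to a reference RBS, assuming r ~ exp(-beta*dG)."""
    return math.exp(-beta * (dg_total_kcal - dg_reference_kcal))


def target_dg_for_fold_change(fold_change, dg_reference_kcal, beta=BETA):
    """dG_total a new RBS would need to change expression by `fold_change` (assumed model)."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return dg_reference_kcal - math.log(fold_change) / beta


if __name__ == "__main__":
    reference = -2.0  # hypothetical dG_total of the current RBS, kcal/mol
    dg_new = target_dg_for_fold_change(10.0, reference)
    print(f"target dG_total for 10x: {dg_new:.2f} kcal/mol")
    print(f"check: {relative_initiation_rate(dg_new, reference):.2f}-fold")
```

The logarithmic relationship means each fixed step in ΔG_total multiplies expression by a constant factor, which is why such models are convenient for dialing in expression levels across several orders of magnitude.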

Although the two models above were validated with expression data for some proteins, successful applications of these models to other expression-control designs have rarely been reported. Library-based approaches instead generate libraries of RBS sequences and screen them for more effective sequences. Wang et al. [31] invented multiplex automated genome engineering (MAGE) and applied it to engineer a recombinant E. coli strain (EcHW2) for lycopene production. Targeting 20 genes involved in lycopene synthesis, they modified the RBS regions through allelic replacement using oligonucleotides containing degenerate RBS sequences (DDRRRRRDDDD; D = A, G, T; R = A, G). The more closely a replaced RBS region resembled the canonical Shine–Dalgarno sequence (TAAGGAGGT), the higher the translation efficiency.
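The size and content of the degenerate RBS library can be worked out directly: each D position has three options and each R position two, so DDRRRRRDDDD encodes 3⁶ × 2⁵ = 23,328 sequences. The sketch below expands the pattern using the IUPAC ambiguity codes and scores each member by its best ungapped match to TAAGGAGGT. That match count is a simplified, hypothetical scoring chosen for illustration, not the similarity measure used by Wang et al.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "D": "AGT"}
SHINE_DALGARNO = "TAAGGAGGT"


def expand(pattern):
    """Yield every concrete DNA sequence encoded by a degenerate IUPAC pattern."""
    for bases in product(*(IUPAC[symbol] for symbol in pattern)):
        yield "".join(bases)


def sd_similarity(seq, sd=SHINE_DALGARNO):
    """Best ungapped count of positions matching sd, over all offsets within seq."""
    return max(
        sum(a == b for a, b in zip(seq[offset:offset + len(sd)], sd))
        for offset in range(len(seq) - len(sd) + 1)
    )


library = list(expand("DDRRRRRDDDD"))
scores = [sd_similarity(s) for s in library]
print(len(library))  # 23328 (= 3**6 * 2**5)
print(max(scores))   # 9: e.g. TAAGGAGGTAA contains the full Shine-Dalgarno sequence
```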

Promoter

Control of Expression Level by Adopting Strong or Weak Inducible Promoters

A promoter is a DNA fragment recognized by a group of proteins responsible for initiating transcription of a gene. Very often, several related genes (an operon or gene cluster) share the same promoter, in which case their expression is controlled by that promoter. Because transcription is the first step of gene expression, the promoter can efficiently regulate the expression level of a gene. The promoter also provides the most economical control of gene activity: if expression of a gene is unnecessary, synthesis of its mRNA should also be avoided (despite the recent discovery that mRNA can also regulate gene expression, beyond its basic function as the template for protein synthesis [4]). As a result, using different promoters to obtain different expression levels of a target gene has become routine practice [32, 33], especially in metabolic engineering [34, 35]. Frequently used promoters include lac, tac, and ara. These promoters are inducible and relatively strong.

Beyond these well-known promoters of different strengths, when the promoter is chosen as the control point, achieving different expression levels of a gene frequently involves screening a promoter library [36–39]. A further improvement of the library method is the ability to select the ideal mutant promoter by comparing the strengths of the library promoters with that of the native one using a predictive model of promoter strength [38]. De Mey et al. [38] introduced a simpler and more efficient method to insert an artificial promoter or replace a native promoter in E. coli, making promoter-based control of gene expression more practical.
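Selecting a library promoter by comparison with the native one can be sketched as a nearest-match search: express each promoter's predicted strength relative to the native promoter and pick the one closest to the desired ratio. This sketch compares on a log scale so that 2-fold too high and 2-fold too low count as equally far off, which is our design choice rather than the cited method. The promoter names and predicted strengths are hypothetical numbers in arbitrary units.

```python
import math


def pick_promoter(predicted, native_strength, target_ratio):
    """Return (name, ratio) of the library promoter whose predicted strength relative
    to the native promoter is closest to target_ratio, measured on a log scale."""
    if native_strength <= 0 or target_ratio <= 0:
        raise ValueError("strengths and target ratio must be positive")

    def log_distance(item):
        _, strength = item
        return abs(math.log(strength / native_strength) - math.log(target_ratio))

    name, strength = min(predicted.items(), key=log_distance)
    return name, strength / native_strength


if __name__ == "__main__":
    # Hypothetical model predictions (arbitrary units); the native promoter scores 100.
    predicted = {"P1": 50.0, "P2": 210.0, "P3": 400.0, "P4": 900.0}
    print(pick_promoter(predicted, native_strength=100.0, target_ratio=2.0))  # ('P2', 2.1)
```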

Auto-responsive Promoter

Auto-responsive promoters are, in effect, inducible. Unlike conventional inducible promoters, however, they do not require the addition of an inducing chemical. These promoters respond to molecules that are either an integral component of the environment (e.g., oxygen [40]) or metabolites produced by the microorganism as it grows (e.g., the signal molecules of a bacterium's quorum sensing (QS) system [41]). Their unique response profiles can be used strategically to design systems that induce target genes automatically when their expression is needed.

QS-based Expression System. QS is a process by which bacteria coordinate their population behavior, acting as a whole, by sensing a strain-specific small molecule [42, 43]. Through QS, bacteria can detect the presence of other individuals and the density of the population [44]. Bacteria can dynamically turn the expression of many genes on (up) or off (down) in response to autoinducers (AIs), whose concentration is proportional to cell density [45].

Using this mechanism, researchers succeeded in producing recombinant proteins by autonomous induction through minimal rewiring of the native quorum sensing regulon in two strains, E. coli W3110 and BL21 [41]. They tested their expression system by overexpressing model proteins: green fluorescent protein (GFP), chloramphenicol acetyltransferase (CAT), and β-galactosidase (LacZ). In this system, expression of the heterologous gene is controlled by autoinducer-2 (AI-2), a small molecule whose concentration correlates closely with the density of the culture, so expression of the target gene starts automatically once cell density is high enough. No manual operation (such as inducer addition or temperature shift) is needed. Moreover, because the target gene is expressed only when cell density reaches a certain value, there is no need to optimize the time of induction. The system also eliminates the laborious, continual monitoring of bacterial growth needed to start induction in every batch.
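The self-timing behavior of a QS-based system can be illustrated with a toy simulation: cells grow logistically, the autoinducer is produced in proportion to cell density and slowly degraded, and the target gene switches on once the autoinducer crosses a threshold. Every parameter value and the simple threshold rule are hypothetical assumptions for illustration; this is not a model of the AI-2 circuit in the cited work.

```python
def simulate_qs_induction(hours=24.0, dt=0.01, mu=0.8, capacity=10.0, x0=0.05,
                          k_ai=1.0, k_deg=0.2, threshold=5.0):
    """Toy QS auto-induction model (forward Euler; all parameters hypothetical).

    dx/dt  = mu * x * (1 - x / capacity)   logistic growth of cell density x
    dAI/dt = k_ai * x - k_deg * AI         autoinducer made in proportion to density
    The target gene is 'on' from the first time AI >= threshold.
    Returns (induction time in hours or None, cell density at induction or None).
    """
    x, ai, t = x0, 0.0, 0.0
    induced_at = density_at_induction = None
    while t < hours:
        dx = mu * x * (1.0 - x / capacity)
        dai = k_ai * x - k_deg * ai
        x += dx * dt
        ai += dai * dt
        t += dt
        if induced_at is None and ai >= threshold:
            induced_at, density_at_induction = t, x
    return induced_at, density_at_induction


if __name__ == "__main__":
    for threshold in (5.0, 20.0):
        t_on, density = simulate_qs_induction(threshold=threshold)
        print(f"threshold {threshold}: induced at {t_on:.1f} h, density {density:.2f}")
```

No step in the loop needs an operator: the induction time emerges from growth itself, and raising the threshold simply shifts induction to a later, denser stage of the culture.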

Oxygen-induced Expression of vhb. In most bioindustrial processes, cell growth is markedly influenced by dissolved oxygen (DO). Insufficient DO supply is frequently a major problem, especially at very high cell density in aerobic fermentation. The discovery of the bacterial hemoglobin from Vitreoscilla sp. (VHb) [46], with its strong ability to bind oxygen, provides an effective way to alleviate DO deficiency. Besides the oxygen-binding capability of VHb, expression of vhb in Vitreoscilla sp. is itself regulated by oxygen. Placing the vhb gene under the control of its native Vitreoscilla promoter achieves “intelligent”, autonomously induced expression of vhb when DO is limited, neatly avoiding the waste of cellular materials on unnecessary expression when DO is adequate. This conditional expression differs from, and is advantageous over, other cases in which an introduced gene is expressed almost “constitutively” once expression begins. Liu et al. [47] successfully used the vhb promoter (Pvhb) to express GFP and toluene dioxygenase from a high-copy-number plasmid, and combining Pvhb with VHb gave better productivity under low aeration. Such an energy-saving protein production process is of special significance in industrial-scale operations. Integrating the VHb gene and the Pvhb promoter into a metabolically engineered strain as an ancillary component could bring further industrial benefits. The vgb gene (encoding VHb) has been used to improve the production of recombinant proteins [48, 49], chemicals [50, 51], and antibiotics [52]. It can also enhance the performance of microbes in bioremediation [53] and improve their physiological state [54]. A comprehensive introduction to the application of vhb in metabolic engineering can be found in a review [55].

Light Responsive Promoter

Activation of a conventional inducible promoter depends on a chemical inducer, either added to the medium or produced by the cells themselves, or on heat shock. Once expression of a gene is initiated, it cannot be stopped until the inducer is depleted: these promoters are not reversible. They are simple and successful (although they carry inherent problems [56]) and well suited to expression that requires no temporal or spatial control. In 2002, Shimizu-Sato et al. [56] introduced light-induced expression of reporter genes based on the finding that binding of the plant photoreceptor phytochrome to the protein PIF3 is induced by red light and reversed by far-red light. In 2011, Tabor et al. [57] reported a green-light-inducible transcription system in E. coli based on a green/red photoswitchable two-component system from cyanobacteria. Light-inducible expression systems provide rapid, noninvasive, switchable control of genes through light exposure; they allow multichromatic control of gene expression and multistate control in time and space [58]. These powerful light-controlled promoters enrich the choice of promoters for metabolic engineering and synthetic biology, in which coordinated expression of several genes is highly desirable; see reviews [58, 59].
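The key difference drawn above, reversibility, can be shown with two tiny state machines. In this sketch, the light switch follows the red-on/far-red-off logic described for the phytochrome–PIF3 system, with the simplifying assumption that darkness leaves the current state unchanged (real photoreceptors can slowly revert in the dark). The chemical inducer, once added, keeps expression on, with no reversal step. Pulse and event names are hypothetical labels.

```python
def light_switch(pulses, initial=False):
    """Reversible optogenetic switch: 'red' -> on, 'far-red' -> off, 'dark' -> unchanged.

    Dark reversion of the photoreceptor is ignored in this simplified rule set.
    """
    state, trace = initial, []
    for pulse in pulses:
        if pulse == "red":
            state = True
        elif pulse == "far-red":
            state = False
        elif pulse != "dark":
            raise ValueError(f"unknown pulse: {pulse}")
        trace.append(state)
    return trace


def chemical_induction(events, initial=False):
    """Conventional inducer: once 'add-inducer' occurs, expression stays on."""
    state, trace = initial, []
    for event in events:
        if event == "add-inducer":
            state = True
        trace.append(state)  # no event can switch expression back off
    return trace


if __name__ == "__main__":
    print(light_switch(["red", "dark", "far-red", "red"]))        # [True, True, False, True]
    print(chemical_induction(["none", "add-inducer", "none"]))   # [False, True, True]
```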

Riboswitch Control

A riboswitch is part of an mRNA molecule that can directly bind a small target molecule, and this binding affects the gene's activity [60–62]. An mRNA containing a riboswitch can therefore directly regulate its own activity in response to the concentration of its target molecule. The presence of riboswitches in all domains of life adds some support to the RNA world hypothesis. As the detailed mechanisms of riboswitches have become better understood, scientists have used them to design finely regulated biosystems. In a recent interesting article, the authors reported a bacterium engineered to seek and destroy the controversial herbicide atrazine by controlling the translation of the flagellar motor complex with a synthetic riboswitch [63].

The first use of a synthetic riboswitch to control bacterial motility was reported by Topp and Gallivan [64]. They engineered E. coli to follow the small molecule theophylline by exploiting a favorable feature of riboswitches: they control gene expression in a ligand-dependent manner without accessory proteins [65, 66]. Because riboswitches regulate gene expression independently of accessory proteins, it is possible to construct riboswitch-based regulatory systems that control gene expression in diverse bacterial species. Topp et al. [67] developed five synthetic riboswitches that efficiently induced gene expression in eight different bacterial species (both Gram-negative and Gram-positive). Notably, these riboswitches enabled inducible gene expression in the conditional human pathogen Acinetobacter baumannii, in which successful inducible expression of a foreign gene had never been reported. More detailed information on the biotechnological relevance of riboswitches can be found in two good reviews [68, 69].

Operator Region

The function of an operator region in an operon or gene is well illustrated by the lac operon [70]. In the lac operon, the operator controls transcription of the downstream genes by binding the repressor protein LacI, which blocks the progress of RNA polymerase. Likewise, in other operons and genes containing such an element, the operator region plays an important role in regulating gene expression. However, the operator region is rarely used as an engineering target in the metabolic engineering literature. It is more frequently used to design inducible expression systems (e.g., the pET system and its derivatives). Stronger affinity between the operator region and the repressor protein minimizes the background expression of a gene by imposing more stringent control when the corresponding inducer is absent. This can be regarded as an alternative to the widely adopted strategy of minimizing background expression of a target gene by increasing the amount of repressor protein.
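Why tighter operator binding and more repressor are alternative routes to the same goal follows from a simple occupancy model. Assuming 1:1 repressor–operator binding with repressor in excess, the fraction of operators left free is 1 / (1 + [R]/Kd), and leaky expression is taken here to be proportional to that free fraction. Under these assumptions, halving Kd has exactly the same effect as doubling [R]. The model ignores DNA looping, multiple operators, and inducer binding, and the concentrations used are hypothetical.

```python
def free_operator_fraction(repressor_nM: float, kd_nM: float) -> float:
    """Fraction of operators not bound by repressor, assuming simple 1:1 binding
    with free repressor in excess: 1 / (1 + [R]/Kd)."""
    if repressor_nM < 0 or kd_nM <= 0:
        raise ValueError("need repressor >= 0 and Kd > 0")
    return 1.0 / (1.0 + repressor_nM / kd_nM)


if __name__ == "__main__":
    baseline = free_operator_fraction(100.0, 10.0)       # hypothetical starting point
    tighter_operator = free_operator_fraction(100.0, 5.0)
    more_repressor = free_operator_fraction(200.0, 10.0)
    print(f"baseline leak {baseline:.4f}")
    print(f"tighter operator {tighter_operator:.4f}, more repressor {more_repressor:.4f}")
```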

Intergenic Region

The human genome is larger than the sum of all its coding sequences (including the generally recognized regulatory elements) [71]. The same holds for other organisms (bacteria, fungi, plants, and other animals), despite the widespread existence of overlapping genes [72, 73]. This implies that there are many intergenic sequences whose functions are far from thoroughly understood. Fortunately, the regulatory functions of some intergenic regions in gene expression have been discovered [74–76]. Pfleger et al. [77] devised a method to balance the expression of multiple genes in an operon by combining various post-transcriptional control elements in tunable intergenic regions (TIGRs) screened from a library. They first tested the effect of an intergenic region on the expression of two reporter genes and observed a 100-fold change. Balancing the expression of the genes in a heterologous biosynthetic pathway increased mevalonate production sevenfold. Park et al. [78] focused on the 5′-untranslated region (5′-UTR) to tune gene expression level, because the 5′-UTR contains sequences that influence protein synthesis more directly. To lower the background expression from the broad-host-range promoter Pm, Lale et al. [79] constructed and screened a 5′-UTR library. They observed that mutations in the UTR DNA region flanking the Shine–Dalgarno sequence strongly reduced background expression from Pm by reducing translational efficiency, while the response to induction remained unchanged (the minimum allowed inducer concentration was 1 μM).

Transcriptional Terminator

Compared with the attention and effort devoted to promoters in metabolic engineering, the transcriptional terminator has clearly been neglected. A transcriptional terminator is a DNA segment that defines, through different mechanisms, where transcription of a gene or operon stops [80].

Different termination events produce different versions of transcripts, offering a powerful mechanism for regulating gene expression. Besides terminating transcription, a transcriptional terminator has also been found to stabilize its own mRNA [81]. Engineering transcriptional terminators might therefore also be a way to regulate gene expression in bacterial metabolic engineering, since a similar result was confirmed in yeast [82]: higher levels of mRNA and protein from a plasmid-borne cloned gene were achieved when the TPS1 terminator (TPS1t) was selected from the four terminators studied (TPS1t, CYC1t, TDH3t, and PGK1t).

Concluding Remarks

Altering the metabolic pathways of an organism in a direction desired by humans is a straightforward idea. Earlier, this was attempted by introducing random mutations with chemical mutagens or physical treatment (most routinely, UV irradiation) and then screening for the desired mutants. In most cases, apart from recombinant protein production, improved productivity results from an altered pathway. Such operations are therefore essentially metabolic engineering, albeit an “irrational” form of it. Thanks to the increased understanding of metabolic pathways in many microorganisms, the same job can now be done in a quite different and more rational way. Elucidating the function of each part of a gene (the determinant of the enzymes that define a metabolic pathway) enables us to find suitable targets for altering the pathway. We can change the properties of a metabolic enzyme by changing its coding sequence through directed evolution or rational design, and control the timing and strength of gene expression with a different promoter (strong or weak inducible, or auto-responsive). The sophisticated mechanisms of quorum sensing and VHb function can be used to control gene expression automatically. Newly discovered and universally occurring riboswitches provide highly efficient and economical control of gene expression at the RNA level across different species. In an operon of more than one gene, the intergenic region can also play an important role in regulating the expression of all its member genes. Together, these offer more targets for engineering microbes into better biotechnology workhorses.