
1 Introduction

Bulk chemicals are produced at high volume and relatively low cost, and are either used directly or serve as platform chemicals for the production of derivatives in the chemical industry [1, 2]. Their annual production volumes are usually in the range of 1–100 million tons, and their selling prices are less than 2,000 dollars per ton [2]. Traditionally, bulk chemicals are predominantly produced from non-renewable fossil resources via petrochemical routes [1, 3]. Because of declining global reserves, fluctuating petroleum prices, trade imbalances, and political considerations, the cost of producing bulk chemicals through petrochemical processes is increasing [1, 4–10]. In addition, petrochemical processes typically consume large amounts of energy and cause serious environmental pollution [9–15].

Production of bulk chemicals from biomass resources by microbial cell factories is an alternative route that is renewable and environmentally friendly compared to the petrochemical route [11, 16–18]. However, only a few bulk chemicals can currently be produced by microbial cell factories. Moreover, although some bulk chemicals can be synthesized by engineered microorganisms, their producing capabilities, including titer, yield, productivity, and physiological characteristics, are not yet good enough to compete with petrochemical routes. Thus, it is important to expand the product range and to improve the producing capabilities of cell factories in order to decrease production costs for commercialization.

The rapid development of metabolic engineering, systems biology, and synthetic biology has facilitated the construction of microbial cell factories for producing bulk chemicals [1, 16, 19, 20]. Escherichia coli is the most commonly used host strain for cell factory construction because its physiological and genetic characteristics are well understood and it can be genetically modified easily [3, 21]. E. coli also grows fast in minimal salts medium and can utilize both the hexoses and the pentoses in biomass. Currently, bio-based bulk chemicals produced by E. coli include not only those formed by native E. coli metabolic pathways, such as lactate [22–25] and succinate [26–29], but also those produced by heterologous pathways or entirely new synthetic pathways, such as 3-hydroxypropionic acid [30–33], 1,3-propanediol [2], isobutanol [13, 34–36], butanol [13], 1,4-butanediol [37], and alkanes [38, 39]. In this review, we summarize the technology platforms for construction and optimization of E. coli cell factories, as well as representative cases of constructing E. coli cell factories for the production of organic acids and alcohols.

2 Technology Platform for Construction of E. coli Cell Factories

With the developments of systems biology and synthetic biology, a technology platform has been established for the construction of E. coli cell factories for bulk chemicals production. This platform includes (1) design of the optimal synthetic pathway, (2) construction of the synthetic pathway, (3) optimization of the synthetic pathway, (4) optimization of the producing capability at the whole cell level, and (5) characterization of the genetic mechanisms (Fig. 1). An initial cell factory can be obtained after the first four steps. The genetic mechanisms identified for high production can be used to further improve the producing capabilities to construct the next-generation cell factories.

Fig. 1 The technology platform for the construction of E. coli cell factories

2.1 Design of the Optimal Synthetic Pathway

E. coli produces mixed acids during glucose metabolism. In order to produce the target compound, competing pathways need to be inactivated. A sufficient energy supply is necessary to maintain cell growth and metabolism, while an appropriate supply of reducing equivalents is required to keep the redox state balanced, especially under anaerobic conditions. On the other hand, most synthetic pathways that convert the designated substrate to the target product do not exist in E. coli, and some are not even present in nature. Thus, designing novel synthetic pathways is very important for the construction of cell factories. With the development of bioinformatics tools, several genome-scale metabolic network models have been reconstructed for E. coli [40–43]. These models can help design the optimal synthetic pathway, discover new engineering targets to improve production of the target compound, and predict cellular phenotypes [41]. Several reviews have described the development of metabolic network models and their applications in constructing E. coli cell factories [44–49].

With the help of metabolic network models, the optimal synthetic pathway can be designed based on modification of native metabolic pathways (such as those for succinate or d-lactate), integration of exogenous reactions by software prediction (such as 1,4-butanediol [12]), genome mining (such as alkanes [20]), and modification of natural pathways to catalyze unnatural reactions (such as higher-chain alcohols [50, 51]). Different tools for the prediction of novel biosynthetic pathways have been reported [52–55]. These tools not only propose candidate pathways but also rank them based on different factors (such as thermodynamics [56]), reducing the number of pathways to a reasonable scope for experimental validation [16].

2.2 Construction of the Synthetic Pathway

Gene resources and DNA assembly method are two key factors for construction of synthetic pathway. It is desirable to obtain genes in an easy, quick, and inexpensive way, and to assemble different genes into a complete synthetic pathway with efficient and standard methods.

PCR has been commonly used to obtain target genes. However, the original cells carrying the target genes need to be collected first, which is time-consuming. In addition, many heterologous genes cannot be expressed and translated efficiently in the host strain. With the rapid development of high-throughput chemical synthesis of DNA, obtaining gene resources has become independent of the original cells. It is also possible to improve the translational efficiency of target genes by codon optimization. The Church group developed a microchip-based technology for gene synthesis and reduced the error rate by ninefold [57]. Using this technology, all 21 genes encoding the proteins of the E. coli 30S ribosomal subunit were synthesized, and the translation efficiency in vitro was optimized through alteration of codon bias. An on-chip gene synthesis technology, including ink-jet printing, isothermal oligonucleotide amplification, and parallel gene assembly, was integrated on a single microchip [58]. By using a mismatch-specific endonuclease for error correction, the error rate was reduced to about 0.19 errors per kb.
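The idea of altering codon bias without changing the encoded protein can be illustrated with a minimal sketch. The `PREFERRED` map below is a hypothetical placeholder covering only three amino acids, not a measured codon-usage table, and the helper names `translate` and `recode` are invented for this illustration; real codon optimization also weighs factors such as mRNA structure that are ignored here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preference map (assumption for illustration only).
PREFERRED = {"L": "CTG", "K": "AAA", "E": "GAA"}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna, preferred=PREFERRED):
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


original = "ATGTTAAAGGAGTAA"
optimized = recode(original)
# The nucleotide sequence changes, the protein sequence does not.
assert translate(optimized) == translate(original)
```

The final assertion is the property that matters: recoding must leave the translated protein unchanged.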

Standardized DNA assembly is another limiting technology for construction of the synthetic pathway. Several BioBrick assembly standards [59] have been developed. The DNA unit is flanked by standardized sequences, and assembly can be achieved by a simple and standardized restriction/ligation method [60, 61]. However, the BioBrick approach still has several disadvantages, such as the 6-bp scar left by each binary BioBrick assembly and the limited rearrangement of intermediate parts [62]. Several newer technologies, such as sequence- and ligation-independent cloning (SLIC), Gibson isothermal assembly, and circular polymerase extension cloning (CPEC), have been designed, which provide standardized, scarless, sequence-independent, and multi-part DNA assembly [62–65]. All these technologies depend on 5′ homology sequences flanking the two ends of each DNA part. The biological characteristics and mechanisms of these methods have been reviewed [61].

All methods mentioned above are carried out in vitro. The Zhao group at the University of Illinois at Urbana-Champaign developed an in vivo DNA assembly method [66, 67]. DNA parts with homologous sequences were transformed into Saccharomyces cerevisiae and assembled by exploiting the high homologous recombination efficiency of the yeast. The assembled DNA devices were then extracted and transformed into E. coli for evaluation or expression. This method is efficient and does not depend on in vitro enzymes.

DNA integration into the chromosome has also been developed to construct genetically stable strains for industrial production. Homologous recombination based on the λ Red recombinase has been developed [68–70]. The one-step homologous recombination method uses an antibiotic marker for selection, which is flanked by two FLP recognition target (FRT) fragments. The antibiotic marker can be removed by the FLP recombinase, facilitating multiple rounds of genetic engineering. This method can integrate or delete genes quickly but leaves a 68-bp FRT scar on the chromosome each time, and repeated use of this system can potentially result in large unintended chromosomal deletions [68, 69, 71]. To facilitate sequential gene manipulations, a two-stage recombination strategy was developed, based on the sensitivity of E. coli to sucrose when levansucrase (sacB) is expressed in the cell. In the first recombination, the target chromosomal genes are replaced by a DNA cassette containing an antibiotic marker and the sacB gene. In the second recombination, the antibiotic marker and the sacB gene are removed by selection for resistance to sucrose [69, 72–74].

2.3 Optimization of the Synthetic Pathway

Native and newly constructed synthetic pathways are often inefficient. Some enzymes may have low activities and become the rate-limiting steps of the whole pathway. Some toxic intermediates may accumulate in the cell, leading to decreased cell growth and flux imbalances. In order to improve the producing capability, several strategies have been developed to optimize the synthetic pathway at three levels.

2.3.1 Optimization of the Synthetic Pathway in a Single Gene Level

Gene overexpression has commonly been used to increase the activities of rate-limiting enzymes in a synthetic pathway in order to modulate metabolic fluxes. However, this simple overexpression strategy rarely reaches the optimal transcript level and has been unsuccessful in most cases in improving producing capability [75]. Promoter libraries have been developed as a solution to this all-or-nothing expression strategy, providing promoters with a wide range of strengths for fine-tuning gene expression [76–80]. Two methods have been developed for creating promoter libraries. In the first, the conserved −35 and −10 sequences are kept intact and the surrounding nucleotides are randomized [78, 79]. In the second, the sequence of an existing promoter is mutated using error-prone PCR [76, 77].

However, plasmid-based gene expression has several disadvantages for the engineering of genetically stable strains [69]. Plasmid maintenance is a metabolic burden on the host cell, especially for high-copy number plasmids [81], and only a few natural unit-copy plasmids have the desirable genetic stability [82]. In addition, only low-copy number plasmids have replication that is timed with the cell cycle, and thus, it is difficult to maintain a consistent copy number in all cells [82]. It is therefore desirable to integrate the target genes into the chromosome and then fine-tune their expression. With the aid of Red recombination technology [68], promoter libraries were recently constructed directly in the chromosome [83–85], which may be more suitable for modulating the expression of chromosomal genes.

Different promoters with varied strengths can be used to control gene transcription precisely to obtain a specific cellular phenotype. One example is the divergence of biomass yield by modulation of phosphoenolpyruvate carboxylase (ppc) gene transcription level. When the wild-type promoter of ppc was replaced by promoters with varied strengths, there was a positive correlation between the ppc transcription level and the biomass yield when the ppc transcription level was within a certain range. Excessive ppc transcription level led to decreased biomass yield. The promoter library facilitated identifying the optimum transcription level of ppc for biomass yield. Another example is the use of promoter library to investigate the relationship between succinate production and PPC or phosphoenolpyruvate carboxykinase (PCK) activity. There was a positive correlation between PCK activity and succinate production. In contrast, there was a positive correlation between PPC activity and succinate production only when PPC activity was within a certain range. Excessive PPC activity decreased the rates of both cell growth and succinate formation [29]. In contrast, plasmid overexpression of ppc gene always led to increased succinate production [86], which would mislead our understanding of the relationship between succinate production and PPC activity.
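The first library type described above (conserved −35 and −10 boxes with randomized surrounding nucleotides) can be sketched in silico as follows. The hexamers are the textbook σ70 consensus sequences; the upstream, spacer, and downstream lengths and the function name `make_promoter_library` are assumptions chosen for illustration and are not taken from the cited studies. The sketch only generates candidate sequences; promoter strength still has to be measured experimentally.

```python
import random

MINUS_35 = "TTGACA"  # textbook sigma70 consensus -35 hexamer
MINUS_10 = "TATAAT"  # textbook sigma70 consensus -10 hexamer


def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_promoter_library(size, upstream=10, spacer=17, downstream=7, seed=0):
    """Generate unique promoters with fixed boxes and randomized flanks.

    The region lengths are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    library = set()
    while len(library) < size:
        seq = (
            random_dna(upstream, rng)
            + MINUS_35
            + random_dna(spacer, rng)
            + MINUS_10
            + random_dna(downstream, rng)
        )
        library.add(seq)
    return sorted(library)


library = make_promoter_library(5)
for promoter in library:
    print(promoter)
```

Fixing the seed makes the candidate set reproducible, which is convenient when the same designs must be ordered or re-synthesized later.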

2.3.2 Optimization of the Synthetic Pathway in Multiple Gene Levels

The efficiency of most synthetic pathways is usually not limited by a single rate-limiting reaction [87]. A more broadly accepted view is that coordinated expression of the multiple genes involved in the synthetic pathway is needed to increase the overall metabolic flux.

A technology for tuning the expression of multiple genes by employing post-transcriptional mechanisms was developed by the Keasling group at the University of California, Berkeley [88]. Libraries of tunable intergenic regions (TIGRs), consisting of several control elements including mRNA secondary structures, RNase cleavage sites, and RBS sequences, were constructed and used to differentially alter transcription termination, mRNA stability, and translation initiation [87, 88]. When this strategy was used to balance the expression of three genes in an operon encoding a heterologous mevalonate biosynthetic pathway, a sevenfold increase in mevalonate production was achieved. Another technology for fine-tuning pathway flux was developed by the same group in 2009 [89]. Synthetic protein scaffolds were built to spatially recruit metabolic enzymes in a specific manner, increasing the effective concentration of metabolic intermediates and avoiding their accumulation to toxic levels. In addition, production levels can be optimized by balancing the relative quantities of individual enzymes, which is done by changing the number of interaction-domain repeats that recruit the different enzymes to the synthetic complex. Using this technology, a 77-fold improvement was achieved for mevalonate biosynthesis.

The Church group at Harvard Medical School developed a powerful tool termed multiplex automated genome engineering (MAGE), which can modify many genes in the E. coli genome in parallel. This technique was used to optimize the 1-deoxy-d-xylulose-5-phosphate (DXP) synthetic pathway in E. coli to improve lycopene production. Twenty-four genes in the DXP pathway were modified simultaneously, and over 4.3 billion variants were created per day. E. coli variants with a more than fivefold increase in lycopene production were isolated within 3 days [90, 91].

Instead of repeating multiple rounds of gene knockout, the Lee group at the Korea Advanced Institute of Science and Technology designed synthetic regulatory small RNAs (sRNAs) to finely control gene expression in E. coli. Each customized synthetic sRNA consists of a scaffold and a target-binding sequence. With the plasmid-based synthetic sRNA system, one can study the effects of multiple knockdowns on the producing capability of the cell in a high-throughput way and simultaneously screen target genes in different E. coli strains [92].

2.3.3 Optimization of the Synthetic Pathway Using the Sensor–Regulator System

To precisely control and regulate heterologous pathway expression in response to changes in environmental or intracellular conditions, an efficient strategy is to use a sensor–regulator system, which responds to a particular intermediate and triggers the desired cellular response. This enables the cell to use cellular resources efficiently and improves the producing capability while decreasing the accumulation of toxic metabolites [93]. Malonyl-CoA is the rate-limiting precursor in the synthetic pathways of several value-added pharmaceuticals and biofuels. By incorporating the trans-regulatory protein FapR and the cis-regulatory element fapO of Bacillus subtilis, a hybrid promoter–regulator system was constructed that could respond to a wide range of intracellular malonyl-CoA concentrations in E. coli [93]. In another study, the Liao group designed and engineered a regulatory circuit by recruiting and altering the Ntr regulon, a global regulatory system, to control pathway expression for lycopene synthesis in E. coli. The artificially engineered regulon controlled gene expression in the lycopene synthetic pathway by sensing the concentration of acetyl phosphate, a hallmark metabolite of the glycolytic pathway [94]. Recently, a dynamic sensor–regulator system (DSRS) was developed to dynamically regulate gene expression in the biodiesel biosynthetic pathway by responding to the key intermediate fatty acyl-CoA in E. coli. Using this strategy, fatty acid ethyl ester production was increased threefold compared with that obtained using constitutive promoters in E. coli [95].

2.4 Optimization of Producing Capability at the Whole Cell Level

After optimization of the synthetic pathway, the producing capability of the engineered cell, such as titer, yield, productivity, and physiological characteristics, might still be not good enough for industrial application. The desired cell phenotypes may be affected by factors which are not directly related to the synthetic pathway [96]. In order to obtain an efficient cell factory, the producing capability needs to be optimized further at the whole cell level.

Metabolic evolution was developed by the Ingram group at the University of Florida and has been demonstrated to be an excellent strategy for strain improvement [69]. Synthesis of the target product is designed to be the only fermentation pathway that oxidizes NADH under anaerobic conditions. The growth of the engineered cell is thereby coupled to the synthesis of the target product, since this is the only way to regenerate NAD+ for continued glycolysis, which provides ATP for cell growth. This technology has been used widely to improve the production of several bulk chemicals by E. coli cell factories, such as d-lactate [97–99], succinate [100–102], and ethanol [103]. In addition, this technology can also be used to improve the physiological characteristics of the cell, especially tolerance to toxic metabolites or high concentrations of target products [104, 105].

Global transcription machinery engineering (gTME) was developed by the Stephanopoulos group at Massachusetts Institute of Technology and has proved to be a powerful strategy for optimizing a desired phenotype at the systems level [106–109]. gTME has been used to improve ethanol tolerance, lycopene production, and simultaneous tolerance to sodium dodecyl sulfate (SDS) and ethanol. For these purposes, one of the components of the global cellular transcription machinery in E. coli (specifically rpoD, which encodes σ70) was engineered to globally perturb the transcriptome and help unlock complex phenotypes [109].

Aside from the technologies mentioned above, other tools, such as genome shuffling [110–113] and trackable multiplex recombineering (TRMR) [114], have been designed and used to optimize a target pathway at the systems level. Great improvements in chemical production properties have been achieved with these strategies. It should be noted that high-throughput screening methods are required for efficient selection.

2.5 Characterization of the Genetic Mechanism

Although metabolic evolution and other global perturbation methods are efficient for improving the producing capability of the engineered cell, the genetic backgrounds of strains obtained by these strategies often remain unclear. Characterization of the genetic mechanisms related to the improved producing capability is very important. The fast accumulation of omics data, including genomics, transcriptomics, proteomics, metabolomics, and fluxomics, has provided a foundation for understanding the genetic mechanisms in depth [115–119], which is crucial for further rounds of engineering to obtain next-generation cell factories.

Up to now, many E. coli cell factories capable of producing different bulk chemicals have been constructed, and some have been applied at industrial scale. The bio-based bulk chemicals produced by E. coli cell factories mainly include organic acids and alcohols, which are described in detail in the following sections.

3 Organic Acids

Organic acids have attracted considerable attention for their increasing use in the food industry and their great potential as platform chemicals for the manufacture of biodegradable polymers [120, 121]. As an alternative to petroleum-based production, microbial production of organic acids from renewable biomass has been accepted as a feasible process. E. coli has been widely engineered to produce organic acids, such as acetate, lactate, pyruvate, 3-hydroxypropionate, succinate, malate, fumarate, glucaric acid, and muconic acid (Table 1).

Table 1 Production of organic acids by representative E. coli cell factories

3.1 d-lactate

As a specialty chemical, d-lactate is widely applied in the food and pharmaceutical industry. A potential huge market for d-lactate is to be combined with l-lactate to produce polylactic acid (PLA), an increasingly attractive biodegradable plastic. The commercial success of PLA will greatly depend on the production cost of the monomers [69]. Wild-type E. coli can produce d-lactate in its mixed acid fermentation process (Fig. 2). However, the productivity is low and several undesirable metabolites are produced at the same time. To realize the production of d-lactate in an efficient way, it is necessary to reengineer the metabolic network of E. coli.

Fig. 2 Construction of E. coli cell factories for the production of d-lactate. The dotted lines and stars indicate metabolic reactions that were inactivated to increase production of d-lactate

d-lactate-producing strains were engineered from E. coli W3110 by the Ingram group by inactivating the competing fermentation pathways, including fumarate reductase (frdABCD), alcohol/aldehyde dehydrogenase (adhE), and pyruvate formate lyase (pflB). A further deletion of the acetate kinase gene (ackA) increased the cell mass and lactate productivity. The d-lactate yield of these strains approached the theoretical maximum yield (2 mol/mol glucose) in mineral salts medium [122]. To expand the substrate range, a cluster of sucrose utilization genes, characterized and cloned from E. coli KO11, was introduced, resulting in production of over 500 mM d-lactate from sucrose [123]. However, these biocatalysts were unable to completely ferment glucose or sucrose at concentrations of up to 10 %. Inspired by the construction of ethanol-producing strains, a derivative of E. coli B was selected as the starting strain for d-lactate production. Based on growth-based selection (Fig. 3), metabolic evolution was carried out to improve strain performance. The resulting strain SZ194 produced 1.22 M d-lactate with a yield of 1.9 mol/mol in mineral salts medium. This production capability was comparable to that of lactic acid bacteria [98].

Fig. 3 Metabolic evolution based on the coupling of cell growth and d-lactate production

In another study, E. coli strain B0013 was engineered for d-lactate production by deletion of acetate kinase and phosphotransacetylase (ackA-pta), phosphoenolpyruvate synthase (pps), pflB, FAD-binding d-lactate dehydrogenase (dld), pyruvate oxidase (poxB), and the adhE and frd genes. The resulting strain, B0013-070, produced 125 g/L d-lactate [25]. Replacing the ldhA promoter with the λ pR and pL promoters in strain B0013-070 led to a thermocontrollable strain, B0013-070B, in which the LDH activity was twofold higher than that of the parent strain B0013-070 at 42 °C. At a culture temperature of 33 °C, the genetic switch was turned off, and strain B0013-070B produced 10 % more biomass under aerobic conditions than strain B0013-070, with only trace d-lactate produced. This modification reduced the growth inhibition caused by oxygen insufficiency in large-scale fermentation processes [23].

3.2 3-Hydroxypropionate

3-Hydroxypropionate (3HP), a non-chiral carboxylic acid, has received much attention for its potential use in producing biodegradable polymers, either by itself or together with other compounds [124, 125]. Additionally, 3HP is an important C3 platform chemical and can be used for the production of various commercially valuable chemicals, such as 1,3-propanediol, acrylic acid, and malonic acid [126]. 3HP has been identified as a metabolic intermediate naturally present in several microorganisms [127–134]. More than a dozen pathways for 3HP biosynthesis have been proposed based on natural metabolic pathways or in silico design [135–137]. However, only a small fraction of these pathways has been evaluated. The Park group developed a recombinant E. coli strain that produces 3HP from glucose with malonyl-CoA as an intermediate. In this strain, the mcr gene encoding the NADPH-dependent malonyl-CoA reductase (MCR) of Chloroflexus aurantiacus DSM 635 was introduced into E. coli. The recombinant strain produced 0.064 g/L 3HP when cultivated aerobically for 24 h using glucose as the sole carbon source. To improve 3HP production, the gene cluster accADBCb encoding the acetyl-CoA carboxylase and biotinilase of E. coli K-12 was overexpressed, which resulted in a twofold improvement in 3HP production. Further genetic modification was carried out to express the pntAB genes encoding the membrane-bound transhydrogenase, which converts NADH to NADPH; this increased the 3HP titer to 0.193 g/L [33].

Compared to producing 3HP from glucose, more studies have focused on the production of 3HP from glycerol. By heterologous overexpression of the glycerol dehydratase (DhaB) from Klebsiella pneumoniae DSM 2026 and the aldehyde dehydrogenase (AldH) from E. coli K-12 MG1655 in E. coli BL21 (DE3), a recombinant E. coli strain SH254 was obtained. When fermented aerobically in shake flasks in M9 minimal medium supplemented with glycerol as substrate, this strain produced 0.58 g/L 3HP with a yield of 0.48 mol/mol glycerol [138]. Further optimization of the fermentation parameters, such as pH, IPTG concentration, aeration rate, and substrate concentration, led to production of 31 g/L 3HP in 72 h in a fed-batch fermentation process [139]. Although the 3HP titer was improved by optimizing the fermentation parameters, several problems, including the imbalance between DhaB and AldH and the instability of DhaB, remained unsolved. To overcome these limitations, DhaB and AldH were overexpressed from two compatible plasmids with inducible expression systems, and the glycerol dehydratase reactivase (GDR) was expressed at the same time. Then, by using α-ketoglutaric semialdehyde dehydrogenase (KGSADH) from Azospirillum brasilense to replace AldH, a recombinant E. coli strain SH-BGK1 was constructed, which produced 38.7 g/L 3HP aerobically in a fed-batch process [140]. Modulation of glycerol metabolism further increased the 3HP titer to 57.3 g/L with a yield of 0.88 g/g glycerol [30]. In another work, based on in silico simulation, two genes involved in central metabolism, tpiA (encoding triose phosphate isomerase) and zwf (encoding glucose 6-phosphate dehydrogenase), and the yqhD gene (encoding NADPH-dependent aldehyde reductase), involved in the biosynthetic pathway of the major by-product 1,3-propanediol, were identified as engineering targets to improve 3HP production from glycerol. Deletion of these three genes led to a 7.4-fold increase in 3HP titer compared to the parent strain [141].

3.3 Succinate

Succinate can be produced by native E. coli as a minor product [56]. In order to produce succinate as the sole product, competing pathways need to be eliminated. Strain NZN111 is an engineered E. coli in which lactate dehydrogenase (ldhA) and pflB are inactivated [142–144]. This strain produces no detectable lactate or formate under anaerobic conditions [142, 143]. However, inactivation of these NADH-consuming pathways also causes redox imbalance within cells, leading to decreased cell growth and glucose utilization [143]. Strain NZN111 consumed only 1.8 g/L glucose and produced 1.8 g/L succinate under anaerobic conditions in 44 h [145]. A mutant of NZN111 (strain AFP111) was isolated, which recovered cell growth and had increased succinate production under anaerobic conditions. A spontaneous mutation in the ptsG gene [146–148] of NZN111, which encodes the EIIBGlc subunit of the phosphoenolpyruvate (PEP):carbohydrate phosphotransferase system (PTS), was identified as responsible for the increased cell growth and succinate production [143]. This ptsG mutation could enhance the PEP precursor supply for succinate synthesis, as well as alleviate glucose repression of several genes that are crucial to the fermentation. Several genetic manipulations were performed to further improve succinate production by strain AFP111. For instance, overexpression of the pyruvate carboxylase (PYC) gene of Rhizobium etli in AFP111 increased succinate titer and yield to 99.2 g/L (841 mM) and 1.1 g/g (1.68 mol/mol), respectively, under dual-phase conditions [149].
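Titers and yields in this section are quoted in both mass and molar units, and the conversion is simple enough to check by hand. The sketch below assumes molar masses of 118.09 g/mol for succinic acid and 180.16 g/mol for glucose (standard values, not given in the text); the function names are chosen for illustration. The converted values should agree with the quoted 841 mM and 1.68 mol/mol to within rounding.

```python
SUCCINIC_ACID_MW = 118.09  # g/mol, assumed standard value
GLUCOSE_MW = 180.16        # g/mol, assumed standard value


def g_per_l_to_mM(titer_g_per_l, molar_mass):
    """Convert a titer in g/L to mmol/L."""
    return titer_g_per_l / molar_mass * 1000.0


def mass_yield_to_molar(yield_g_per_g, product_mw, substrate_mw):
    """Convert a yield in g product per g substrate to mol/mol."""
    return yield_g_per_g * substrate_mw / product_mw


titer_mM = g_per_l_to_mM(99.2, SUCCINIC_ACID_MW)
yield_mol = mass_yield_to_molar(1.1, SUCCINIC_ACID_MW, GLUCOSE_MW)
print(f"titer: {titer_mM:.0f} mM, yield: {yield_mol:.2f} mol/mol")
```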

Under anaerobic conditions, 1 molecule of glucose produces 2 molecules of NADH through glycolysis, while the production of 1 molecule of succinate through the reductive TCA pathway requires 2 molecules of NADH. The maximal succinate yield is therefore only 1 mol/mol glucose, which is much less than the theoretical maximum yield (1.71 mol/mol) [150–153]. In comparison, the NADH requirement for succinate synthesis decreases when the glyoxylate shunt pathway is utilized. It was calculated that 1.25 molecules of NADH are required to synthesize 1 molecule of succinate [150]. The glyoxylate shunt pathway is composed of isocitrate lyase (encoded by aceA) and malate synthase (encoded by aceB) [154]. In the presence of glucose, the aceBAK operon is strictly repressed by the IclR regulator [155–157], and iclR deletion was proven to efficiently activate the glyoxylate shunt pathway [155]. Sanchez et al. found that deletion of iclR in strain SBS550MG (∆adhE, ∆pflB, ∆ack-pta) harboring pyruvate carboxylase from Lactococcus lactis increased its succinate yield to 1.61 mol/mol [150]. Obtaining NADH through formate dehydrogenase is another strategy to improve succinate yield [158]. Blazer et al. reported that overexpression of a heterologous NAD+-dependent formate dehydrogenase from Candida boidinii increased succinate yield to 1.74 mol/mol glucose [158]. External formate supplementation further resulted in a 6 % increase in succinate yield [158].
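The yield figures in this paragraph follow from simple electron bookkeeping, which the following sketch reproduces. It assumes the usual degree-of-reduction formula for C, H, O compounds (4C + H − 2O) and treats glycolysis as the only NADH source when computing NADH-limited yields; both are simplifications introduced here for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def degree_of_reduction(c, h, o):
    """Available electrons per molecule of a C/H/O compound."""
    return 4 * c + h - 2 * o


# Glucose C6H12O6 and succinic acid C4H6O4.
glucose_e = degree_of_reduction(6, 12, 6)
succinate_e = degree_of_reduction(4, 6, 4)

# Theoretical maximum: all electrons of glucose end up in succinate.
max_yield = Fraction(glucose_e, succinate_e)


def nadh_limited_yield(nadh_per_glucose, nadh_per_succinate):
    """Succinate per glucose when NADH supply is the only constraint."""
    return Fraction(nadh_per_glucose) / Fraction(nadh_per_succinate)


reductive_tca = nadh_limited_yield(2, 2)
with_glyoxylate = nadh_limited_yield(2, Fraction(5, 4))

print(f"theoretical maximum: {float(max_yield):.2f} mol/mol")
print(f"reductive TCA only:  {float(reductive_tca):.2f} mol/mol")
print(f"with 1.25 NADH/succ: {float(with_glyoxylate):.2f} mol/mol")
```

Under these assumptions the electron balance gives 12/7 mol/mol, which rounds to the 1.71 mol/mol quoted in the text, and lowering the NADH requirement to 1.25 per succinate raises the NADH-limited ceiling from 1 to 1.6 mol/mol, close to the 1.61 mol/mol reported for the iclR deletion strain.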

Although many successes have been achieved in metabolic engineering of E. coli to improve succinate production, as mentioned above [143, 145, 149, 150, 158, 159], several problems remain to be solved. Many research groups have employed dual-phase fermentation for succinate production, i.e., an aerobic growth phase followed by an anaerobic fermentation phase [149, 150]. Part of the carbon source is converted to cell mass and carbon dioxide during the aerobic phase, which decreases the succinate yield. Supplying dissolved oxygen also increases energy costs in industrial production. In addition, rich medium is frequently used for fermentation [149, 150], which increases material costs and downstream purification costs. It is therefore very important to use mineral salts medium and a one-step anaerobic process for succinate production. By combining metabolic engineering to inactivate competing fermentation pathways with metabolic evolution to improve cell growth and succinate production, the Ingram group obtained a high-succinate-producing strain, KJ073, which produced 668 mM succinate with a yield of 1.2 mol/mol using mineral salts medium and a one-step anaerobic process [160]. The genetic mechanisms for efficient succinate production in strain KJ073 were subsequently identified [101]. PCK activity was increased by a G-to-A transition at position −64 relative to the ATG start codon of pck, which increased the energy supply for cell growth and succinate production under anaerobic conditions [101]. In addition, a frame-shift mutation in the ptsI gene, which encodes the EI component of the PTS system [161], was found in KJ073; this mutation increased the PEP precursor supply for succinate production [101]. Reverse metabolic engineering was performed to verify the effects of these two core mutations. After increasing PCK activity and deleting the ptsI gene in wild-type E. coli ATCC 8739, succinate titer and yield increased 3.7- and 4.6-fold, respectively, compared with the parent strain [162].

Besides inactivating competitive fermentation pathways, increasing energy supply, and increasing precursor supply, the fourth key factor for efficient succinate production is increasing the supply of reducing equivalents. As mentioned above, activating the glyoxylate bypass and recruiting formate dehydrogenase can increase the supply of reducing equivalents [150, 158]. In addition, two pathways that conserve reducing equivalents were identified recently, and these can increase succinate yield [163]. By combining metabolic engineering and metabolic evolution, a high-succinate-producing strain, HX024, was obtained (Fig. 4), which produced 813 mM succinate with a yield of 1.36 mol/mol using mineral salts medium and a one-step anaerobic process [163]. The genetic mechanisms for the high yield were then identified through genome sequencing, transcriptome analysis, and enzyme assays. Pyruvate dehydrogenase (PDH) activity increased significantly, and the sensitivity of PDH to NADH was eliminated by three mutations in LpdA, which is the E3 component of PDH [164–166]. In addition, the pentose phosphate pathway (PPP) and the transhydrogenase SthA [167–169] were activated. More carbon flux could go through the pentose phosphate pathway, leading to production of more reducing equivalents in the form of NADPH, which was then converted to NADH by the soluble transhydrogenase for succinate production. Reverse metabolic engineering was further performed in the parent strain. Succinate yield increased from 1.12 to 1.5 mol/mol (88 % of the theoretical maximum yield) by activating PDH, PPP, and SthA transhydrogenase in combination. It was suggested that the theoretical maximum succinate yield can also be obtained if 85.7 % of the carbon source goes through the PPP, using both NADH and NADPH as the reducing equivalents [163]. The other benefit of using the PPP for succinate production is that only half as much exogenous CO2 is required, which could reduce the fermentation cost [163].
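The 85.7 % flux split and the halved CO2 demand can be reproduced from a simple redox balance. The sketch below is a simplified model, not the calculation reported in [163]. It assumes textbook stoichiometry per glucose (glycolysis: 2 PEP + 2 NADH; PPP: 5/3 PEP + 5/3 NADH + 2 NADPH + 1 CO2 released), assumes that all NADPH is converted to NADH by the transhydrogenase, assumes that each succinate made from PEP consumes 2 NADH and fixes 1 CO2, and ignores ATP and biomass. The function name succinate_balance is hypothetical.

```python
from fractions import Fraction


def succinate_balance(f_ppp: Fraction):
    """Per glucose, return (PEP formed, reducing equivalents formed, CO2 released)
    when a fraction f_ppp of the glucose enters the pentose phosphate pathway."""
    f_emp = 1 - f_ppp
    pep = 2 * f_emp + Fraction(5, 3) * f_ppp
    # PPP contributes 5/3 NADH (lower glycolysis) plus 2 NADPH per glucose.
    reducing_eq = 2 * f_emp + (Fraction(5, 3) + 2) * f_ppp
    co2_released = f_ppp
    return pep, reducing_eq, co2_released


# Redox balance requires reducing_eq == 2 * pep:
#   2 + (5/3) f = 4 - (2/3) f   ->   f = 6/7
f_opt = Fraction(6, 7)
pep, reducing_eq, co2_released = succinate_balance(f_opt)
assert reducing_eq == 2 * pep

succinate_yield = pep                      # all PEP carboxylated and reduced
co2_fixed = succinate_yield                # one CO2 fixed per succinate
net_exogenous_co2 = co2_fixed - co2_released

print(f"PPP fraction at redox balance: {float(f_opt):.1%}")
print(f"succinate yield: {float(succinate_yield):.2f} mol/mol glucose")
print(f"exogenous CO2 relative to CO2 fixed: {float(net_exogenous_co2 / co2_fixed):.2f}")
```

Within this simplified model the balance point is 6/7 of the glucose through the PPP, giving 12/7 mol succinate per mol glucose, and the CO2 released by the PPP covers half of the CO2 that carboxylation consumes.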

Fig. 4

Genetic mechanisms for high-succinate production in E. coli strain HX024. Green square lines represent activated metabolic modules (including pentose phosphate pathway, transhydrogenase, and pyruvate dehydrogenase) which are responsible for increased succinate yield. The pentose phosphate pathway was activated to increase the supply of NADPH, which was then converted to NADH through soluble transhydrogenase for succinate production. The sensitivity of pyruvate dehydrogenase to NADH inhibition was eliminated by the lpdA gene mutation. Pyruvate dehydrogenase activity increased under anaerobic conditions, which provided additional NADH for succinate production. Red oval lines represent activated metabolic modules (including glucose transport, carboxylation, reductive TCA, and succinate transport) which are responsible for increased succinate productivity. G6P glucose-6-phosphate, 6PGL gluconolactone-6-phosphate, 6PGC 6-phosphogluconate, Ru5P ribulose-5-phosphate, X5P xylulose-5-phosphate, R5P ribose-5-phosphate, S7P sedoheptulose-7-phosphate, E4P erythrose-4-phosphate, F6P fructose-6-phosphate, FBP fructose-1,6-bisphosphate, GAP glyceraldehyde-3-phosphate, DHAP dihydroxyacetone phosphate, 1,3-BPG 1,3-bisphosphoglycerate, 3-PG 3-phosphoglycerate, 2-PG 2-phosphoglycerate, PEP phosphoenolpyruvate, ACP acetyl phosphate, Ace acetate, DLAC D-lactate, FOR formate, ETH ethanol, OAA oxaloacetate, CIT citrate, ICIT isocitrate, GLO glyoxylate, MAL malate, FUM fumarate, SUC succinate, NAD+ oxidized nicotinamide adenine dinucleotide, NADPH reduced nicotinamide adenine dinucleotide phosphate, NADH reduced nicotinamide adenine dinucleotide, and NADP+ oxidized nicotinamide adenine dinucleotide phosphate

3.4 Malate

Malate, together with fumarate and succinate, has been identified as one of the 12 most valuable bulk chemicals by the US Department of Energy [126]. It can be produced by several native microorganisms [170–174]. Since converting one molecule of pyruvate to one molecule of malate requires only one NADH, the theoretical maximum yield for malate production is 2 mol/mol glucose. Starting from the succinate-producing strain KJ073, Ingram's group developed an engineered E. coli strain for l-malate production [175]. Inactivating the fumarase isoenzymes did not convert the succinate-producing strain into a malate producer, and the resulting strain still accumulated large amounts of succinate. Fumarate appears to be the immediate precursor for succinate production in a fumarase-negative background. By contrast, it was surprisingly found that inactivation of fumarate reductase alone could redirect the carbon flow into malate production. It was suggested that the thermodynamic equilibrium favors the hydration of fumarate to malate and that E. coli might transport malate better than fumarate. Inactivation of fumarase and malic enzymes further improved malate production. Strain XZ-T658 was obtained, which produced 163 mM malate with a yield of 1.0 mol/mol glucose. When a two-stage process was used, 253 mM malate was produced within 72 h and the yield reached 1.42 mol/mol [175].

3.5 Fumarate

Production of fumarate by fermentation has been studied for about a century [176, 177], with most work concentrated on Rhizopus strains [176, 178–188]. The best reported strain can produce 126 g/L fumarate with a yield of 0.97 g/g from glucose [189]. Recently, E. coli was also engineered by Lee's group for fumarate production under aerobic conditions [190]. The carbon flux was redirected through the glyoxylate shunt by deletion of the iclR gene, while fumarate production was increased by deletion of the fumA, fumB, and fumC genes. The engineered strain produced 1.45 g/L fumarate when glucose was used as the substrate. The ppc gene was then overexpressed, and fumarate production increased to 4.09 g/L. To reach better performance, further genetic modifications were carried out, including deletion of arcA (encoding the ArcA transcriptional dual regulator) and ptsG to increase the oxidative TCA cycle flux, deletion of aspA (encoding aspartate ammonia-lyase) to decrease the degradation of fumarate, and replacement of the native promoter of galP by a strong trc promoter to promote the uptake of glucose. Strain CWF812 was obtained, which produced 28.2 g/L fumarate with a yield of 0.389 g/g glucose in a 63-h fed-batch fermentation [190].
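Because this section reports fumarate yields in g/g while the succinate and malate sections use mol/mol, a unit conversion makes the figures comparable. The sketch below is illustrative only: the molar masses are standard values for glucose and fumaric acid, and the helper names molar_yield and volumetric_productivity are hypothetical.

```python
GLUCOSE_G_PER_MOL = 180.16        # C6H12O6
FUMARIC_ACID_G_PER_MOL = 116.07   # C4H4O4


def molar_yield(mass_yield_g_per_g: float, product_mw: float,
                substrate_mw: float = GLUCOSE_G_PER_MOL) -> float:
    """Convert a mass yield (g product / g substrate) to mol/mol."""
    return mass_yield_g_per_g * substrate_mw / product_mw


def volumetric_productivity(titer_g_per_l: float, hours: float) -> float:
    """Average productivity over the whole run, in g/L/h."""
    return titer_g_per_l / hours


# Values reported in the text for E. coli CWF812 and the best Rhizopus strain.
cwf812_yield = molar_yield(0.389, FUMARIC_ACID_G_PER_MOL)
rhizopus_yield = molar_yield(0.97, FUMARIC_ACID_G_PER_MOL)
cwf812_productivity = volumetric_productivity(28.2, 63)

print(f"CWF812 yield: {cwf812_yield:.2f} mol/mol glucose")
print(f"Rhizopus yield: {rhizopus_yield:.2f} mol/mol glucose")
print(f"CWF812 average productivity: {cwf812_productivity:.2f} g/L/h")
```

On this basis the CWF812 yield corresponds to roughly 0.6 mol/mol glucose, and its average productivity over the 63-h fed-batch run is a little under 0.5 g/L/h.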

3.6 Glucaric Acid

d-Glucaric acid, a compound present in fruits, vegetables, and mammals, has been studied for therapeutic purposes [191–193], and it has potential applications in polymers [126]. The synthetic pathway for d-glucaric acid production from glucose is present naturally in mammals. However, this natural pathway is composed of more than 10 reactions, which limits its construction in E. coli. To realize the production of d-glucaric acid, Prather's group designed a synthetic pathway by coexpression in E. coli of ino1 encoding myo-inositol-1-phosphate synthase from S. cerevisiae, miox encoding myo-inositol oxygenase from mouse, and udh encoding uronate dehydrogenase from Pseudomonas syringae. The resulting strain produced more than 1 g/L of glucaric acid in LB medium with 10 g/L glucose [194]. MIOX was identified as the rate-limiting step in the whole pathway, and its activity was strongly affected by the myo-inositol concentration. To improve the flux toward glucaric acid, two strategies were pursued. First, use of a protein scaffold to colocalize the three heterologous enzymes in a designable complex resulted in a fivefold improvement of glucaric acid titer [195]. Second, protein fusion tags and directed evolution were used to improve MIOX activity, leading to the production of 4.85 g/L glucaric acid from 10.8 g/L myo-inositol [196].

3.7 Muconic Acid

Muconic acid (MA) is an important unsaturated dicarboxylic acid and has great potential for the production of bioplastics [197–199]. It can also be used as the precursor for the synthesis of important bulk chemicals, such as adipic acid, terephthalic acid, and trimellitic acid [197]. Biosynthesis of muconic acid in E. coli has been studied since 1994 by Frost's group [200]. They described an artificial pathway for muconic acid biosynthesis from glucose that combines the shikimic acid pathway, which is natively present in E. coli for aromatic amino acid synthesis, with three heterologous enzymes: 3-dehydroshikimate (DHS) dehydratase, protocatechuic acid (PCA) decarboxylase, and catechol 1,2-dioxygenase (CDO). To improve muconic acid production, shikimate dehydrogenase was inactivated to reduce DHS consumption, and transketolase, 3-deoxy-d-arabinoheptulosonate 7-phosphate (DAHP) synthase, and 3-dehydroquinate (DHQ) synthase were overexpressed to increase the availability of DHS. The resulting strain produced 2.4 g/L muconic acid in a batch fermentation. Deregulation of the feedback inhibition of the shikimic acid pathway and overexpression of the critical genes increased the muconic acid titer to 38.6 g/L [201]. Optimization of the fermentation process using fed-batch conditions further improved the titer to 59.2 g/L [202].

A novel artificial pathway for MA production in E. coli was established by integrating the native tryptophan biosynthetic pathway with a heterologous anthranilate degradation pathway [199]. In this pathway, anthranilate, an intermediate of the native tryptophan biosynthetic pathway, was transformed into MA sequentially by anthranilate 1,2-dioxygenase (ADO) from Pseudomonas aeruginosa and catechol 1,2-dioxygenase (CDO) from P. putida. MA production was optimized by screening several enzyme candidates and improving the native tryptophan biosynthetic pathway. The resulting strain produced 389 mg/L muconic acid in modified M9 minimal medium with a mixture of glycerol and glucose as carbon sources [199].

Another novel MA synthetic pathway was designed by extending the shikimate pathway with a hybrid of a salicylic acid (SA) biosynthetic pathway and its partial degradation pathway [198]. A well-developed phenylalanine-producing strain was first engineered to produce SA by heterologous expression of isochorismate synthase and isochorismate pyruvate lyase, leading to production of 1.2 g/L of SA. The SA was then converted into MA by introducing salicylate 1-monooxygenase and catechol 1,2-dioxygenase. Optimization of the whole pathway resulted in the production of up to 1.5 g/L MA after 48-h fermentation in shake flasks [198].

3.8 Adipic Acid

Adipic acid is the most important dicarboxylic acid. Its global market volume is estimated at about 2.6 million tons per year, with an expected annual increase of 3–3.5 % [203, 204]. The primary use of adipic acid is as a precursor for the production of the polyamide nylon-6,6 [200, 203–205]. Traditionally, adipic acid is produced by chemical catalysis in large-scale industrial processes using benzene, an important compound derived from non-renewable fossil resources, as the principal starting compound [200, 203]. To decrease the dependence on fossil feedstock, many efforts have been made in recent years to develop an alternative way to produce adipic acid from renewable biomass resources [11, 200, 201, 203, 205–211].

Although cis,cis-muconic acid can be converted to adipic acid by chemical hydrogenation [200, 201], it is still desirable to construct cell factories that produce adipic acid directly through glucose fermentation [203, 205, 206, 208, 210]. Recently, the Zhong group at Shanghai Jiao Tong University constructed an artificial adipic acid synthetic pathway in E. coli [205]. Acetyl-CoA and succinyl-CoA were condensed to produce the C6 backbone 3-oxoadipyl-CoA, which was then converted to adipic acid sequentially via 3-hydroxyadipyl-CoA, 2,3-dehydroadipyl-CoA, and adipyl-CoA. The six enzymatic steps were catalyzed, respectively, by the β-ketoadipyl-CoA thiolase (PaaJ) from E. coli, 3-hydroxybutyryl-CoA dehydrogenase (Hbd) and crotonase (Crt) from Clostridium acetobutylicum, trans-enoyl-CoA reductase (Ter) from Euglena gracilis, and phosphate butyryltransferase (Ptb) and butyryl kinase (Buk1) from C. acetobutylicum. The constructed strain AA1 produced 31 μg/L adipic acid when fermented aerobically in minimal R/2 medium supplemented with 10 g/L glucose at 30 °C for 120 h. The adipic acid titer increased to 120 μg/L when Ter was replaced with butyryl-CoA dehydrogenase (Bcd) from C. acetobutylicum, Hbd with 3-hydroxyacyl-CoA reductase (PaaH1) from Ralstonia eutropha, and Crt with the putative enoyl-CoA hydratase (ECH) from R. eutropha H16. Supplies of the acetyl-CoA and succinyl-CoA precursors were then increased to further improve adipic acid production, resulting in strain AA7, which produced 639 μg/L adipic acid, about 20-fold higher than the starting strain AA1 [205].

4 Alcohols

E. coli cell factories have been constructed for production of a variety of alcohols, such as 1,3-propanediol [19, 212], 1-propanol [213], 1,2-propanediol [214, 215], isopropanol [216], n-butanol [217], isobutanol [13], 1,4-butanediol [12], and higher-chain alcohols [13, 51, 218, 219]. Several reviews have already covered the bio-based production of alcohols using E. coli cell factories [2, 3, 217, 220], so that material is not repeated here. This chapter will focus on recently developed cell factories for the production of higher-chain alcohols and 1,4-butanediol.

4.1 Higher-Chain Alcohols

Higher-chain alcohols are attractive biofuel targets because they exhibit higher energy density, lower hygroscopicity, lower vapor pressure, and compatibility with present transportation devices [218]. However, these compounds are not synthesized economically by native organisms [50]. Two different synthetic pathways have been created for the production of these compounds.

By introducing genes for broad-substrate-range 2-keto acid decarboxylases (KDCs) and alcohol dehydrogenases (ADHs), the amino acid synthetic pathways can be redirected to produce higher-chain alcohols from 2-keto acids in E. coli [221]. Isobutanol is a representative example. A synthetic pathway for isobutanol production from glucose was created by Liao's group by combining the branched-chain amino acid synthetic pathway and the Ehrlich pathway, with 2-keto-isovalerate serving as the precursor [13]. Overexpression of the valine biosynthetic pathway (ilvIHCD) and the alcohol-producing pathway (kivD from Lactococcus lactis and adh2 from S. cerevisiae) resulted in production of 1.7 g/L isobutanol. Genes of competing fermentation pathways, including adhE, ldhA, frdAB, fnr, and pta, were further deleted to increase the pyruvate supply for isobutanol production. The resulting strain produced 2.2 g/L isobutanol with a yield of 0.21 g/g glucose. Further improvement was achieved by replacing the native ilvIH with the alsS gene from B. subtilis. AlsS has a higher affinity for pyruvate than IlvIH, and the replacement increased the isobutanol titer to 3.7 g/L. With deletion of the pflB gene, the isobutanol titer increased to 22 g/L under microaerobic conditions [13]. When fermentation was carried out in a 1-L bioreactor instead of a shake flask, with in situ isobutanol removal by gas stripping, isobutanol production reached more than 50 g/L in 72 h [221].

NADPH is the reducing equivalent required for the production of isobutanol. Both keto acid reductoisomerase and alcohol dehydrogenase are NADPH dependent, and two equivalents of NADPH are required for the conversion of pyruvate to isobutanol. In contrast, the common reducing equivalent under anaerobic condition is NADH, which is produced through glycolysis [222]. In order to solve this cofactor imbalance problem, the cofactor specificity of keto acid reductoisomerase and alcohol dehydrogenase enzymes was changed from NADPH to NADH, and theoretical yield was obtained under anaerobic condition [222]. On the other hand, membrane-bound transhydrogenase PntAB and NAD kinase were activated in combination to increase the NADPH supply for improved isobutanol production [35]. Activating these two enzymes increased anaerobic isobutanol yield by 39 % to 0.92 mol/mol glucose [35].
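The cofactor imbalance described above can be written as a small ledger. The sketch below is illustrative and uses only the stoichiometry stated in the text plus the textbook assumption that glycolysis yields 2 pyruvate and 2 NADH per glucose. The function name cofactor_balance is hypothetical, and the baseline yield is back-calculated from the reported 39 % increase rather than taken from [35].

```python
def cofactor_balance(kari_cofactor: str = "NADPH", adh_cofactor: str = "NADPH") -> dict:
    """Net NADH/NADPH per glucose converted to one isobutanol.

    Positive values are surplus, negative values must be supplied elsewhere.
    """
    balance = {"NADH": 2, "NADPH": 0}   # glycolysis: glucose -> 2 pyruvate + 2 NADH
    balance[kari_cofactor] -= 1         # keto acid reductoisomerase step
    balance[adh_cofactor] -= 1          # alcohol dehydrogenase step
    return balance


native = cofactor_balance()                    # both enzymes NADPH dependent
engineered = cofactor_balance("NADH", "NADH")  # cofactor specificity switched

# Baseline implied by "increased by 39 % to 0.92 mol/mol".
baseline_yield = 0.92 / 1.39

print("native enzymes:", native)
print("NADH-dependent enzymes:", engineered)
print(f"implied baseline yield: {baseline_yield:.2f} mol/mol glucose")
```

With the native enzymes the ledger shows a surplus of 2 NADH and a deficit of 2 NADPH per isobutanol, which is the gap that either a cofactor-specificity switch or a transhydrogenase has to close. Switching both enzymes to NADH brings both entries to zero.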

Enzymes of the native L-leucine biosynthesis pathway were engineered to catalyze chain elongation, and various 2-keto acids were obtained. Diverse LeuA mutants were generated to accommodate substrates of different sizes [13, 16, 51, 217, 218]. These 2-keto acids were then converted successively to aldehydes and alcohols by introducing KDCs and ADHs. With this pathway design, one carbon atom is added to the chain in each cycle, and a new pool of alcohols can be produced.

A second pathway designed for higher-chain alcohol production was based on acetyl-CoA. Two carbon atoms were added to the chain in each cycle [217, 223–225]. Five reactions were involved in carbon chain elongation and alcohol production, catalyzed by thiolase (AtoB/BktB), dehydrogenase (Hbd/PaaH1), dehydratase (Crt), reductase (Ter), and thioesterase (TesB), respectively. These enzymes are not specific for compounds of a particular carbon number. It is therefore critical to find more specific enzymes to increase the efficiency for particular products.
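The difference between the two elongation schemes comes down to the carbon increment per cycle. The sketch below only counts carbons in the elongated intermediates. The starting points (a C4 2-keto acid such as 2-ketobutyrate for the LeuA-based route, and C2 acetyl-CoA for the acetyl-CoA-based route) are assumptions chosen for illustration, and the helper name elongation_series is hypothetical.

```python
def elongation_series(start_carbons: int, carbons_per_cycle: int, cycles: int) -> list:
    """Carbon numbers of the intermediate after 0, 1, ..., `cycles` elongation cycles."""
    return [start_carbons + carbons_per_cycle * i for i in range(cycles + 1)]


# 2-keto acid route: one carbon added per cycle (assumed C4 starting keto acid).
keto_acid_route = elongation_series(start_carbons=4, carbons_per_cycle=1, cycles=3)

# Acetyl-CoA route: two carbons added per cycle (starting from C2 acetyl-CoA).
acetyl_coa_route = elongation_series(start_carbons=2, carbons_per_cycle=2, cycles=3)

print("2-keto acid intermediates:", keto_acid_route)
print("acyl-CoA intermediates:", acetyl_coa_route)
```

The keto acid route can in principle reach every chain length, whereas the acetyl-CoA route steps through even-numbered acyl-CoA intermediates when it starts from acetyl-CoA.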

4.2 1,4-Butanediol

As one of the important C4 platform chemicals, 1,4-butanediol (BDO) has a world market exceeding 1 million tons and is widely used in the manufacture of biopolymers, cosmetics, fine chemicals, and solvents [3, 19]. BDO is predominantly produced from crude oil and natural gas. No biosynthetic pathway has been reported in any natural organism. The project of constructing an E. coli cell factory for BDO production was initiated by Genomatica [12]. All candidate pathways from E. coli central metabolites to BDO were elucidated using the SimPheny Biopathway Predictor software. The Biopathway Predictor algorithm is based on the transformation of functional groups by known chemistry rather than on known enzyme reactions, which makes it possible to identify novel enzyme activities or to engineer enzymes with activity toward a particular substrate. As a result, 10,000 pathways for BDO synthesis from common central metabolites were identified. The proposed pathways were then evaluated on factors including maximum theoretical yield, pathway length, number of non-native steps, and thermodynamic feasibility. Finally, two pathways for BDO biosynthesis, both involving 4-hydroxybutyrate as the intermediate, were given the highest priority and tested in vivo. One pathway proved promising for BDO production. In this pathway, BDO was produced from succinate via six enzymatic reactions catalyzed by two E. coli native enzymes (succinyl-CoA synthetase and alcohol dehydrogenase) and four heterologous enzymes (CoA-dependent succinate semialdehyde dehydrogenase, 4-hydroxybutyrate dehydrogenase, 4-hydroxybutyryl-CoA transferase, and 4-hydroxybutyryl-CoA reductase). The engineered strain produced over 18 g/L BDO from glucose in 5 days [12]. It could also produce BDO from sucrose, xylose, and biomass-derived mixed sugar streams.
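The pathway evaluation step can be pictured as a filter followed by a multi-criteria sort. The sketch below is not the SimPheny Biopathway Predictor algorithm, whose internals are not described here. It is a hypothetical illustration that assumes infeasible pathways are discarded and the rest are ordered by yield, then by number of non-native steps, then by length. All candidate names and values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePathway:
    name: str
    max_theoretical_yield: float      # mol product / mol glucose (made-up values)
    length: int                       # total number of enzymatic steps
    non_native_steps: int             # steps requiring heterologous enzymes
    thermodynamically_feasible: bool


def rank_pathways(pathways):
    """Drop infeasible pathways, then prefer high yield, few non-native steps, short length."""
    feasible = [p for p in pathways if p.thermodynamically_feasible]
    return sorted(
        feasible,
        key=lambda p: (-p.max_theoretical_yield, p.non_native_steps, p.length),
    )


# Hypothetical candidates, for illustration only.
candidates = [
    CandidatePathway("A", 1.0, 6, 4, True),
    CandidatePathway("B", 1.0, 7, 5, True),
    CandidatePathway("C", 1.1, 5, 3, False),
    CandidatePathway("D", 0.8, 4, 2, True),
]

ranked = rank_pathways(candidates)
print([p.name for p in ranked])
```

A lexicographic sort is only one possible design. A weighted score would allow a shorter pathway to outrank one with a marginally higher yield, at the cost of having to choose the weights.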

To increase the efficiency of the BDO biosynthetic pathway, butyraldehyde dehydrogenase (Bld) and butanol dehydrogenase from Clostridium saccharoperbutylacetonicum were selected and used for BDO production in E. coli. Furthermore, random mutagenesis and site-directed mutagenesis were carried out in turn to improve the activity of Bld. The resulting strain produced BDO at a titer fourfold greater than that of strains expressing the wild-type Bld [37].

5 Perspectives

Production of bulk chemicals by E. coli cell factories from renewable biomass resources has been proved to be a sustainable and environment-friendly process to replace the petroleum-based process. Due to the rapid development of metabolic engineering, systems biology, and synthetic biology, great progress has been achieved and many successful E. coli cell factories have been constructed. However, there are still several challenges.

Only a few bulk chemicals can be produced biologically by microorganisms. Although several new synthetic pathways have been designed and created, most bulk chemicals still do not have biosynthetic pathways. The primary reason is that many chemical reactions have no natural enzyme. Creating new enzymes to catalyze the desired chemical reactions, by integrating chemistry, rational protein design, and directed evolution, will be needed to fill the gaps in novel biosynthetic pathways [16].

The created synthetic pathway is usually not efficient due to the low catalytic activities of some specific enzymes, especially those new enzymes. It is thus very important to improve the catalytic capabilities of these enzymes so that they are not rate limiting within the whole pathway. In addition, coordinated expression of multiple genes involved in the synthetic pathway is desirable so that there will be no metabolic imbalance problem such as exhaustion of precursor and accumulation of toxic intermediates [226]. Product yield is an important factor to realize low-cost production of bulk chemicals. On the one hand, redox balance is necessary to maintain anaerobic cell growth since the only way to consume the reducing equivalent is through the synthetic pathway of target compound. On the other hand, enough reducing equivalent is required to reach the theoretical maximum yield.

Finally, good physiological characteristics of cell factories are necessary for large-scale industrial production of bulk chemicals [227]. Tolerance to high osmolality and to high concentrations of target chemicals can increase the final titer and productivity. Tolerance to high temperature can reduce energy costs and contamination problems. Tolerance to low pH allows organic acids to be produced directly, avoiding the complex downstream purification needed to convert an organic acid salt to the free acid. Since modifying a single gene usually has little effect on physiological characteristics, global perturbation strategies together with high-throughput omics analysis are needed to improve these characteristics and identify the underlying genetic mechanisms, so that bulk chemicals can be produced biologically at a cost competitive with petrochemical processes.