1 Introduction

Advancements in metabolic engineering and synthetic biology have enabled accelerated engineering of microbial factories for the production of valuable chemicals (Smolke and Tyo 2012; Lee and Kim 2015; Isaacs et al. 2011), realizing the promise of a more sustainable (bio)economy (Voigt 2020). To keep pace with these expectations, pathway prediction and design play a crucial role in finding novel pathways for various applications like drug discovery (Galanie et al. 2015; Moura et al. 2016; Hafner et al. 2021) and value-added biochemical production (Yim et al. 2011; Tokic et al. 2018; Henry et al. 2010a, b). In this scenario, metabolic workhorses like yeast could benefit greatly from a broader product spectrum and from improved metabolic capabilities and performance in terms of yields, titers, and productivities (Nielsen and Keasling 2016; Ko et al. 2020). For this task, progress in computational tools and methods capable of guiding experimental efforts is crucial for the optimization of cellular metabolism and the incorporation of synthetic designs for the production of unnatural heterologous compounds.

In this chapter, an overview of the most relevant retro-biosynthesis and pathway optimization methods is provided with a focus on tools with direct application in metabolic engineering tasks. Starting with the reconstruction of a comprehensive reaction network from public databases and resources, both retro-biosynthesis tools for de novo pathway prediction and stoichiometry-based pathway optimization methods for metabolic redesign are described. Particularly in the latter case, convenient engineering objectives taking into consideration product yield, cofactor use, thermodynamic plausibility, and enzyme cost are discussed. Additionally, several relevant pathway engineering case studies in yeast are also presented, highlighting the improvement potential from the implementation of rational pathway designs. Finally, perspectives on the increasing adoption of these tools for metabolic engineering as well as limitations reducing their effectiveness are discussed.

2 In Silico Pathway Prediction and Design

Retro-biosynthesis and stoichiometry-based optimization methods have been established for pathway design and prediction, differing mostly in their scope and methodology. While both tools generate metabolic pathways producing the target metabolite, they do so by applying fundamentally different reaction network representations (i.e., graph or stoichiometric matrix) and search algorithms (i.e., optimization-based enumeration or retro-synthetic search) (the reader is referred to Wang et al. (2017) for a comprehensive review). Furthermore, their computational complexity and efficacy can vary significantly depending on the product of interest, and thus, careful selection of the appropriate tool for the case at hand is a must (Saa et al. 2019). In the following, the most relevant data- and knowledge-bases for reconstructing and parameterizing reaction networks are presented, which constitute the starting point for the application of any of these tools (Fig. 1a). Then, the most relevant retro-biosynthesis (Fig. 1b) and stoichiometry-based optimization methods for pathway prediction and design (Fig. 1c) are described.

Fig. 1

Workflow for reaction network reconstruction and application of metabolic pathway prediction and design tools. a Assembly of accumulated metabolic reaction data into a comprehensive reaction network is a requirement for the application of the reviewed tools. Depending on the application objective, different network representations are employed for either predicting de novo pathways (i.e., retrosynthesis typically using a graph representation), or (re)designing pathways for higher metabolic performance (i.e., optimization-based pathway design using a stoichiometric representation). b Retro-biosynthetic tools explore a substrate graph seeking to connect the target molecule with some predefined precursors. Starting with the target molecule and moving backwards, these tools can generate several possible pathways that are typically ranked using different criteria (e.g., length, enzyme availability, and thermodynamics). c Stoichiometry-based pathway prediction methods employ a reaction network with known and fixed reactions to enumerate mass-balanced pathways that optimize a desired objective such as product yield, pathway length, thermodynamic favorability, and enzyme cost. In this case, different types of constraints can be defined to restrict the feasible solution space and narrow the search upfront.

2.1 Data- and Knowledge-Bases for Metabolic Reaction Network Reconstruction

Databases for pathway search are an absolute requirement for exploring the feasible reaction space, as they contain the critical information on how metabolites are connected to one another through biochemical reactions. There are numerous public data- and knowledge-bases populated with metabolic reaction data. Among the most popular are KEGG (Kanehisa et al. 2016), MetaCyc (Caspi et al. 2016), BiGG (King et al. 2016), KBase (Arkin et al. 2018), ModelSEED (Henry et al. 2010a, b), MetRxn (Kumar et al. 2012), and MetaNetX (Ganter et al. 2013; Moretti et al. 2021), to name a few (for more details, refer to Wang et al. (2017)). Some of these databases (KEGG, MetaCyc, and KBase) integrate multiple sources of biological information, e.g., genetic, molecular, physicochemical, and experimental, which makes them useful not only for metabolic pathway prediction but also for data integration (Lewis et al. 2012). The rest of the databases are mostly devoted to metabolic network reconstruction, offering either highly curated reconstructions for specific organisms (e.g., BiGG) or broader, albeit possibly less curated, biochemical reaction networks (e.g., MetRxn, ModelSEED, and MetaNetX). Ultimately, the modeling purpose will dictate the most convenient source of information considering their specific scope, breadth, and information quality. Complementary databases like BRENDA (Jeske et al. 2019) (kinetic information) and eQuilibrator (Flamholz et al. 2012) (thermodynamic information) also constitute valuable resources for parameterizing different optimization formulations.

The aforementioned databases contain information for known reactions only, which may restrict the pathway search given current gaps in enzymatic knowledge. Resources like the ATLAS of Biochemistry (Hadadi et al. 2016) (derived from the BNICE tool (Hatzimanikatis et al. 2005)) and MINE (Jeffryes et al. 2015) offer larger networks including hypothetical reactions and metabolites that can expand the reachable chemical space and allow higher complexity. Briefly, these resources exploit user-defined reaction rules that can act on chemically similar compounds, thereby yielding new hypothetical reactions. These hypothetical reactions have recently been shown to fill some of the gaps in current enzyme-reaction associations (Hadadi et al. 2019). Lastly, another significant and recent tool for proposing hypothetical reactions that has been employed for pathway prediction is rePrime, used by the novoStoic tool (Kumar et al. 2018) (see Sect. 2.3 for more details). rePrime extracts reaction rules from molecular signatures found in annotated reactions, defined by the presence of a set of chemical ‘moieties’, for proposing hypothetical enzymatic transformations with a high structural encoding fidelity. Unfortunately, this tool currently lacks an associated open database for its use.

2.2 Pathway Prediction Using Retro-Biosynthesis Tools

Firstly, a distinction is made between retro-biosynthesis and classical retro-synthesis, as the latter is focused on the design of chemical reaction pathways, typically without relying on enzyme catalysis (Lin et al. 2019). Retro-biosynthesis tools seek to identify de novo biosynthetic pathways for the production of valuable compounds from inexpensive precursors using known and hypothetical enzyme activities (Wang et al. 2017; Lin et al. 2019). Another—though less explored—application of these tools involves the opposite, that is, the prediction of novel enzymatic routes for the degradation of recalcitrant compounds, e.g., for bioremediation purposes (Finley et al. 2009, 2010; Ellis et al. 2006). For pathway prediction, retro-biosynthesis tools explore the full chemical space for synthetic pathways toward the target compound. For this task, these tools typically represent the network as a (substrate) graph that can be readily traversed using known enumeration algorithms. Graph traversal is possible by connecting the substrates using various criteria based on structural (chemical) similarity, reaction promiscuity, and defined reaction rules. In the following, a relevant subset of retro-biosynthesis tools employed for metabolic engineering/synthetic biology applications is described (Table 1).
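The backward graph traversal described above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical toy network (`PRODUCED_FROM`, with made-up compound names) and a plain breadth-first search with a step limit; it is not the algorithm of any specific tool reviewed here, which additionally apply reaction rules, similarity-based pruning, and richer ranking criteria.

```python
from collections import deque

# Hypothetical toy network: each product maps to the substrates that a
# (known or rule-generated) reaction could convert into it in one step.
PRODUCED_FROM = {
    "target": ["int_A", "int_B"],
    "int_A": ["precursor_1"],
    "int_B": ["int_C"],
    "int_C": ["precursor_2"],
}


def retro_search(target, precursors, max_steps=4):
    """Enumerate linear routes to the target, searching backwards from it."""
    routes = []
    queue = deque([[target]])  # each entry is a path written target-first
    while queue:
        path = queue.popleft()
        head = path[-1]
        if head in precursors:
            routes.append(list(reversed(path)))  # report precursor-first
            continue
        if len(path) > max_steps:  # prune: extending would exceed max_steps reactions
            continue
        for substrate in PRODUCED_FROM.get(head, []):
            if substrate not in path:  # avoid cycles
                queue.append(path + [substrate])
    return sorted(routes, key=len)  # rank by pathway length, shortest first


routes = retro_search("target", {"precursor_1", "precursor_2"})
for route in routes:
    print(" -> ".join(route))
```

Ranking by length is used here only because it is the simplest of the criteria mentioned in the text; any other score could be substituted in the final sort.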

Table 1 Retro-biosynthesis tools for metabolic pathway prediction

One of the most established tools for de novo pathway retro-biosynthesis is BNICE (Hatzimanikatis et al. 2005). This framework employs predefined ‘generalized enzymatic reaction rules’ (encoded in a bond-electron matrix) that are applied to precursor molecules on their reactive sites to yield new product molecules. BNICE uses a substrate graph representation of the chemical network, which can be traversed using graph search algorithms starting from the target compound and moving backwards until connecting with one of the defined precursors. Different pruning criteria are employed to keep the search breadth computationally tractable. At the end of the algorithm, pathways are ranked by features such as pathway length and thermodynamics, among others. Methodologically close to BNICE, PathPred (Moriya et al. 2010) instead uses RDM patterns consisting of reaction center atoms (R), atoms of different regions (D), and atoms of the matched region (M) for exploring the substrate graph. Pruning of the network is executed using structural similarity criteria, and pathway ranking is performed using compound similarity and pathway scores. SimPheny (Yim et al. 2011; Schilling et al. 2005) derives its reaction rules from the third level of the Enzyme Commission (EC) number, which allows for reaction promiscuity and hence broader explorations. In this case, a retro-synthetic search is employed for enumerating feasible routes that produce intermediates of reasonable size (i.e., below a predefined size), which are later ranked based on various criteria. Another retro-biosynthesis tool with recent important applications is RetroPath (Delépine et al. 2018). This tool uses a retro-synthetic search, albeit combined with MILP formulations for the application of various ranking criteria, such as thermodynamics, gene prediction, pathway length, number of putative steps, and product yield. In contrast to BNICE, RetroPath maintains a stoichiometric representation of the network (as opposed to a substrate graph) that enables computation of various scores. Moreover, molecular signatures are used to generate reaction rules based on a substructure of adjacent atoms, enabling the generation of substantially more, and more flexible, reaction rules (Duigou et al. 2018). A recent implementation of reinforcement learning in RetroPath (RetroPath RL) has yielded promising results in the retro-biosynthetic prediction of biologically relevant pathways (Koch et al. 2020).

2.3 Stoichiometry-Based Optimization Methods for Pathway (Re)design

Given a universal metabolic reaction network, the ‘pathway design’ problem seeks to identify ‘optimal’ route(s) for the production of the target compound. As opposed to the retro-biosynthesis problem, possible connecting reactions are fixed and known upfront. Construction of the reaction network knowledge base is achieved by combining metabolic data from curated databases and/or from databases that also include putative reactions derived, for example, from generalized reaction rules (Hadadi et al. 2016; Hatzimanikatis et al. 2005; Jeffryes et al. 2015) or molecular signatures (Kumar et al. 2018). Regardless of the source of the data, reaction data must be charge- and mass-balanced to yield correct results, which typically is ensured in a manual curation step. Metabolite/reaction name inconsistencies are also an important source of issues that affect network connectivity and consistency, which often have to be resolved manually. While there have been attempts to standardize reaction and metabolite identifiers (King et al. 2016; Kumar et al. 2012; Alcántara et al. 2012), name reconciliation is challenging due to the incessant annotation of new metabolites and enzymatic activities, albeit important progress has been made in recent database versions (Moretti et al. 2021).
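The charge- and mass-balance requirement can be checked automatically before manual curation. The sketch below is only an illustration under stated assumptions: the metabolite table, the identifiers, and the `imbalance` helper are hypothetical, and the hydration of CO2 to bicarbonate is used purely as a small, easily verified example rather than as a reaction taken from any of the cited databases.

```python
from collections import Counter

# Toy metabolite table: elemental composition and charge (illustrative entries).
METABOLITES = {
    "co2": ({"C": 1, "O": 2}, 0),
    "h2o": ({"H": 2, "O": 1}, 0),
    "hco3": ({"H": 1, "C": 1, "O": 3}, -1),
    "h": ({"H": 1}, +1),
}


def imbalance(reaction):
    """Return net element and charge totals; an empty dict means balanced.

    `reaction` maps metabolite id -> stoichiometric coefficient
    (negative for substrates, positive for products).
    """
    totals = Counter()
    for met, coeff in reaction.items():
        elements, charge = METABOLITES[met]
        for element, count in elements.items():
            totals[element] += coeff * count
        totals["charge"] += coeff * charge
    return {key: value for key, value in totals.items() if value != 0}


balanced = {"co2": -1, "h2o": -1, "hco3": 1, "h": 1}
missing_proton = {"co2": -1, "h2o": -1, "hco3": 1}

print(imbalance(balanced))        # empty dict: mass and charge are conserved
print(imbalance(missing_proton))  # reports one missing H and one unit of charge
```

A check of this kind flags candidate errors only; deciding whether the fix is a missing proton, a wrong formula, or a mismatched identifier still requires the curation step described above.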

Once the reaction network has been assembled and mathematically formulated into a stoichiometric matrix, different pathway designs can be readily computed using optimization-based methods, provided that a suitable objective function is defined. Among the most relevant objectives, one can name the minimization of the pathway length ensuring a minimum product yield (e.g., by fixing the overall stoichiometry) (Pharkya et al. 2004), maximization of the product yield observing thermodynamic constraints (Kumar et al. 2018; Kamp and Klamt 2020; Chowdhury and Maranas 2015), maximization of the thermodynamic favorability of the pathway (Flamholz et al. 2013; Noor et al. 2014; Hädicke et al. 2018; Yang et al. 2020; Ng et al. 2019), and minimization of the pathway's enzymatic cost (Flamholz et al. 2013; Ng et al. 2019; Court et al. 2015; Bar-Even et al. 2010). For each of these objectives, a different optimization problem must be formulated and solved, often requiring various parameters (e.g., thermodynamic and kinetic) from other sources for computing the optimal solution(s). In the following, the most relevant stoichiometry-based optimization methods for metabolic pathway prediction are presented. Further details about the methods and applications can be found in Table 2.

Table 2 Optimization-based methods for metabolic pathway design

2.3.1 Pathways with Desired Stoichiometric Properties

Constraint-based modeling methods (Edwards and Palsson 2000) can be readily adapted for the computation of pathways exhibiting a desired stoichiometry (i.e., yield) (Kumar et al. 2018; Chowdhury and Maranas 2015; Ng et al. 2019), shortest length (Chowdhury and Maranas 2015; Ng et al. 2019), convenient precursor use (Kamp and Klamt 2020), and if using an organism's reaction network as metabolic chassis, minimum addition of exogenous reactions (Pharkya et al. 2004; Kim et al. 2011). In practice, these pathway enumeration methods rely on the solution of various LP and/or MILP optimization problems that optimize some of the aforementioned objectives subject to not only stoichiometric constraints but possibly to thermodynamic and/or economic constraints.
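As an illustration of the kind of MILP these methods solve, a generic shortest-pathway problem with a minimum yield requirement might be written as follows. This is a schematic sketch and not the exact formulation of any of the cited tools; the symbols are assumptions introduced here: $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, $v_j$ the flux, $y_j$ a binary variable indicating whether reaction $j$ is active, $I_{\text{int}}$ the set of balanced (internal) metabolites, $v_{\text{sub}}$ and $v_{\text{prod}}$ the substrate uptake and product formation fluxes, $Y_{\min}$ the required yield, and $M$ a sufficiently large constant.

```latex
\begin{aligned}
\min_{v,\,y} \quad & \sum_{j \in J} y_j \\
\text{s.t.} \quad & \sum_{j \in J} S_{ij}\, v_j = 0, && \forall i \in I_{\text{int}} \\
& v_{\text{prod}} \ge Y_{\min}\, v_{\text{sub}}, \\
& -M\, y_j \le v_j \le M\, y_j, && \forall j \in J \\
& y_j \in \{0, 1\}, && \forall j \in J
\end{aligned}
```

Alternative pathways can then be enumerated by re-solving the problem after adding a constraint that excludes each previously found set of active reactions. Thermodynamic or economic constraints of the kind mentioned above would enter as additional rows.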

OptStrain, SimOptStrain, and OptStoic are classical tools for pathway prediction, although they differ in their scope. While the first two seek to predict optimal pathways and metabolic interventions for the production of a target metabolite leveraging the microbial host reaction network (Pharkya et al. 2004; Kim et al. 2011), the latter aims to find complete mass-balanced conversion pathways that yield a desired stoichiometry from precursors to product(s) using metabolic databases as the input reaction network (Chowdhury and Maranas 2015). Additional constraints related to a minimum guaranteed product yield, thermodynamic plausibility of the pathway, and/or substrate costs can be readily included to obtain more convenient designs. Recently, a computational method called MEMO (Kamp and Klamt 2020) has been proposed for identifying the smallest metabolic modules with specified stoichiometric and thermodynamic properties. For instance, this approach has been employed to find small cofactor regeneration (e.g., ATP/ADP, NAD(P)H, NAD(P), among others) modules that can sustain bioconversions in the context of cell-free applications under defined thermodynamic conditions.

The aforementioned methods rely on existing annotated enzymatic reactions for metabolic conversions. However, as mentioned in the previous subsection, promiscuous enzymatic activities are characteristic features of metabolic reaction networks, likely playing an evolutionary role as a starting point in enzyme functions (Khersonsky and Tawfik 2010). Importantly, the existence of this feature suggests that there is still untapped potential for a broader chemical reaction space to be explored. By using various extraction techniques for learning putative reactions from known enzymatic reactions, it is possible to populate and assemble larger databases for pathway prediction. An example of these methods is MapMaker/PathTracer (Tervo and Reed 2016), which employs precomputed carbon transfer maps (CTMs) based on chemical and stoichiometric information (MapMaker) for the prediction of short, carbon-balanced pathways from substrates to products (PathTracer). GEM-Path (Campodonico et al. 2014) is another framework that, using a genome-scale metabolic reconstruction of E. coli as the base reaction network, combines heterologous pathway integration (similar to OptStoic) with constraint-based growth-coupled methods for the computation of metabolic designs. Increased biochemical reaction exploration is achieved through the introduction of a chemical similarity measure to assess enzyme-catalyzed reaction promiscuity. Lastly, the novoStoic/rePrime framework (Kumar et al. 2018) enables exploration of a far greater chemical transformation space through the imposition of chemical ‘moiety’ conservation (refer to Sect. 2.1), which is particularly suited for the prediction of optimal pathways with maximum yield or minimum length. Importantly, this mathematical treatment avoids the loss of chemical reaction information (e.g., stereoselectivity) that affects other approaches like MapMaker/PathTracer.

2.3.2 Pathways with Maximum Thermodynamic Favorability

Pathway thermodynamics exerts a fundamental control on metabolic flux with seemingly important consequences for microbial fitness (Du et al. 2018). While different methods have been proposed for combining stoichiometry-based analysis with reaction thermodynamics (Henry et al. 2007; Kummel et al. 2006), only recently has thermodynamic favorability been mathematically formalized. For this task, the Max-min Driving Force (MDF) index (Noor et al. 2014) has been proposed for quantifying the smallest absolute Gibbs free energy (or driving force) of a given pathway under the most favorable metabolic conditions. As this index captures the driving force of the most unfavorable conversion step (i.e., the thermodynamic bottleneck), its maximization yields the most favorable operating conditions for a given pathway. Subsequently, the OptMDFpathway method (Hädicke et al. 2018) was introduced to identify the most thermodynamically favorable pathways in a given reaction network, thereby enabling exploration of thermodynamically plausible production pathways in the context of microbial metabolism (Hädicke et al. 2018; Yang et al. 2020), and more recently, in microbial communities (Bekiaris and Klamt 2021).
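To make the notion of a thermodynamic bottleneck concrete, the sketch below evaluates the driving force of each step, taken as the negative of the reaction Gibbs energy ΔrG' = ΔrG'° + RT·Σ s_i·ln(c_i), for one fixed set of metabolite concentrations. All numbers are hypothetical, and only the inner "min" is shown: computing the MDF proper additionally requires maximizing this minimum over the allowed concentration ranges, which is not attempted here.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.15    # temperature, K


def driving_forces(reactions, dg0, conc):
    """Return -Delta_rG' (kJ/mol) of each reaction at the given concentrations (mol/L)."""
    forces = {}
    for name, stoich in reactions.items():
        ln_q = sum(coeff * math.log(conc[met]) for met, coeff in stoich.items())
        forces[name] = -(dg0[name] + R * T * ln_q)
    return forces


# Hypothetical two-step pathway A -> B -> C with made-up standard energies.
reactions = {"r1": {"A": -1, "B": 1}, "r2": {"B": -1, "C": 1}}
dg0 = {"r1": 5.0, "r2": -10.0}             # kJ/mol, illustrative values only
conc = {"A": 1e-2, "B": 1e-4, "C": 1e-3}   # mol/L, illustrative values only

forces = driving_forces(reactions, dg0, conc)
bottleneck = min(forces, key=forces.get)   # step with the smallest driving force
print(forces, bottleneck)
```

In this toy case, the step with the unfavorable standard energy (r1) is not the bottleneck at the chosen concentrations, which illustrates why the index is defined on actual rather than standard Gibbs energies.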

2.3.3 Pathways with Minimum Enzymatic Cost

Cellular metabolism incurs a metabolic cost when committing to the synthesis of a particular set of proteins (enzymes). As seemingly similar enzymes can still display large differences in their catalytic properties (Bar-Even et al. 2011), it is natural to seek pathways that yield the maximum return on investment (flux) per protein (enzyme) mass synthesized. For this task, the Enzyme Cost Minimization (ECM) formulation (Noor et al. 2016), later termed Enzyme-Flux Cost Minimization (EFCM) (Wortel et al. 2018), computes the minimum enzyme load (i.e., the aggregated enzyme mass allocated) required for a metabolic pathway to sustain a given flux (Flamholz et al. 2013; Bar-Even et al. 2010). While this formulation originally required a thermodynamically consistent, fully parameterized kinetic model (Saa and Nielsen 2017), enzyme-constrained genome-scale metabolic models (GSMMs) (Sánchez et al. 2017) and ME-models (metabolic and expression) (Lerman et al. 2012) are increasingly being employed for these calculations under the optimistic scenario of enzymatic catalysis at capacity. Finally, ECM/EFCM does not support pathway enumeration, although it can be readily employed as a ranking index when combined with the previous approaches.
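Under the optimistic "catalysis at capacity" scenario mentioned above, the enzyme demand of each step reduces to the flux divided by the turnover number, and the mass cost follows from the molecular weight. The sketch below uses this simplification as a ranking index for two alternative routes; the function name, the route names, and all parameter values are hypothetical, and the full ECM/EFCM formulation (which accounts for saturation and thermodynamic terms) is not reproduced.

```python
def enzyme_load(fluxes, kcat, mw):
    """Minimum enzyme mass (g enzyme per gDW) if every enzyme runs at its kcat.

    fluxes: mmol/gDW/h; kcat: 1/s; mw: kDa (numerically equal to g/mmol).
    """
    return {
        rxn: mw[rxn] * v / (kcat[rxn] * 3600.0)  # convert kcat from 1/s to 1/h
        for rxn, v in fluxes.items()
    }


# Hypothetical parameters for two alternative routes (not measured data).
pathways = {
    "route_1": {
        "fluxes": {"e1": 1.0, "e2": 1.0},
        "kcat": {"e1": 100.0, "e2": 10.0},
        "mw": {"e1": 50.0, "e2": 40.0},
    },
    "route_2": {
        "fluxes": {"e3": 1.0},
        "kcat": {"e3": 2.0},
        "mw": {"e3": 60.0},
    },
}

totals = {name: sum(enzyme_load(**p).values()) for name, p in pathways.items()}
ranking = sorted(totals, key=totals.get)  # cheapest route first
print(totals, ranking)
```

With these made-up numbers, the shorter route is the more expensive one because its single enzyme is slow, which is the kind of trade-off a length-only ranking would miss.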

3 Case Studies of Metabolic Pathway Prediction and Optimization in Yeast

In this section, selected case studies illustrate different pathway engineering aspects required for improving metabolic performance in general, and in yeast in particular. These examples showcase strategies for redox cofactor balancing, increased precursor supply, and engineering of central pathways for carbon fixation. The impact of these applications is especially highlighted in the context of harnessing the metabolic potential of yeast for industrial bioproduction. Figure 2 illustrates the details of the reviewed strategies.

Fig. 2

Illustration of selected reported strategies for achieving improved cofactor balancing, increased acetyl-CoA supply, and engineered CO2 fixation in yeast. The details of each strategy are discussed in Sect. 3. Relevant metabolite names are shown in uppercase bold font, whereas enzyme names are shown in uppercase italic font. Abbreviations: 13DPG, 1,3-diphosphoglycerate; 3PG, 3-phosphoglycerate; 6PGL, 6-phospho-D-glucono-1,5-lactone; ACCOA, acetyl-CoA; ACE, acetate; ACETAL, acetaldehyde; AcP, acetyl phosphate; ACS, acetyl-CoA synthetase; AKG, alpha-ketoglutarate; ALD, aldehyde dehydrogenase; DHAP, dihydroxyacetone phosphate; ETOH, ethanol; F6P, D-fructose 6-phosphate; FDP, D-fructose 1,6-bisphosphate; FOR, formate; G3P, glycerol 3-phosphate; G6P, D-glucose 6-phosphate; G6PD, glucose 6-phosphate dehydrogenase; GAP, glyceraldehyde 3-phosphate; GAPDH, glyceraldehyde 3-phosphate dehydrogenase; GDH, glutamate dehydrogenase; GLC, D-glucose; GLU, glutamate; GLY, glycerol; HMGCOA, 3-hydroxy-3-methyl-glutaryl-CoA; HMGCOAR, HMG-CoA reductase; MEOH, methanol; MEV, mevalonate; NH4, ammonia; PDH, pyruvate dehydrogenase; PFL, pyruvate formate lyase; PK, phosphoketolase; PRK, phosphoribulokinase; PTA, phosphotransacetylase; PYR, pyruvate; R5P, D-ribose 5-phosphate; Ru15P, ribulose 1,5-bisphosphate; Ru5P, D-ribulose 5-phosphate; STH, transhydrogenase; Xu5P, D-xylulose 5-phosphate; XYL, D-xylose.

3.1 Balancing Redox Cofactor Supply for Improving Substrate Utilization and Isoprenoids Production

Regeneration of redox and/or energy cofactors often limits the production of high-value metabolites. To increase the availability of the required cofactor(s), central carbon metabolism must be engineered in such a way that it favors bioproduction without severely affecting microbial growth (Lee et al. 2013). This challenge is particularly relevant for many NAD(P)H-expensive valuable compounds that are being produced in yeast (Cataldo et al. 2020; López et al. 2020, 2019) and other microbes (Ko et al. 2020).

Increased supply of redox cofactors can be achieved by either overexpressing key enzymes involved in cofactor generation (Lee et al. 2007; San et al. 2002; Lim et al. 2002) or by increasing the expression of alternative redox partner systems. A recent application of the latter has proved effective for enhancing the unprecedented heterologous production of violaxanthin in S. cerevisiae by approx. two-fold (Cataldo et al. 2020). However, the success of these approaches is likely limited due to the presence of different intrinsic balancing mechanisms for maintaining homeostasis in yeast (Hou et al. 2010). An illustrative example can be found in the study of Nissen et al. (2001). Here, heterologous expression of the pyridine nucleotide transhydrogenase system (sth gene, absent in yeast), which transfers reducing equivalents from NADPH to NADH (and vice versa), did not improve ethanol formation in anaerobic conditions. On the contrary, ethanol production was reduced concomitantly with the increase of fermentation by-products (glycerol and 2-oxoglutarate) required for redox rebalancing. Another strategy for (re)balancing redox cofactor supply and demand, less intuitive but possibly more effective, involves cofactor swapping (Verho et al. 2003; Martínez et al. 2008). Computational studies in S. cerevisiae and E. coli support this strategy as a promising intervention for forcing higher metabolic performance (King and Feist 2014). Simply put, this approach seeks to replace native (redox-consuming) enzymes with heterologous counterparts with a different cofactor specificity (e.g., an NADPH-dependent enzyme in place of an NADH-dependent one).

The first application of the latter strategy involved the optimization of D-xylose utilization for ethanol production in S. cerevisiae (Verho et al. 2003). This carbon source is assimilated through the pentose phosphate pathway (PPP) as D-xylulose-5-phosphate and then incorporated as glyceraldehyde 3-phosphate in glycolysis. In theory, D-xylose should produce CO2 and ethanol in a 1:1 molar ratio under redox-neutral anaerobic conditions (Kötter and Ciriacy 1993). However, D-xylose assimilation requires extra NADPH and NAD+ that must be regenerated by other separate processes, which are very inefficient in yeast, rendering D-xylose fermentation slow. To overcome this bottleneck and force higher NADPH supply and flux through lower glycolysis, the native NAD-dependent GAP dehydrogenase (GAPDH) was replaced by an NADP-dependent GAPDH and the NADPH-generating glucose-6-phosphate dehydrogenase (G6PD) was knocked out, which also prevented carbon loss as CO2 (Verho et al. 2003). This strategy more than doubled the ethanol yield on D-xylose (from 18 to 41%) and brought the CO2/ethanol molar ratio close to the theoretical 1:1 (from 2.5 to 1.3). Later, expression of the heterologous phosphotransacetylase (PTA) and phosphoketolase (PK) for improving NADH reoxidation in the D-xylose utilization pathway increased the ethanol yield by 25% without affecting the growth rate (Sonderegger et al. 2004).

Cofactor rebalancing and swapping strategies for the synthesis of NADPH-expensive isoprenoid-derived compounds have proven particularly effective in yeast. For instance, α-santalene production yields a net production of NADH and consumption of NADPH, which calls for the rebalancing of the cofactor supply (Scalcinati et al. 2012). By deleting known reactions involved in glutamate metabolism (ammonium assimilation) that consume NADPH (GDH1) and activating NAD-dependent counterparts (GDH2) (Nissen et al. 2000), the production of α-santalene was substantially improved. Similarly, in a study of the production of protopanaxadiol, another isoprenoid-derived compound, the availability of NADPH was enhanced by replacing the native NADH-generating acetaldehyde dehydrogenase (ALD2) with a functionally equivalent NADPH-generating enzyme (ALD6), resulting in an 11-fold increase in titer (Kim et al. 2018). Lastly, swapping of the native NADP-dependent 3-hydroxy-3-methyl-glutaryl-CoA reductase (HMG-CoA reductase), the third committed step of the mevalonate pathway responsible for the production of isoprenoid precursors, has also been shown to increase the overall pathway flux in E. coli (Ma et al. 2011). This result was leveraged by Meadows et al. (2016), who employed an NADH-consuming HMG-CoA reductase from Silicibacter pomeroyi for the overproduction of the sesquiterpene farnesene. Other computationally predicted major metabolic cofactor swaps, like those of the alcohol dehydrogenase (ALCD2) and GAPD, remain to be tested experimentally for the improved production of isoprenoids (King et al. 2016), although they could be beneficial for boosting production, as shown in other microorganisms (Martínez et al. 2008).

3.2 Increasing Cytosolic Acetyl-CoA Availability for Metabolic Production

Cytosolic acetyl-CoA is a key metabolite for the production of a range of valuable compounds in yeast (Rossum et al. 2016). Native production of this compound requires 2 mol of ATP and yields 2 mol of acetyl-CoA and 4 mol of NAD(P)H per mol of glucose (Rossum et al. 2016). To improve the availability of this precursor and lower the ATP cost, different heterologous enzymes have been introduced either to bypass the native aldehyde dehydrogenase (ALD) and acetyl-CoA synthetase (ACS) system using bacterial counterparts, i.e., A-ALD and PFL, that do not incur such a high cost (Kozak et al. 2014a, b), or to enable acetyl-CoA biosynthesis in situ by expressing the pyruvate dehydrogenase (PDH) complex in the cytoplasm (Kozak et al. 2014a, b). While the former application showed mixed results in terms of growth and yield (mainly due to by-product accumulation), the second approach, along with a knock-out of the native ACS reaction, exhibited similar metabolic performance to the control, but at a lower ATP cost.

A different approach for improving acetyl-CoA availability relies on increasing its yield. For this task, the phosphoketolase pathway (PKP) was suggested early on for the conversion of 1 mol of F6P into 3 mol of acetyl-P without carbon loss (Schramm and Racker 1957). Acetyl-P can then be converted to acetyl-CoA by the reversible phosphotransacetylase (PTA) reaction (Rossum et al. 2016). This was initially implemented in yeast for improving D-xylose fermentation (Sonderegger et al. 2004) (refer to the previous section). More recently, Bogorad et al. (2013) implemented the full PKP in E. coli and demonstrated almost stoichiometric conversion of C5 and C6 sugars into acetate under anaerobic conditions. A similar approach was replicated in yeast, accompanied by several genetic interventions to increase acetyl-CoA-derived farnesene (Meadows et al. 2016). This non-native pathway increased carbon utilization by 25%, decreased oxygen consumption by 75%, and reached 15% v/v of farnesene. As illustrated here, increasing acetyl-CoA availability may be critical not only for maximizing production but also for improving overall metabolic performance.
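The yield advantage of the PKP follows directly from a carbon count using the stoichiometries quoted above (2 mol of acetyl-CoA per mol of glucose for the native route, 3 mol of acetyl-P per mol of F6P for the PKP). The short calculation below is only a bookkeeping illustration; the helper name and constants are introduced here for convenience.

```python
from fractions import Fraction

HEXOSE_CARBONS = 6  # glucose and F6P both carry six carbons
ACETYL_CARBONS = 2  # each acetyl unit carries two carbons


def carbon_yield(acetyl_units_per_hexose):
    """Fraction of hexose carbon recovered in acetyl units."""
    return Fraction(acetyl_units_per_hexose * ACETYL_CARBONS, HEXOSE_CARBONS)


native = carbon_yield(2)  # 2 acetyl-CoA per glucose; the remaining carbon is lost
pkp = carbon_yield(3)     # 3 acetyl-P per F6P; no carbon is lost
print(native, pkp)        # 2/3 1
```

In other words, one third of the hexose carbon is unavailable for acetyl-CoA-derived products in the native route, whereas the PKP conserves all of it.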

3.3 Engineering a Heterologous CBB Cycle for CO2 Fixation

There is a growing interest in the field for engineering carbon assimilation pathways in heterotrophs to improve product yields (e.g., by reducing carbon loss as CO2) and, most notably, for implementing fixation pathways for one-carbon (C1) compounds such as CO2 to develop more sustainable fermentation bioprocesses.

In an early effort from Guadalupe-Medina et al. (2013), a heterologous Calvin–Benson–Bassham (CBB) cycle was implemented in S. cerevisiae seeking to improve ethanol yield by reducing carbon loss under anaerobic conditions. The authors noted that by expressing the CBB enzymes phosphoribulokinase (PRK) and ribulose-1,5-bisphosphate carboxylase-oxygenase (RuBisCO), a working pathway could be realized where CO2 is effectively used as an electron acceptor for NADH oxidation, thereby coupling CO2 fixation by RuBisCO with the fermentation redox balance. Importantly, this mechanism rendered NADH reoxidation through the native glycerol formation pathway unnecessary (90% reduction in glycerol titer), increasing ethanol yield by 14% (Guadalupe-Medina et al. 2013).

While this strategy succeeded in increasing product yield, it did so by reducing carbon loss as glycerol rather than by significantly increasing CO2 assimilation (Guadalupe-Medina et al. 2013). A more radical approach is to engineer a CO2 assimilation pathway capable of sustaining growth and production. In pioneering work by Antonovsky et al. (2016), E. coli was transformed and evolved to grow solely on CO2 as a carbon source and pyruvate as an electron source. Again, expression of the missing CBB enzymes PRK and RuBisCO, together with knock-out of phosphoglycerate mutase (PGM), a target revealed by Flux Balance Analysis (Lewis et al. 2012), forced CBB operation by decoupling carbon fixation from energy production. This metabolic phenotype was termed hemi-autotrophic growth, and it has since been implemented in other bacteria such as the methanol-consuming bacterium Methylobacterium extorquens AM1 through expression of the same CBB enzymes and deletion of genes essential for methanol assimilation (Borzyskowski et al. 2018). Building on these strategies, a recent study reported the conversion of the yeast P. pastoris into an autotroph that grows on CO2 as the sole carbon source and methanol as an energy source (Gassler et al. 2020). Briefly, P. pastoris can natively use methanol as both energy and carbon source. Formaldehyde, the product of methanol oxidation, feeds both an assimilatory (carbon-fixing) and a dissimilatory (energy-producing) pathway branch. By decoupling these branches, one can force CO2 assimilation by blocking the assimilatory branch through deletion of the dihydroxyacetone synthases (DAS1 and DAS2) and alcohol oxidase 1 (AOX1). Then, complementation of the native peroxisomal xylulose monophosphate (XuMP) cycle with six enzymatic steps enables operation of the CBB cycle, allowing growth on CO2. In stoichiometric terms, 1 mol of oxidized methanol produces 2 mol of NADH, which can be used to fuel the CBB cycle, though not in stoichiometric proportion to CO2 (3 mol of ATP and 2 mol of NADH are needed to fix 1 mol of CO2). The resulting mutant strain reached a maximum specific growth rate of 0.018 h−1 (Gassler et al. 2020) and constitutes an unprecedented advance for compartmentalized C1 carbon fixation in yeast, differing from seemingly similar efforts in bacteria (Antonovsky et al. 2016; Bang and Lee 2018).
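The stoichiometric mismatch noted above can be made concrete with a short back-of-the-envelope calculation. The sketch below uses only the two figures stated in the text (2 mol NADH per mol of oxidized methanol; 3 mol ATP and 2 mol NADH per mol of fixed CO2). The conversion of NADH into ATP by respiration is not given in the text, so the P/O ratio (`p_o_ratio`) is a free, hypothetical parameter, and the function name `methanol_per_co2` is introduced here purely for illustration. Maintenance energy, biosynthetic demands, and compartmentalization are ignored.

```python
# Figures stated in the text
NADH_PER_METHANOL = 2.0  # mol NADH released per mol methanol oxidized
ATP_PER_CO2 = 3.0        # mol ATP needed per mol CO2 fixed by the CBB cycle
NADH_PER_CO2 = 2.0       # mol NADH needed per mol CO2 fixed by the CBB cycle


def methanol_per_co2(p_o_ratio: float) -> float:
    """Mol methanol that must be oxidized to fix 1 mol CO2.

    ASSUMPTION (not from the text): all ATP is generated by respiring
    NADH with an effective yield of `p_o_ratio` mol ATP per mol NADH.
    """
    if p_o_ratio <= 0:
        raise ValueError("p_o_ratio must be positive")
    nadh_for_atp = ATP_PER_CO2 / p_o_ratio      # NADH respired to make ATP
    total_nadh = NADH_PER_CO2 + nadh_for_atp    # reducing power + energy
    return total_nadh / NADH_PER_METHANOL


if __name__ == "__main__":
    # Illustrative P/O values only; the true value is organism-specific.
    for p_o in (1.0, 1.5, 2.5):
        print(f"P/O = {p_o}: {methanol_per_co2(p_o):.2f} mol methanol per mol CO2")
```

Under these assumptions, more than 1 mol of methanol must be oxidized per mol of CO2 fixed for any finite P/O ratio, because the ATP demand comes on top of the 2 mol of NADH that a single methanol already supplies.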

4 Challenges and Outlook

During the past decade, yeast metabolic engineering has shown great progress and promise (Smolke and Tyo 2012; Nielsen and Keasling 2016), with yeast quickly becoming one of the preferred microbial factories for producing new chemicals or improving the production of traditional ones. This success has been largely driven by continuous advances in genetic and molecular tools (Smolke and Tyo 2012), as well as by novel computational frameworks for pathway discovery and optimization (Wang et al. 2017; Saa et al. 2019). The latter have not only opened new possibilities for evaluating novel biochemical synthesis routes but have also provided more rational methods for designing metabolic pathways with superior performance by rewiring metabolism at a whole-cell scale (Saa et al. 2019). In time, such capabilities will become increasingly essential for arriving at designs that scale industrially and meet commercial expectations.

Pathway discovery is supported by retro-synthesis tools that generate putative routes connecting substrates to products. A comprehensive exploration of the chemical space typically rests on the availability of reaction rules, which fill the gaps between the metabolic precursors and the target chemical(s). Such rules must be generated and applied carefully, as they may yield infeasible pathways that obscure the interpretation of results (Wang et al. 2017). Atom mapping information can greatly aid in validating the generation and application of reaction rules, see e.g., RouteSearch (Latendresse et al. 2014) and ReTrace (Pitkänen et al. 2009), and can be further complemented with enzyme promiscuity knowledge if available (Mazurenko et al. 2020). Another incipient alternative for learning novel chemical reaction routes rests on machine learning techniques (Koch et al. 2020), which can potentially expand the reachable chemical space exponentially, as shown elsewhere (Coley et al. 2019; Mikulak-Klucznik et al. 2020). Efficient navigation of such a vast space will necessarily rely on pathway scores and rankings to focus attention on the most promising and realizable designs. For this task, evaluation of the objectives reviewed here along with others (e.g., use of enzymes with known promiscuous activity or cofactor specificity) constitutes a natural way of prioritizing and selecting desired pathways. Rational integration of the various objectives can be achieved by leveraging mature multi-criteria decision-making techniques (Bonissone et al. 2009), which remain largely unexplored in the field. Notably, these techniques are also transferable to optimization-based methods for pathway prediction, which could enable a more holistic evaluation of pathway performance and robustness.
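As a minimal sketch of how several pathway objectives might be combined into a ranking, the example below first discards dominated candidates (a Pareto filter) and then orders the remaining ones by a weighted sum of min-max-normalized scores. This is only one simple instance of multi-criteria ranking and is not the specific framework of Bonissone et al. (2009). The `Pathway` class, the three objectives chosen, the weights, and all numerical values are hypothetical placeholders for illustration, not results from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    name: str
    product_yield: float  # higher is better
    driving_force: float  # thermodynamic driving force, higher is better
    enzyme_cost: float    # lower is better


def as_benefits(p: Pathway) -> tuple:
    """Express every objective so that larger means better."""
    return (p.product_yield, p.driving_force, -p.enzyme_cost)


def dominates(a: Pathway, b: Pathway) -> bool:
    """True if a is at least as good as b everywhere and strictly better once."""
    fa, fb = as_benefits(a), as_benefits(b)
    return all(x >= y for x, y in zip(fa, fb)) and any(x > y for x, y in zip(fa, fb))


def pareto_front(paths: list) -> list:
    """Keep only pathways that no other candidate dominates."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]


def weighted_rank(paths: list, weights: tuple) -> list:
    """Sort pathways by a weighted sum of min-max-normalized benefits."""
    columns = list(zip(*(as_benefits(p) for p in paths)))

    def scale(value: float, column: tuple) -> float:
        lo, hi = min(column), max(column)
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    def score(p: Pathway) -> float:
        return sum(w * scale(v, c) for w, v, c in zip(weights, as_benefits(p), columns))

    return sorted(paths, key=score, reverse=True)


# Hypothetical candidates with placeholder scores (illustration only)
CANDIDATES = [
    Pathway("route_A", product_yield=0.80, driving_force=2.0, enzyme_cost=5.0),
    Pathway("route_B", product_yield=0.60, driving_force=6.0, enzyme_cost=3.0),
    Pathway("route_C", product_yield=0.55, driving_force=1.5, enzyme_cost=6.0),
    Pathway("route_D", product_yield=0.90, driving_force=1.0, enzyme_cost=8.0),
]

if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print("Non-dominated:", [p.name for p in front])
    print("Equal weights:", [p.name for p in weighted_rank(front, (1.0, 1.0, 1.0))])
```

The Pareto filter removes candidates that are worse on every objective without requiring any weights, while the weighted sum makes the remaining trade-offs explicit. Changing the weights (for instance, emphasizing yield) can reorder the non-dominated routes, which is why the choice of weights should be stated alongside any ranking.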

While the reviewed computational methods and tools for pathway prediction have provided unintuitive and useful insights, their experimental application and validation remain limited. Although there have been recent applications in yeast (Hafner et al. 2021) and other model organisms (Yim et al. 2011) where some of the tools have proven critical for finding effective in vivo metabolic designs, there is still resistance to their broad adoption. Indeed, in vivo implementation of complex in silico metabolic designs is not trivial, typically demanding substantial experimentation time before arriving at a working pathway (Antonovsky et al. 2016; Schwander et al. 2016; Savile et al. 2010). Such efforts could benefit from recent computational frameworks for kinetic model construction (Saa and Nielsen 2017), which could help predict a priori the expected performance of a pathway (see, for example, Theisen et al. 2016), greatly reducing the time and resources needed. As the metabolic prediction capabilities of current models increase (Foster et al. 2021), the use of these tools for rational pathway engineering in yeast and other microbial factories is expected to progressively become part of the basic toolbox for metabolic engineering.