
5.1 Introduction to Genome-Scale Metabolic Models

Since the early distinction between genotype and phenotype (Johannsen 1911), life science research has sought to understand their dependency. This dependency is inherently complex and dynamic. A single genotype may manifest several phenotypes (i.e., clonal heterogeneity), and different genotypes may translate into indistinguishable observable phenotypes. While complete genotype–phenotype dependencies remain challenging to resolve, metabolic phenotypes are coming within reach through genome-scale metabolic model simulations. A genome-scale metabolic model describes the complete biochemical conversion potential encoded in an organism’s genome as a network of reactions (Fig. 5.1). The stoichiometries of these reactions form the mass conservation constraints of cellular metabolism. When a biological optimality principle (e.g., fast cell growth) is additionally introduced, a steady-state metabolic phenotype can be simulated using powerful linear programming solvers. Such simulations holistically consider cellular resource, energy, and redox requirements for biochemical synthesis. A myriad of applications have been derived from this basic phenotype simulation. They range from simulating metabolic genotype–phenotype dependencies to find cancer drug targets to designing genotype manipulations that achieve desired phenotypes in microbial hosts for industrial biotechnology.

Fig. 5.1

The metabolic capacity of cells, represented as a network of reactions or further as a stoichiometric matrix, allows metabolic phenotypes to be simulated using linear programming. The metabolic steady-state assumption renders the system of metabolite mass balances linear. Reaction capacity and thermodynamic constraints can be included to limit the space of feasible metabolic phenotypes (i.e., metabolic fluxes).
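
A minimal Python sketch of the linear mass-balance system in Fig. 5.1, assuming a hypothetical toy network of three metabolites and four reactions (not taken from any published model): each row of the stoichiometric matrix S is a metabolite, each column a reaction, and the product S·v gives the rate of change of every metabolite for a flux vector v. Steady state requires all of these rates to be zero, which is what makes the constraints linear. The names `S`, `mass_balances`, and `is_steady_state` are illustrative only.

```python
# Hypothetical toy network (illustration only):
#   R1: -> A        substrate uptake
#   R2: A -> B
#   R3: B -> 2 C
#   R4: C ->        drain of C (e.g., into biomass)
METABOLITES = ["A", "B", "C"]

# Stoichiometric matrix S: one row per metabolite, one column per reaction (R1..R4).
S = [
    [1, -1, 0, 0],   # A
    [0, 1, -1, 0],   # B
    [0, 0, 2, -1],   # C
]


def mass_balances(S, v):
    """Return dx/dt = S·v, one entry per metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if every metabolite mass balance is zero within tol."""
    return all(abs(b) <= tol for b in mass_balances(S, v))


v_steady = [10, 10, 10, 20]        # C is produced at 20 and drained at 20
v_accumulating = [10, 10, 10, 10]  # C is produced at 20 but drained at only 10
print(dict(zip(METABOLITES, mass_balances(S, v_steady))))        # {'A': 0, 'B': 0, 'C': 0}
print(dict(zip(METABOLITES, mass_balances(S, v_accumulating))))  # {'A': 0, 'B': 0, 'C': 10}
```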

Yeasts, unicellular eukaryotes, are suitable hosts for industrial biotechnology owing to their robustness in harsh growth environments, the established genetic engineering tools for several species, and eukaryotic protein modification. They are also scientifically relevant as simpler model systems for higher eukaryotic cells, and some yeasts are pathogens that cause difficult infections. Furthermore, yeasts, Saccharomyces cerevisiae in particular, have been domesticated for food and beverage fermentation and baking since ancient times. While S. cerevisiae is by far the best-studied and most broadly applied yeast, several other species attract considerable interest as well. For instance, Pichia pastoris is a widely used protein production host, Kluyveromyces lactis is known for beta-galactosidase synthesis, Yarrowia lipolytica is an oleaginous yeast attractive for lipid production, Scheffersomyces stipitis is a naturally xylose-utilizing yeast, and the pathogenic yeasts Candida tropicalis and Candida glabrata cause difficult infections that call for more efficient treatments. The variety of yeast species of scientific and applied interest can be expected to broaden with the rise of CRISPR/Cas9 and other generally applicable genetic engineering tools, such as a synthetic expression system universal for fungi (Rantasalo et al. 2018). Genome sequences are already available for a large variety of yeasts; reference genomes for 98 yeast species are available from NCBI (www.ncbi.nlm.nih.gov/genome).

5.1.1 Genome-Scale Metabolic Model Reconstruction

The genome sequence is the starting point for reconstructing a genome-scale metabolic model. Semi-automatic reconstruction methods are available for building first drafts of genome-scale metabolic models from genome sequences (Swainston et al. 2011; Agren et al. 2013; Pitkänen et al. 2014; Castillo et al. 2016; Dias et al. 2015). The quality of the draft reconstructions strongly depends on the comprehensiveness and quality of the source reaction database. The reaction database has to link reactions to the corresponding gene/protein sequences, either within the database or by providing adequate identifiers, such as EC numbers, for external mapping. Reactions need to be atom balanced to ensure mass conservation in the reconstructed model. Popular reaction databases for genome-scale metabolic model reconstruction include KEGG (Kanehisa et al. 2017), Rhea (Morgat et al. 2017), MetaCyc (Caspi et al. 2014), BiGG (Schellenberger et al. 2010), and Reactome (Fabregat et al. 2018). Most semi-automatic reconstruction methods derive a confidence score for the presence of each database reaction in the metabolic repertoire of the species. The high-scoring reactions are then pulled into the model, after which gap-filling algorithms introduce lower-scoring reactions that are essential for the in silico synthesis of biomass. Gap filling benefits greatly from experimental data on the growth of the species in different nutrient environments (Tramontano et al. 2018). As an alternative to this two-phase process of introducing high-scoring reactions and then gap filling for a functional model, a single-step process that carves the organism-specific metabolic network out of a universal gapless model (CarveMe) has recently been proposed (Machado et al. 2018).
When the universal model is well curated, simulatable species-specific models can be reconstructed quickly using CarveMe (Machado et al. 2018). Further, using a universal model standardizes the quality of the input reaction data across the reconstructed species models. However, there are other sources of uncertainty in model reconstruction, such as the quality of the genome and its annotation, and the availability of similar annotated sequences in databases. Given the data, several models of a species could score equally well in the automatic reconstruction. Therefore, an approach has been suggested for simultaneously simulating an ensemble of equally likely models instead of a single reconstruction (Biggs and Papin 2017). Nevertheless, evaluating the quality of models reconstructed for less well-studied non-model species is challenging. The reconstruction algorithms themselves can be evaluated against manually curated models and against experimental data on model organisms, such as metabolic gene knockout phenotypes. Metabolic gene knockout phenotypes can be simulated using the gene annotations of the models: genes are annotated to the reactions whose catalyzing enzymes they encode. Preferably, the gene annotations also include Boolean rules describing whether the genes annotated to a reaction encode isoenzymes (i.e., OR rule) or form a complex all of whose components are required for activity (i.e., AND rule). The Boolean rules thereby allow propagating the genetic state into reaction activity states for performing mutant phenotype simulations. Simulated mutant phenotypes can then be compared against experimental deletion mutant phenotypes to validate models. Although many metrics have been proposed for assessing the quality of reconstructed models (Sanchez and Nielsen 2015; Lopes and Rocha 2017), experimental growth and phenotype data are necessary for true evaluation (Tramontano et al. 2018).
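
The propagation of gene deletions through Boolean gene-reaction rules can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation of any particular toolbox: it assumes rules are strings using lowercase `and`/`or` with gene identifiers that are valid Python identifiers (as in many SBML-derived models; identifiers containing hyphens or dots would need escaping first), and the gene and reaction names below are hypothetical. Parsing with Python's `ast` module avoids calling `eval` on model text.

```python
import ast
from typing import AbstractSet


def reaction_active(rule: str, deleted: AbstractSet[str]) -> bool:
    """Return True if a reaction can still be catalyzed after deleting genes.

    `rule` is a Boolean gene-reaction rule such as "(g1 and g2) or g3":
    OR joins isoenzymes, AND joins subunits of a complex. Genes not in
    `deleted` are treated as present. An empty rule (no gene association,
    e.g. a spontaneous reaction) leaves the reaction active.
    """
    if not rule.strip():
        return True

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            values = [evaluate(child) for child in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id not in deleted
        raise ValueError(f"unsupported element in rule: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval").body)


# Hypothetical rules for illustration only.
rules = {
    "R_iso": "geneA or geneB",                # isoenzymes
    "R_complex": "geneC and geneD",           # enzyme complex
    "R_mixed": "(geneC and geneD) or geneE",  # complex or alternative enzyme
}

deleted = {"geneA", "geneC"}
activity = {rxn: reaction_active(rule, deleted) for rxn, rule in rules.items()}
print(activity)  # {'R_iso': True, 'R_complex': False, 'R_mixed': True}
```

Reactions evaluated as inactive would then have both flux bounds set to zero before the mutant phenotype is simulated.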

Fig. 5.2

Timeline of genome-scale metabolic models for yeasts. Information partially extracted from Sanchez and Nielsen (2015) and Lopes and Rocha (2017). The arrows start from the template models used in reconstruction.

5.2 Yeast Genome-Scale Models

Several genome-scale metabolic models have been reconstructed for S. cerevisiae during the last 15 years. The first S. cerevisiae model was created by Förster et al. (2003) and was named iFF708 after the main developers and the number of genes supporting the reactions in the model. Slightly different numbers of genes were annotated to metabolic reactions in the three subsequent S. cerevisiae models (iND750, iLL672, and iIN800) derived directly from iFF708. Creating the first consensus model for S. cerevisiae was a collaborative effort. It was built on the iLL672 and iMM904 models (the latter derived from iND750) and published in 2008 (Herrgård et al. 2008). After several updates, in particular of lipid metabolism and transport reactions, consensus model version 7 was published in 2013 (Aung et al. 2013). Since then, the consensus yeast model has gone through several smaller updates (https://github.com/SysBioChalmers/yeast-GEM). Heavner and Price (2015) compared 12 S. cerevisiae metabolic models created between 2003 and 2015. Although coverage (i.e., the number of annotated genes) and predictive power (i.e., gene essentiality prediction) had increased over time, coverage does not always correlate with predictive ability. Extensive models annotating a higher number of genes do not necessarily predict gene essentiality better than simpler ones. Introducing additional genes that encode minor activities may even decrease the predictive capacity if the encoded enzymes cannot alone sustain the corresponding reactions (Pereira et al. 2016). However, in addition to serving for predictive simulations of genotype–phenotype translation, genome-scale metabolic models can be seen as knowledge bases containing the entire known biochemical conversion potential of an organism.
Including the genes encoding minor activities, and the corresponding reactions, in a model is valuable when the model is used as a knowledge base or biochemical interaction network. In conclusion, the various genome-scale metabolic models of S. cerevisiae have been developed and have evolved independently for different purposes, and none of them is generally the best.

Genome-scale metabolic models have also been reconstructed, and manually curated, for yeasts other than S. cerevisiae (Fig. 5.2). These models have commonly been reconstructed in a comparative manner using an S. cerevisiae model as a template. The reconstruction tool RAVEN specifically supports comparative reconstruction using an S. cerevisiae model as a template, and CoReCo exploits species relatedness in scoring the reactions (Pitkänen et al. 2014). The models for the industrially relevant species K. lactis, P. pastoris, S. stipitis, and Y. lipolytica, and for the pathogenic C. glabrata, have been derived using S. cerevisiae models as templates. For the model reconstructions of the pathogenic C. tropicalis and the scientifically relevant S. pombe, no S. cerevisiae template has been reported. In addition, a large set of draft fungal models, including yeast models, reconstructed using CoReCo (Pitkänen et al. 2014; Castillo et al. 2016) is available in the BioModels database (Chelliah et al. 2015). Besides the BioModels database and the developers’ own sites, genome-scale metabolic models for various species can be downloaded from other public databases such as the BiGG database (http://bigg.ucsd.edu/) (King et al. 2016).

Table 5.1 Development frameworks and higher level tools for genome-scale metabolic model manipulations and simulations

5.3 Methods for Metabolic Phenotype Simulations Derived from Flux Balance Analysis (FBA)

A myriad of methods for performing phenotype simulations with genome-scale metabolic models have been derived from Flux Balance Analysis (FBA) (Varma and Palsson 1994). FBA solves a linear programming problem of optimizing a biologically relevant objective function (typically growth) under metabolic steady-state mass conservation, enzyme capacity, and thermodynamic constraints. The steady-state assumption implies that intracellular metabolite concentrations are constant (i.e., their time derivatives are zero). Thus, the steady-state assumption renders the problem linear (Fig. 5.1) and eliminates the need to describe reaction kinetics, which are functions of reactant abundances, often with several unknown parameters. This linearizing assumption is well justified for many metabolic states. It holds particularly well when microbial cells divide unlimited by the external conditions or grow in continuous cultivations under constant conditions. Under these conditions, FBA-optimized growth yields have been found to closely match experimental observations in microbial species (Edwards et al. 2001). Nevertheless, optimality principles other than growth, such as maximization of energy generation in terms of ATP, have also been suggested and evaluated (Schuetz et al. 2007). Model simulations optimizing a defined objective function globally account for cellular energy and redox balancing requirements while fulfilling the mass balance, enzyme capacity, and thermodynamic constraints of the whole metabolic network. Enzyme capacity and thermodynamic constraints are introduced into the FBA problem as lower and upper flux bounds. Commonly, the sign of a flux value describes the net direction of the reaction, but forward and backward reactions can alternatively be represented separately in the model. When thermodynamics do not allow a particular reaction direction under cellular conditions (Flamholz et al. 2012), the flux bounds can be assigned accordingly for simulations.
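
In a common textbook notation (the symbols are our choice, not taken from this chapter), the FBA problem described above reads as follows: S is the m × n stoichiometric matrix, x the vector of metabolite concentrations, v the flux vector, c a weight vector selecting the objective (e.g., a single 1 at the biomass reaction), and v^lb and v^ub the enzyme capacity and thermodynamic bounds. An irreversible reaction j, for example, simply receives v_j^lb = 0.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & Z = \mathbf{c}^{\mathsf{T}} \mathbf{v} \\
\text{subject to} \quad & \frac{d\mathbf{x}}{dt} = \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}^{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}} .
\end{aligned}
```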

Phenotype simulations with FBA and derived tools, as well as genome-scale metabolic model manipulations, are facilitated by frameworks supporting method development and by tools with higher-level interfaces for analysis (Table 5.1). While Python-based frameworks relying on COBRApy (Ebrahim et al. 2013) are currently the primary choice of developers, R-based (R Development Core Team 2018; Sybil, Gelius-Dietrich 2013) and MATLAB-based (www.mathworks.com; COBRA Toolbox, Schellenberger et al. 2011; Heirendt et al. 2017) frameworks are available as well. The frameworks and tools commonly offer interfaces to external LP solvers, and often also to Mixed-Integer Linear Programming (MILP) and Quadratic Programming (QP) solvers (e.g., glpk (www.gnu.org/software/glpk/), cplex (www.ibm.com/analytics/cplex-optimizer), gurobi (www.gurobi.com)), to be recruited for different applications. The tools may also engage external libraries, in particular for manipulating models in the common Systems Biology Markup Language (SBML) format (Hucka et al. 2003) (SBML Toolbox (Keating et al. 2006), libSBML (Bornstein et al. 2008)). Tools with higher-level interfaces also allow experimental scientists to analyze metabolism with genome-scale models and to design genotype manipulations, as reviewed below.

FBA simulations with genome-scale metabolic models can also use objectives other than those mimicking biological design principles, in order to explore an organism’s metabolic potential, i.e., the possible metabolic states it may adopt. For instance, under the given mass balance, enzyme capacity, and thermodynamic constraints, the optimal theoretical yields of biotechnologically relevant molecules can be solved with simulations. The simulations can assign alternative nutritional conditions mimicking different growth media or bioconversion substrates. If substrate utilization rates are available, they can be introduced into the models as exchange fluxes between cells and the environment, and FBA can then predict the optimal steady-state growth rate (1/h) and specific production rates (mmol/(g cell dry weight · h)) instead of yields. While the optimal value of the chosen objective solved by FBA (i.e., yield or rate) is global and unique, the other fluxes (i.e., the variables of the optimization problem) may adopt different values at the optimum. Thus, there may be several alternative, yet equally optimal, metabolic phenotypes in terms of the defined objective function.
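
The non-uniqueness of the optimal flux vector can be checked by hand on a toy network. The Python sketch below assumes a hypothetical network with substrate uptake bounded at 10 mmol/(gDW · h) and two parallel routes from substrate S to product P. It does not solve an LP; instead, because every unit of S must end up as P at steady state, the maximal product flux is 10 and the theoretical yield is 1 mol/mol. The two flux vectors shown both satisfy all constraints and reach that optimum while distributing flux differently between the routes.

```python
# Hypothetical toy network (illustration only):
#   EX_S: -> S      substrate uptake, at most 10 mmol/(gDW h)
#   R_a:  S -> P    route a
#   R_b:  S -> P    route b (parallel route with identical stoichiometry)
#   EX_P: P ->      product secretion, the objective
S = {
    "S": {"EX_S": 1, "R_a": -1, "R_b": -1},
    "P": {"R_a": 1, "R_b": 1, "EX_P": -1},
}
BOUNDS = {"EX_S": (0, 10), "R_a": (0, 1000), "R_b": (0, 1000), "EX_P": (0, 1000)}


def is_feasible(v, tol=1e-9):
    """Check steady-state mass balances (S·v = 0) and flux bounds."""
    balanced = all(
        abs(sum(coef * v[rxn] for rxn, coef in row.items())) <= tol
        for row in S.values()
    )
    within_bounds = all(lb - tol <= v[rxn] <= ub + tol for rxn, (lb, ub) in BOUNDS.items())
    return balanced and within_bounds


def product_yield(v):
    """Product formed per substrate taken up (mol/mol)."""
    return v["EX_P"] / v["EX_S"]


# At steady state EX_P = R_a + R_b = EX_S <= 10, so 10 is the optimum.
v1 = {"EX_S": 10, "R_a": 10, "R_b": 0, "EX_P": 10}
v2 = {"EX_S": 10, "R_a": 4, "R_b": 6, "EX_P": 10}

for v in (v1, v2):
    print(is_feasible(v), v["EX_P"], product_yield(v))
# Both lines print: True 10 1.0
```

In a genome-scale model the same ambiguity arises wherever isoenzymes, parallel pathways, or cycles leave internal fluxes undetermined at the optimum.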

5.3.1 Parsimonious Flux Balance Analysis (pFBA)

Parsimonious Flux Balance Analysis (pFBA) aims at reducing the set of alternative, equally optimal flux states in a biologically relevant way (Lewis et al. 2010). pFBA derives from FBA and involves a two-step optimization: first, the biological design objective (e.g., growth) is optimized; then, under this optimality condition, another linear programming problem is solved to minimize the sum of the fluxes. Flux-sum minimization can be seen as biologically relevant because it approximates the optimization of enzyme usage, and thereby of cellular resource utilization. It efficiently removes futile-cycle artifacts from the returned flux vector. Yet, fluxes may still adopt alternative values under pFBA optimality.
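
The two steps of pFBA can be written as two linear programs, using the usual FBA notation (S, v, c, and the flux bounds; the symbols are our choice). The absolute values in the second step are commonly linearized by splitting each reversible flux into non-negative forward and backward parts, v_i = v_i^+ - v_i^-, and minimizing the sum of all parts; this is a standard modeling device rather than a detail stated in the text above.

```latex
\begin{aligned}
\text{Step 1:}\quad & Z^{*} = \max_{\mathbf{v}} \ \mathbf{c}^{\mathsf{T}}\mathbf{v}
  \quad \text{s.t.} \quad \mathbf{S}\,\mathbf{v} = \mathbf{0}, \quad
  \mathbf{v}^{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}} \\
\text{Step 2:}\quad & \min_{\mathbf{v}} \ \sum_{i} \lvert v_i \rvert
  \quad \text{s.t.} \quad \mathbf{S}\,\mathbf{v} = \mathbf{0}, \quad
  \mathbf{v}^{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}}, \quad
  \mathbf{c}^{\mathsf{T}}\mathbf{v} = Z^{*}
\end{aligned}
```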

5.3.2 Flux Variability Analysis (FVA)

The ranges of values that fluxes may adopt under a particular optimality condition can be assessed with Flux Variability Analysis (FVA) (Burgard and Maranas 2001; Mahadevan and Schilling 2003). FVA can be performed at the optimum of the assigned objective (commonly growth) or at different fractions of it. The computation involves solving two linear programming problems, a minimization and a maximization, for each flux. Fluxes whose ranges do not include zero are coupled to the objective and can thus be considered essential for it. General analysis of flux coupling in a metabolic network derives from FVA (Burgard et al. 2004).
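
For each flux v_i, FVA solves the pair of linear programs below (usual FBA notation; the symbols are ours). Here Z* is the FBA optimum and α in [0, 1] is the required fraction of it, with α = 1 corresponding to strict optimality. A flux is essential for the objective when its interval excludes zero, i.e., when v_i^min > 0 or v_i^max < 0.

```latex
\begin{aligned}
v_i^{\min},\; v_i^{\max} \;=\; \min_{\mathbf{v}},\; \max_{\mathbf{v}} \quad & v_i \\
\text{s.t.} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}^{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}}, \\
& \mathbf{c}^{\mathsf{T}}\mathbf{v} \ge \alpha\, Z^{*}, \qquad 0 \le \alpha \le 1 .
\end{aligned}
```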

5.3.3 Simulating Mutant Cell Phenotypes

The above FBA-derived simulation approaches assume an optimal distribution of flux in the metabolic network. For FBA simulations with an objective function mimicking a biological optimality principle, this premise is justified by the evolutionary optimization of an organism’s metabolism (Ibarra et al. 2002). However, mutant strains engineered in the laboratory cannot be assumed to function optimally. The Minimization of Metabolic Adjustment (MoMA) approach was developed to simulate the metabolic state of such engineered mutant strains (Segrè et al. 2002). MoMA solves a quadratic optimization problem that minimizes the flux differences to a reference flux state (i.e., the wild-type flux state) given the constraints arising from the modifications engineered into the strain (e.g., gene deletions). A linearized version of the algorithm, linear Minimization of Metabolic Adjustment (lMoMA), also exists (Burgard et al. 2003; Becker et al. 2007). In a biological sense, MoMA and lMoMA assume that wild-type regulation still drives the distribution of metabolic fluxes in engineered, but not evolutionarily streamlined, strains. Wild-type regulation-driven flux distribution in engineered cells is also simulated by the Minimization of Metabolites Balance (MiMBl) algorithm (Brochado et al. 2012). In contrast to MoMA and lMoMA, MiMBl is independent of the stoichiometric representation of the reactions. Whereas multiplying the stoichiometric coefficients of a particular reaction by a common factor (which changes neither the relative stoichiometry nor the elemental balance of the reaction) would alter the output of a MoMA computation, the MiMBl solution would be unaffected. MiMBl minimizes the difference from the wild-type state in terms of metabolite turnovers instead of fluxes. Yet another approach for simulating the metabolic state of engineered, but not evolved, organisms is the Regulatory On/Off Minimization (ROOM) algorithm (Shlomi et al. 2005).
ROOM minimizes the number of fluxes that change significantly in mutant cells compared with wild-type cells. The underlying premise of ROOM is the same as that of MoMA, lMoMA, and MiMBl: wild-type regulation drives the distribution of fluxes in a non-evolved mutant strain. ROOM further assumes that the mutant metabolic state is reached through only the necessary transient metabolic changes mediated by the regulatory network. These necessary changes are simulated by solving a Mixed-Integer Linear Programming (MILP) problem.
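
The objectives of MoMA, lMoMA, and ROOM can be summarized as follows, in our notation: w is the wild-type (reference) flux vector, v the mutant flux vector, and the mutant bounds set v_j = 0 for reactions inactivated by the deletions. The ROOM formulation follows Shlomi et al. (2005) as we understand it, with binary indicators y_i marking significantly changed fluxes and user-chosen tolerances δ (relative) and ε (absolute) defining the thresholds w^u and w^l. Because the MoMA objective is a distance between flux values, multiplying a reaction's coefficients by a factor k divides its flux by k and therefore changes the distance; this is the dependence on stoichiometric representation that MiMBl avoids.

```latex
\begin{aligned}
\textbf{MoMA:}\quad & \min_{\mathbf{v}} \ \lVert \mathbf{v} - \mathbf{w} \rVert_2^{2}
  \quad \text{s.t.} \quad \mathbf{S}\,\mathbf{v} = \mathbf{0}, \quad
  \mathbf{v}^{\mathrm{lb}}_{\mathrm{mut}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}}_{\mathrm{mut}} \\[4pt]
\textbf{lMoMA:}\quad & \min_{\mathbf{v}} \ \sum_i \lvert v_i - w_i \rvert
  \quad \text{s.t. the same constraints} \\[4pt]
\textbf{ROOM:}\quad & \min_{\mathbf{v},\,\mathbf{y}} \ \sum_i y_i
  \quad \text{s.t.} \quad \mathbf{S}\,\mathbf{v} = \mathbf{0}, \quad
  \mathbf{v}^{\mathrm{lb}}_{\mathrm{mut}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}}_{\mathrm{mut}}, \\
& \quad v_i - y_i \left( v_i^{\mathrm{ub}} - w_i^{u} \right) \le w_i^{u}, \qquad
        v_i - y_i \left( v_i^{\mathrm{lb}} - w_i^{l} \right) \ge w_i^{l}, \qquad y_i \in \{0, 1\}, \\
& \quad w_i^{u} = w_i + \delta \lvert w_i \rvert + \varepsilon, \qquad
        w_i^{l} = w_i - \delta \lvert w_i \rvert - \varepsilon .
\end{aligned}
```

With y_i = 0, flux i must stay within the tolerance band around its wild-type value; with y_i = 1, it may take any value within its mutant bounds, at the cost of one counted change.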

5.4 Examples of Genotype–Phenotype Simulations: Single and Double Gene KOs

The simulation tools introduced above allow genome-scale metabolic models to predict the phenotypic effects of gene deletions (Förster et al. 2003). In silico metabolic gene deletions are propagated through the Boolean gene-reaction rules into reaction activities. If a regulatory model is integrated, as in the rFBA approach (Covert et al. 2001; Herrgård et al. 2006), regulatory gene deletions can first be propagated to the status of metabolic genes through the regulatory Boolean rules, and then through the metabolic model’s gene-reaction rules into reaction activity states. The phenotype simulation is then performed with the updated reaction activity states, using FBA or another simulation algorithm that does not assume the mutant metabolism to be optimized. If the simulated growth is negligible, the deleted gene is predicted to be essential. Double gene deletion simulations predict synthetic lethal gene pairs in silico (Suthers et al. 2009). Since genome-scale experimental screens of gene deletion mutants are available for model organisms, comparing them with model-predicted essentialities and synthetic lethalities can be used to validate metabolic model reconstruction algorithms.
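
The single and double deletion screen can be sketched as follows. To keep the example self-contained and solver-free, it replaces the growth simulation with a deliberately simplified stand-in: a network-expansion (reachability) check of whether biomass can be formed from the nutrients using the reactions that remain active. Unlike FBA or MoMA, this check ignores stoichiometric balancing and flux bounds. The toy model, gene names, and helper functions are hypothetical; gene-reaction rules are nested tuples, with OR for isoenzymes and AND for complexes.

```python
from itertools import combinations

# Hypothetical toy model: reaction -> (substrates, products, gene rule).
# A rule is a gene name or a tuple ("and" | "or", sub-rule, sub-rule, ...).
REACTIONS = {
    "R1": ({"glc"}, {"g6p"}, ("or", "HXK_A", "HXK_B")),  # isoenzymes
    "R2": ({"g6p"}, {"pyr"}, "PATH1"),
    "R2b": ({"g6p"}, {"pyr"}, "PATH2"),                   # parallel route
    "R3": ({"g6p"}, {"r5p"}, ("and", "CPX1", "CPX2")),   # enzyme complex
    "R4": ({"pyr", "r5p"}, {"biomass"}, "BIO"),
}
NUTRIENTS = frozenset({"glc"})


def rule_active(rule, deleted):
    """Evaluate a Boolean gene-reaction rule given a set of deleted genes."""
    if isinstance(rule, str):
        return rule not in deleted
    op, *args = rule
    values = [rule_active(arg, deleted) for arg in args]
    return all(values) if op == "and" else any(values)


def genes_in(rule):
    """Collect all gene names mentioned in a rule."""
    if isinstance(rule, str):
        return {rule}
    return set().union(*(genes_in(arg) for arg in rule[1:]))


def can_grow(deleted):
    """Stand-in for a growth simulation: is biomass reachable from nutrients?"""
    available = set(NUTRIENTS)
    changed = True
    while changed:
        changed = False
        for substrates, products, rule in REACTIONS.values():
            if rule_active(rule, deleted) and substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


GENES = sorted(set().union(*(genes_in(rule) for _, _, rule in REACTIONS.values())))
essential = {g for g in GENES if not can_grow({g})}
nonessential = [g for g in GENES if g not in essential]
synthetic_lethal = [pair for pair in combinations(nonessential, 2) if not can_grow(set(pair))]

print(sorted(essential))  # ['BIO', 'CPX1', 'CPX2']
print(synthetic_lethal)   # [('HXK_A', 'HXK_B'), ('PATH1', 'PATH2')]
```

Only individually non-essential genes are paired, since a pair containing an essential gene is trivially lethal. In a real screen, `can_grow` would be replaced by an FBA or MoMA simulation with a growth threshold.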

5.5 In Silico Metabolic Engineering—Strain Design

Since genome-scale metabolic models allow predicting the translation of genotype into phenotype, they can be used to design genotype manipulations that lead to desired phenotypes. Overproducer phenotypes are especially sought for industrial biotechnology applications. While native strains have evolved to distribute the available resources toward growth and survival, feasible industrial production in a microbial fermentation process requires cells to divert substantial resources to product synthesis. Diverting cellular resources toward production is the aim of metabolic engineering of industrial biotechnology hosts such as yeasts, in addition to introducing the production pathways for heterologous products. Strategies for the flux re-regulation needed to divert resources efficiently to the production pathway can be designed computationally using genome-scale metabolic models. An elegant solution to the inherent competition between growth and product synthesis for resources is to align these objectives through metabolic network modifications. This alignment can be achieved with specific metabolic gene deletions that result in growth-coupled production: the deletions reduce the metabolic network such that cells cannot grow (optimally or at all) unless they simultaneously synthesize the product. In other words, a growth-essential pathway produces the desired product as an unavoidable side stream. OptKnock was the pioneering method for finding deletion targets that create growth–product coupling using metabolic models (Burgard et al. 2003). It was implemented as a bi-level MILP. An alternative implementation of in silico growth–product coupling design is OptGene, in which the phenotype simulation is embedded in a genetic algorithm, allowing nonlinear design objectives and searches over larger target gene sets (Patil et al. 2005; Asadollahi et al. 2009).
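
The bi-level structure of OptKnock can be written as below, following the published formulation as we understand it (notation ours; further constraints such as a maintenance ATP requirement are omitted for brevity). Binary variables y_j switch reaction j off when y_j = 0. The inner problem lets the cell maximize growth in the knocked-out network, given a fixed substrate uptake and a minimum growth requirement, and the outer problem chooses at most K knockouts to maximize product secretion. The bi-level problem is then converted into a single MILP using linear programming duality of the inner problem.

```latex
\begin{aligned}
\max_{\mathbf{y}} \quad & v_{\text{product}} \\
\text{s.t.} \quad & \mathbf{v} \in \arg\max_{\mathbf{v}'} \Big\{\, v'_{\text{biomass}} \;:\;
    \mathbf{S}\,\mathbf{v}' = \mathbf{0},\;
    v'_{\text{substrate}} = v_{\text{substrate}}^{\text{target}},\;
    v'_{\text{biomass}} \ge v_{\text{biomass}}^{\min},\;
    y_j\, v_j^{\mathrm{lb}} \le v'_j \le y_j\, v_j^{\mathrm{ub}} \ \ \forall j \,\Big\}, \\
& \sum_{j} \left( 1 - y_j \right) \le K, \qquad y_j \in \{0, 1\} .
\end{aligned}
```
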
OptGene has been used successfully to design, for example, succinate- and terpenoid-overproducing S. cerevisiae strains (Otero et al. 2013; Asadollahi et al. 2009). For vanillin production in S. cerevisiae (in the form of vanillin glucoside to reduce toxicity), OptGene was used to identify deletion targets, of which GDH1 (encoding glutamate dehydrogenase) and PDC1 (encoding pyruvate decarboxylase) deletions were experimentally implemented and evaluated (Brochado et al. 2010). Single deletion mutants, a double deletion mutant, and a double deletion mutant with GDH2 overexpression to relieve the nitrogen assimilation defect of gdh1Δ were constructed. All mutant strains except the single gdh1Δ mutant showed a 1.5-fold increase in vanillin glucoside yield in batch cultures compared with the strain without host metabolism optimization. Furthermore, optimizing the synthetic four-step production pathway of vanillin glucoside in S. cerevisiae did not improve production until the OptGene-identified targets for optimizing host metabolism were implemented (Brochado et al. 2010; Brochado and Patil 2013). Later, Tepper and Shlomi (2010) released RobustKnock, which uses an additional optimization step to extract growth–product coupling deletions that force product synthesis. Genome manipulations that create growth–product coupling fix the relative yields of biomass and target product. However, the rates are amenable to improvement through Adaptive Laboratory Evolution (ALE) of the mutant strains: as faster-growing cells are selected, the coupled production rate improves alongside (Otero et al. 2013). If the growth–product coupling relies on a carbon–carbon bond cleaving reaction that splits a precursor between growth and production, the coupling is likely to be very robust in ALE.
An Anchor reaction, which produces both an essential precursor for growth and another product convertible to the target product, is biochemically essential for a growth–product coupled reduced metabolic network (Jouhten et al. 2017). Carbon–carbon bond cleaving Anchor reactions are a subset of all possible Anchors. Growth-coupled succinate production in S. cerevisiae relies on the carbon–carbon bond cleaving isocitrate lyase as an Anchor reaction (Otero et al. 2013). The initial production rate after the metabolic network reduction for growth–product coupling was substantially improved with ALE, along with relief of the glycine auxotrophy (Table 5.2).

Table 5.2 Examples of reported overproducer yeast strains whose development involved genome-scale metabolic model simulation tools

Metabolic network manipulations for achieving growth–product coupling can also be identified with elementary-mode analysis methods (Schuster and Hilgetag 1994; Schuster et al. 2000; Trinh and Srienc 2009; Unrean et al. 2010; Hädicke and Klamt 2011). Elementary modes are minimal sets of reactions allowing steady-state operation (Heinrich and Schuster 1998). Engineering strategies are designed to disable undesired elementary modes while retaining desired ones (Hädicke and Klamt 2011). Elementary Flux Vectors (EFVs) enable introducing flux capacity constraints into the elementary-mode framework, as in FBA-derived methods, and also allow designing growth–product coupling strategies (Urbanczik 2007; Klamt and Mahadevan 2015). The scalability of elementary-mode-based searches for metabolic engineering strategies has been limited but is improving through algorithmic developments (von Kamp and Klamt 2014). Minimum sets of genetic engineering targets can now be exhaustively identified, enabling evaluations also in yeast hosts. Beyond growth–product coupling strategies, genome-scale metabolic models allow designing other kinds of engineering strategies for improving production. While methods for designing strategies that optimize cellular fluxes for production are broadly reviewed elsewhere (e.g., Maia et al. 2016), many of them are yet to be evaluated for yeasts. Among the variety of approaches, there are methods that identify not only knockouts but also up- and downregulation targets for improving production. OptReg identifies combined strategies of deletions, overexpressions, and downregulations for host optimization as bi-level MILP solutions (Pharkya and Maranas 2006). Similarly, OptForce identifies combined strategies in a comparative manner against the wild-type flux state, classifying reactions by the type of manipulation they require for optimizing production (Ranganathan et al. 2010).
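
Elementary modes, defined above as minimal sets of reactions allowing steady-state operation, can be recognized with an algebraic rank test: a non-zero steady-state flux vector v that respects reaction irreversibilities is an elementary mode exactly when the columns of S belonging to its support (its non-zero reactions) have rank equal to the support size minus one. The Python sketch below applies this test, with exact rational arithmetic, to a hypothetical network with two parallel routes from A to B. It only checks given vectors; enumerating all elementary modes is done with dedicated algorithms and tools.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_elementary_mode(S, v, irreversible):
    """Steady state, sign-feasible, and support-minimal (rank test)."""
    support = [j for j, x in enumerate(v) if x != 0]
    if not support:
        return False
    steady = all(sum(row[j] * v[j] for j in support) == 0 for row in S)
    sign_ok = all(v[j] >= 0 for j in irreversible)
    sub = [[row[j] for j in support] for row in S]
    return steady and sign_ok and rank(sub) == len(support) - 1


# Hypothetical network. Columns: R1 (-> A), R2 (A -> B), R3 (A -> B), R4 (B ->)
S = [
    [1, -1, -1, 0],  # A
    [0, 1, 1, -1],   # B
]
ALL_IRREVERSIBLE = {0, 1, 2, 3}

print(is_elementary_mode(S, [1, 1, 0, 1], ALL_IRREVERSIBLE))   # True: route via R2
print(is_elementary_mode(S, [2, 1, 1, 2], ALL_IRREVERSIBLE))   # False: sum of two modes
print(is_elementary_mode(S, [0, 1, -1, 0], ALL_IRREVERSIBLE))  # False: R3 run backward
print(is_elementary_mode(S, [0, 1, -1, 0], {0, 1, 3}))         # True if R3 is reversible
```
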
Flux Scanning based on Enforced Objective Flux (FSEOF) considers the wild-type flux state by identifying upregulation targets as genes annotated to reactions whose flux increases in silico when the production objective is enforced while the biological objective (i.e., growth) remains the optimization objective (Choi et al. 2010). FSEOF-identified targets have been successfully implemented in the yeast P. pastoris to improve protein production (Nocon et al. 2014). Strain improvement strategies may also benefit from augmenting metabolic models with additional information on metabolic enzymes or the wild-type phenotype. For instance, k-OptForce integrates available enzyme kinetic information to improve predictions by considering the effects of metabolite concentrations on the distribution of fluxes (Chowdhury et al. 2014). OptFlux allows gene expression data to be used in a comparative approach against the wild type to identify overexpression and downregulation targets in a metaheuristic optimization framework (Gonçalves et al. 2012). Importantly, considering wild-type gene expression data relaxes the assumption that cells operate optimally in their native state, allowing comparative strain design also for secondary metabolic pathways (Kim et al. 2016). Accordingly, the transcriptomics-based Strain Optimization Tool (tSOT) identifies metabolic engineering targets by considering the wild-type flux regulatory state inferred from gene expression data (Kim et al. 2016). A word of caution, however: the gene expression status of central metabolic enzymes may poorly reflect the actual flux state in yeast cells, as Machado and Herrgård (2014) observed when integrating gene expression data into genome-scale metabolic models.
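
The target-selection step of FSEOF can be sketched as follows. In the actual method, each flux distribution comes from an FBA simulation that maximizes growth while the product flux is enforced at increasing levels; here those distributions are replaced by made-up numbers for hypothetical reactions, and the selection rule (flux keeps its direction, its magnitude never decreases, and it ends higher than it started) is our simplified reading rather than the exact published criterion.

```python
def fseof_candidates(scans, tol=1e-9):
    """Return reactions whose flux magnitude increases along the scan.

    `scans` is a list of {reaction: flux} dicts ordered by increasing
    enforced product flux. A reaction is selected if its flux keeps the
    same direction, never decreases in magnitude, and ends higher than
    it started.
    """
    selected = []
    for rxn in scans[0]:
        values = [scan[rxn] for scan in scans]
        same_direction = all(x >= -tol for x in values) or all(x <= tol for x in values)
        mags = [abs(x) for x in values]
        non_decreasing = all(b >= a - tol for a, b in zip(mags, mags[1:]))
        increased = mags[-1] > mags[0] + tol
        if same_direction and non_decreasing and increased:
            selected.append(rxn)
    return selected


# Made-up flux distributions (mmol/(gDW h)) at three enforced product levels.
scans = [
    {"R1": 8.0, "R2": 1.0, "R3": -0.5, "R4": 2.0, "BIOMASS": 0.40},
    {"R1": 7.0, "R2": 2.0, "R3": -0.9, "R4": 2.0, "BIOMASS": 0.35},
    {"R1": 6.0, "R2": 3.0, "R3": -1.3, "R4": 2.0, "BIOMASS": 0.30},
]
print(fseof_candidates(scans))  # ['R2', 'R3']
```

The genes annotated to the selected reactions would then be the candidate overexpression targets.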

5.6 Integrating Omics Data into Models

Genome-scale metabolic models offer frameworks for integrating omics data, since they connect metabolic genes/proteins to the reaction fluxes through which the biochemical conversion of metabolites occurs. Fluxes, together with metabolite abundances, constitute the metabolic phenotype, which is determined by, and reciprocally regulates, the underlying transcriptional and translational states of a cell. Evolutionarily shaped cellular regulation can vary metabolic phenotypes within the ultimate limits set by the laws of mass conservation and chemical thermodynamics. Therefore, transcriptomics, proteomics, or metabolomics data have been integrated into models to shrink the space of feasible metabolic states and improve flux estimates. Indeed, flux predictions would often benefit from specific constraints representing the regulation of metabolic network utilization under particular conditions (e.g., repression of respiration in S. cerevisiae on high glucose). Several methods have been developed for inferring flux states from gene expression data, the most abundantly available omics data type. iMAT (Shlomi et al. 2008), GIMME (Becker and Palsson 2008), GIM3E (Schmidt et al. 2013), RELATCH (Kim and Reed 2012), and INIT (Agren et al. 2012) derive expected or allowable flux states from gene expression data. However, gene expression data can also mislead flux estimation (Machado and Herrgård 2014), as post-transcriptional regulation of metabolic phenotypes is prevalent. Consequently, additional constraints derived from proteomics data combined with enzyme-specific turnover numbers (kcat) (Sanchez et al. 2017; Vazquez and Oltvai 2016) have allowed model simulations to reproduce metabolic phenotypes (e.g., overflow metabolism) that are neither well captured by plain FBA nor apparent in gene expression data.
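
The kind of proteomics-derived constraint referred to above can be illustrated with the simplest capacity relation, v ≤ kcat · [E], which assumes that an enzyme works at most at its turnover number. The Python sketch below converts a proteomics abundance (mg protein per g cell dry weight) and a molecular weight into an enzyme amount, and then into an upper flux bound in the usual model units. The enzyme, its numbers, and the helper names are hypothetical, and published methods add further details (e.g., shared protein pools or isoenzyme handling) that are not shown here.

```python
def enzyme_mmol_per_gdw(protein_mg_per_gdw: float, mw_g_per_mol: float) -> float:
    """Convert a protein abundance in mg/gDW into mmol/gDW (mg divided by g/mol gives mmol)."""
    return protein_mg_per_gdw / mw_g_per_mol


def flux_upper_bound(kcat_per_s: float, enzyme_mmol_gdw: float) -> float:
    """Capacity bound v_max = kcat * [E] in mmol/(gDW h); 3600 converts 1/s to 1/h."""
    return kcat_per_s * 3600.0 * enzyme_mmol_gdw


def tighten(bounds, rxn, vmax):
    """Return a copy of `bounds` with the upper bound of `rxn` capped at vmax."""
    lb, ub = bounds[rxn]
    new = dict(bounds)
    new[rxn] = (lb, min(ub, vmax))
    return new


# Hypothetical enzyme: kcat = 100 1/s, 0.5 mg/gDW measured, MW = 50 kDa.
e = enzyme_mmol_per_gdw(0.5, 50_000.0)  # about 1e-5 mmol/gDW
vmax = flux_upper_bound(100.0, e)       # about 3.6 mmol/(gDW h)
bounds = tighten({"R_hyp": (0.0, 1000.0)}, "R_hyp", vmax)
print(round(vmax, 3), bounds["R_hyp"][1] == vmax)  # 3.6 True
```
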
Further, time derivatives of extracellular metabolite concentrations in a cell culture (i.e., rates of consumption and production) can readily be integrated into models as bounds on exchange fluxes between cells and the environment, allowing simulation of consistent intracellular flux states (Mo et al. 2009). However, while constraints derived from exchange flux, gene expression, and proteomics data can be assigned directly to model fluxes, integrating intracellular metabolite abundance data into steady-state simulations is less straightforward. Metabolite concentrations can be used to refine reaction thermodynamics for resolving feasible reaction directions (Henry et al. 2007; Kümmel et al. 2006). Further, constraints on flux changes have been derived from relative metabolomics data through the connectivity of metabolites to several reactions in the metabolic network (Sajitz-Hermstein et al. 2016). Conversely, metabolite concentration changes can be predicted from gene expression data and the network neighborhood (Zelezniak et al. 2014). When such a prediction fails, the metabolite in question is likely connected to a post-transcriptionally regulated enzyme (Zelezniak et al. 2014). Likely post-transcriptionally regulated enzymes can similarly be identified from disagreements between gene expression data and flux estimates (Shlomi et al. 2008). Thus, integrating omics data with model simulations also allows uncovering how cells have achieved the observed metabolic phenotypes. Recently, Strucko et al. (2018) uncovered in molecular detail how S. cerevisiae achieved an efficient glycerol-utilizing phenotype through Adaptive Laboratory Evolution (ALE).
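
How metabolite concentrations refine reaction directions can be sketched with the relation ΔG' = ΔG'° + RT ln Q, where Q is the mass-action ratio built from concentrations (in M) raised to their stoichiometric coefficients. Evaluating ΔG' at the extremes of measured concentration ranges bounds its sign: if ΔG' is negative over the whole range, only the forward direction is feasible, and vice versa. The reaction, its ΔG'° value, the concentration ranges, and the 30 °C temperature below are hypothetical, and corrections for activity, pH, and ionic strength are ignored.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol K)


def delta_g(dg0, stoich, conc, temp_k=303.15):
    """dG' = dG'0 + RT ln Q with Q = prod(c_m ** nu_m); nu < 0 for substrates, conc in M."""
    ln_q = sum(nu * math.log(conc[m]) for m, nu in stoich.items())
    return dg0 + R_KJ * temp_k * ln_q


def feasible_direction(dg0, stoich, conc_bounds, temp_k=303.15):
    """Classify a reaction as 'forward', 'backward', or 'reversible' over concentration ranges."""
    # dG' is lowest with products at their lower bound and substrates at their upper bound.
    low = {m: conc_bounds[m][0] if nu > 0 else conc_bounds[m][1] for m, nu in stoich.items()}
    high = {m: conc_bounds[m][1] if nu > 0 else conc_bounds[m][0] for m, nu in stoich.items()}
    dg_min = delta_g(dg0, stoich, low, temp_k)
    dg_max = delta_g(dg0, stoich, high, temp_k)
    if dg_max < 0:
        return "forward"
    if dg_min > 0:
        return "backward"
    return "reversible"


# Hypothetical reaction A -> B with dG'0 = +5 kJ/mol.
stoich = {"A": -1, "B": 1}
print(round(delta_g(5.0, stoich, {"A": 1e-3, "B": 1e-5}), 2))                   # -6.61
print(feasible_direction(5.0, stoich, {"A": (1e-3, 1e-2), "B": (1e-6, 1e-5)}))  # forward
print(feasible_direction(5.0, stoich, {"A": (1e-4, 1e-2), "B": (1e-5, 1e-3)}))  # reversible
```

A reaction classified as "forward" would get a lower flux bound of zero; "backward" an upper bound of zero.
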
In that study, classical genetic crossing, genome-scale metabolic model simulations, whole genome sequencing, and omics analyses revealed that all levels of cellular regulation were involved, in a pathway-dependent manner, in achieving the glycerol utilization trait. The ALE was performed with a laboratory strain of S. cerevisiae; such strains commonly lack the ability to grow on glycerol in the absence of amino acid supplementation. Interestingly, some wild S. cerevisiae strains can grow on glycerol as the sole carbon source, and the metabolic network structure of S. cerevisiae does not preclude the conversion of glycerol to biomass even without amino acid supplementation. By gradually decreasing the amino acid supplementation, evolved lineages growing on glycerol as the sole carbon source were obtained (Strucko et al. 2018). Whole genome sequencing of the evolved lineages revealed the mutations that arose during ALE. A few metabolic genes, as well as osmoregulation genes controlling glycerol accumulation in cells, had been repeatedly hit by mutations. A lineage without loss-of-function mutations in osmoregulation-related genes was characterized in controlled bioreactors and analyzed at different omics levels (i.e., RNA sequencing, proteomics, and metabolomics). Further, genome-scale metabolic model simulations were run to identify the minimal re-regulation of wild-type metabolic fluxes necessary for an optimally glycerol-utilizing phenotype. The identified flux changes were overlaid on the metabolic network together with the mutated genes and the omics data. The model simulations revealed that glycerol utilization required downregulation of TCA cycle activity while respiratory function was maintained. This was in full agreement with the otherwise puzzling KGD1 loss-of-function mutation (KGD1 encodes a subunit of the alpha-ketoglutarate dehydrogenase complex of the TCA cycle), which was acquired repeatedly during ALE.
The model simulations also predicted activation of the GABA shunt, which bypasses the alpha-ketoglutarate dehydrogenase step of the TCA cycle, to optimize glycerol utilization. Indeed, reactant ratios derived from the metabolomics data were consistent with GABA shunt activation. In addition, gene and protein expression changes were consistent with the model-predicted decrease in TCA cycle flux. In conclusion, the model-predicted flux changes effectively reconciled the separate omics observations with the genes repeatedly mutated during ALE.
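The use of metabolite concentrations to resolve feasible reaction directions (Henry et al. 2007; Kümmel et al. 2006), mentioned earlier in this section, rests on the relation ΔrG' = ΔrG'° + RT ln Q. Given lower and upper concentration bounds for each reactant, the Python sketch below computes the resulting range of ΔrG' and classifies a reaction as forward-only, reverse-only, or reversible. It is a simplified, range-based illustration in the spirit of those methods rather than their actual formulation. The function names, the 30 °C temperature, the toy reaction A → B, and its ΔrG'° values are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)


def delta_g_range(dg0_prime, stoich, conc_bounds, temp_k=303.15):
    """Return (min, max) of dG' = dG'0 + RT * sum(nu_i * ln c_i) in kJ/mol.

    stoich:      metabolite -> stoichiometric coefficient (substrates < 0, products > 0)
    conc_bounds: metabolite -> (min, max) concentration in mol/L
    """
    rt = R_KJ_PER_MOL_K * temp_k
    dg_min = dg_max = dg0_prime
    for met, nu in stoich.items():
        c_min, c_max = conc_bounds[met]
        # Each term is monotonic in c, so its extremes lie at the concentration bounds
        terms = (nu * rt * math.log(c_min), nu * rt * math.log(c_max))
        dg_min += min(terms)
        dg_max += max(terms)
    return dg_min, dg_max


def feasible_direction(dg_min, dg_max):
    """Classify a reaction from its dG' range (negative dG' permits forward flux)."""
    if dg_max < 0:
        return "forward"
    if dg_min > 0:
        return "reverse"
    return "reversible"


# Hypothetical reaction A -> B with measured concentration ranges (mol/L)
stoich = {"A": -1, "B": 1}
bounds = {"A": (1e-3, 1e-2), "B": (1e-6, 1e-5)}
for dg0 in (10.0, 30.0):
    lo, hi = delta_g_range(dg0, stoich, bounds)
    print(f"dG'0 = {dg0:+.0f}: dG' in [{lo:.1f}, {hi:.1f}] kJ/mol -> "
          f"{feasible_direction(lo, hi)}")
```

With these hypothetical ranges, a reaction with a positive ΔrG'° of +10 kJ/mol is still forward-only, because the high substrate-to-product ratio drives it, whereas +30 kJ/mol makes it reverse-only. Treating every concentration as independent gives conservative ranges. Thermodynamics-based flux analysis (Henry et al. 2007) instead embeds such relations directly in the flux optimization.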

5.7 Regulation of Yeast Metabolism: Key Nodes and Their Impact on Flux Distribution—Future Directions of Reincorporating These into Models

While metabolic models have greatly improved our ability to systematically map genotype–phenotype relations, they have also exposed key gaps in our understanding of the complex interactions between different metabolic pathways and between metabolic and regulatory processes. This becomes evident from the sharp drop in the performance of genome-scale metabolic models between predicting single-gene essentiality, which they do well, and predicting genetic interactions, which they do with low accuracy (Brochado et al. 2012). A major limitation of the models, especially when tackling higher-order complex interactions, is their large number of degrees of freedom, i.e., the multiple ways in which resource (carbon and other elemental) fluxes can be distributed in the cell. Without additional constraints imposed by protein abundance and activity status (e.g., phosphorylation), metabolite concentrations, and allosteric regulation, the models cannot narrow their predictions down to the routes actually operating in cells. Different approaches have been proposed to constrain the solution space of metabolic models in a biologically sound manner and thereby improve the accuracy of predictions. These include knowledge-based heuristics imposing constraints on flux distribution at key branch points (Pereira et al. 2016), constraining the fraction of protein resources allocated to metabolic processes (Sanchez et al. 2017), imposing a constraint on maximum Gibbs energy dissipation from cells (Niebel et al. 2019), and large-scale kinetic models that include metabolite concentrations and enzyme kinetic parameters (Chakrabarti et al. 2013; Stanford et al. 2013; Smallbone et al. 2010). The last of these would be the ideal approach, as it captures these complexities in mechanistic detail.
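The degrees-of-freedom problem can be made concrete through the steady-state mass balance S·v = 0. The number of independent flux directions equals the number of reactions minus the rank of the stoichiometric matrix S. The Python sketch below computes this with exact rational arithmetic for a hypothetical six-reaction toy network with a single branch point, where A is converted to D either via B or via C. It also shows how each additional measured flux removes one degree of freedom. Flux bounds and reaction reversibility, which restrict the feasible region further, are ignored here.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def flux_degrees_of_freedom(s_matrix, measured_rows=()):
    """Dimension of the steady-state flux space, ignoring bounds.

    measured_rows: coefficient rows a whose value a.v is fixed by a measurement;
    a consistent fixed value does not change the dimension count.
    """
    rows = list(s_matrix) + list(measured_rows)
    return len(s_matrix[0]) - matrix_rank(rows)


# Hypothetical toy network; columns: R1 (-> A), R2 (A -> B), R3 (A -> C),
# R4 (B -> D), R5 (C -> D), R6 (D ->); rows: metabolites A, B, C, D
S = [
    [1, -1, -1, 0, 0, 0],  # A
    [0, 1, 0, -1, 0, 0],   # B
    [0, 0, 1, 0, -1, 0],   # C
    [0, 0, 0, 1, 1, -1],   # D
]
print(flux_degrees_of_freedom(S))                        # 2: uptake level and branch split
print(flux_degrees_of_freedom(S, [[1, 0, 0, 0, 0, 0]]))  # 1: the split at A remains open
print(flux_degrees_of_freedom(S, [[1, 0, 0, 0, 0, 0],
                                  [0, 1, 0, 0, 0, 0]]))  # 0: fully determined
```

Even in this toy case, fixing the uptake rate leaves the branch split at A undetermined. Constraints such as branch-point heuristics or protein allocation are meant to resolve exactly this kind of freedom.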
Kinetic models, however, remain limited by the lack of reliable in vivo data on enzyme kinetics, metabolite concentrations, and enzyme/metabolite distributions within a cell, which restricts their use to well-studied conditions and relatively small perturbations. Further, introducing a constraint on Gibbs energy dissipation into metabolic models is computationally demanding, as it results in a nonlinear, non-convex model. Thus, the first two approaches are likely to be the most fruitful in the near future. Indeed, the distribution of the major metabolic fluxes in yeast cells is tied to the redox and energy cofactor balance, which, in turn, is closely coupled with the flux distribution at the pentose phosphate pathway and pyruvate nodes. The former largely determines NADPH production, and the latter affects NADH and ATP turnover. A recent study (Yu et al. 2018) elegantly demonstrated this by replacing ethanol production with fatty acid production. Given that ethanol accumulation is a hallmark of yeast metabolism, this is a remarkable feat, yet one that can be understood in terms of rewiring the redox balance. Along similar lines, an approach incorporating a protein allocation constraint has suggested that the lower protein requirement of ATP generation through fermentation is the trade-off underlying the switch from respiratory to fermentative metabolism at higher glucose utilization rates in yeast (Nilsson and Nielsen 2016). Ongoing efforts to expand the models to incorporate transcriptional and translational processes (Yang et al. 2018) are likely to complement the approaches above, both by broadening the scope of metabolic models and by improving their accuracy in capturing complex metabolic traits.
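The protein-allocation trade-off can be illustrated with a deliberately tiny linear program. Glucose flux is split between respiration and fermentation to maximize ATP production, subject to a glucose uptake limit and a shared protein budget. Respiration is assumed to yield more ATP per glucose but to cost more enzyme per unit of ATP. All yields, protein costs, and the budget below are hypothetical round numbers chosen only to show the qualitative switch. They are not values from Nilsson and Nielsen (2016). The two-variable LP is solved by simple vertex enumeration rather than with a production LP solver.

```python
from itertools import combinations

# Hypothetical parameters (arbitrary units), chosen so that fermentation is
# cheaper in protein per ATP (0.4/2 = 0.2) than respiration (9/18 = 0.5).
ATP_PER_GLUCOSE = (18.0, 2.0)  # (respiration, fermentation)
PROTEIN_PER_FLUX = (9.0, 0.4)  # enzyme cost per unit glucose flux
PROTEIN_BUDGET = 10.0


def solve_lp_2d(c, A, b, tol=1e-9):
    """Maximize c.x subject to A x <= b for x in R^2 by vertex enumeration.

    Valid only when the feasible region is non-empty and bounded, as below.
    """
    best_x, best_val = None, float("-inf")
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel constraint lines do not meet at a vertex
        # Intersection of the two constraint lines (Cramer's rule);
        # adding 0.0 turns a negative zero into 0.0 for clean printing
        x = ((b1 * a2[1] - a1[1] * b2) / det + 0.0,
             (a1[0] * b2 - b1 * a2[0]) / det + 0.0)
        if all(ai[0] * x[0] + ai[1] * x[1] <= bi + tol for ai, bi in zip(A, b)):
            val = c[0] * x[0] + c[1] * x[1]
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val


def atp_optimal_split(glucose_uptake):
    """Return (v_resp, v_ferm, atp) maximizing ATP under uptake and protein limits."""
    A = [(1.0, 1.0),        # v_resp + v_ferm <= glucose uptake
         PROTEIN_PER_FLUX,  # protein use <= budget
         (-1.0, 0.0),       # v_resp >= 0
         (0.0, -1.0)]       # v_ferm >= 0
    b = [glucose_uptake, PROTEIN_BUDGET, 0.0, 0.0]
    (v_resp, v_ferm), atp = solve_lp_2d(ATP_PER_GLUCOSE, A, b)
    return v_resp, v_ferm, atp


for q in (0.5, 1.0, 2.0, 5.0):
    v_resp, v_ferm, atp = atp_optimal_split(q)
    print(f"uptake {q:.1f}: respiration {v_resp:.2f}, "
          f"fermentation {v_ferm:.2f}, ATP {atp:.1f}")
```

With these numbers, the optimum is purely respiratory until the protein budget binds, at an uptake of 10/9 in these units. Beyond that point, additional glucose is routed to fermentation because fermentation yields more ATP per unit of protein. This qualitatively mirrors the respiro-fermentative switch discussed above.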