1 Introduction

Tuberculosis (TB) is a disease that plagued ancient Egyptians and remains a major threat to human health thousands of years later. Control of the disease has been significantly hindered by the limited resources available for both its prevention and its treatment. A truly effective vaccine is lacking: the 90-year-old Mycobacterium bovis bacillus Calmette–Guerin (BCG) live attenuated vaccine is not universally protective and does not produce immunity against re-infection or reactivation. Treatment with currently available anti-TB drugs is lengthy (6–9 months) and complex (three or more different drugs). The economic and logistic burden of administering these drug regimens in the industrially undeveloped countries where TB is most prevalent is enormous and, combined with poor patient compliance, is an important factor in the emergence of the drug-resistant TB isolates that are causing ongoing epidemics. These factors underscore the urgent need for novel and effective therapeutics and vaccines, and new approaches will be required to achieve these goals.

Mycobacterium tuberculosis is an unusual bacterial pathogen that has the remarkable ability to cause both acute life-threatening disease and clinically latent infections, which can persist for the lifetime of the human host. Unlike many pathogens, M. tuberculosis does not rely on the production of specific toxins to cause disease. Rather, the secret of this bacterium’s great success seems to be its ability to adapt and survive within the changing and adverse environment provided by the human host during the course of an infection. It is becoming apparent that the key to this adaptation is the metabolic reprogramming of M. tuberculosis during both the acute and chronic phases of TB disease, and a more complete understanding of mycobacterial metabolism therefore remains a major goal of TB research.

Whilst recent increases in research funding have advanced our understanding of the basic biology of M. tuberculosis, this has not yet had an impact on global TB trends, which remain at staggering levels. A possible reason why it has been difficult to translate basic research into effective strategies for combating tuberculosis is that TB research has, until recently, focused on studying individual parameters in isolation, which can result in an overestimation of the importance of these factors. This effect may be particularly profound for a persistent pathogen such as M. tuberculosis that lacks classical virulence factors. The systems biology framework, which investigates the dynamic interactions of many components, provides an alternative and complementary strategy to the more traditional reductionist approaches to TB research. This methodology has started to be applied to the metabolism of M. tuberculosis on a genome scale and promises to drive biological discovery in the TB research field by providing scaffolding for the interpretation of “omic” scale datasets, directing hypothesis-driven discovery and assisting in the identification of novel drug targets.

2 Central Metabolism of M. tuberculosis

Application of metabolic modelling approaches to M. tuberculosis is aided by the fact that metabolism is a reasonably well-studied system even in mycobacteria. Moreover, metabolism has been shown to be involved in the virulence of M. tuberculosis, playing a key role in the development and maintenance of both acute and persistent TB infections [17]. It is perhaps not surprising, therefore, that several modelling efforts in tuberculosis have focused on metabolism.

Much of what is known about metabolism in M. tuberculosis has been gleaned from conventional biochemical and molecular studies over many decades. The pathogen appears typical of bacteria of the Actinomycetales order, with a predominantly aerobic metabolism that is able to catabolise a wide range of substrates to generate biomass and energy. The genome encodes all the enzymes of the Embden–Meyerhof–Parnas pathway (EMP) and pentose phosphate pathway (PPP) and has a complete, or nearly complete, tricarboxylic acid (TCA) cycle (see below). The pathogen also encodes a functional glyoxylate shunt as well as several enzymes connecting the TCA cycle and glycolysis that may be used for either anaplerosis or gluconeogenesis.

There are, however, several features of central metabolism in M. tuberculosis that appear to be unusual. Although the link between glycolysis and the TCA cycle is complete in M. tuberculosis, the closely related pathogen M. bovis lacks a functional pyruvate kinase and is therefore unable to deliver sugars from glycolysis to the TCA cycle. It is thus unable to utilise carbohydrates as the sole carbon source [8]. This function therefore appears to be unnecessary in vivo, as M. bovis causes very similar disease in humans to M. tuberculosis. The role of isocitrate lyase (ICL) has been intensively studied since the demonstration that both of the isocitrate lyase genes encoded by M. tuberculosis, icl1 and icl2 (although some strains only have icl1), play an essential role in virulence [1, 2]. This finding has generally been interpreted as being due to this enzyme’s role in the glyoxylate shunt and a metabolic shift in the principal carbon source from carbohydrates to fat in the host. However, the role of the isocitrate lyases may be more complex than just fat catabolism, as these enzymes also function as methyl citrate lyases in the methyl citrate cycle [9], which is used to catabolise propionate derived from the oxidation of odd-numbered chain and branched chain fatty acids. ICL has also been shown to be essential for intracellular ATP level reduction in a nutrient starvation model of persistence [10], and the glyoxylate shunt has been shown to operate concurrently with an oxidative TCA cycle that is completed by an anaerobic α-ketoglutarate ferredoxin oxidoreductase [11]. More recently, we have demonstrated an essential role for ICL during slow growth on glycerol, a substrate that would be expected to be catabolised via glycolysis and the TCA cycle [12, 13].

It was reported that the TCA cycle was atypical in M. tuberculosis, as the pathogen was proposed to lack α-ketoglutarate dehydrogenase (KDH) activity and thereby the standard connection between α-ketoglutarate and succinate via succinyl CoA [14]. These findings prompted the proposal that M. tuberculosis operates an alternative route (the SSA shunt) between α-ketoglutarate and succinate via the enzyme α-ketoglutarate decarboxylase (KGD, putatively encoded by Rv1248c) to produce succinic semialdehyde (SSA), which could be converted to succinate by succinic semialdehyde dehydrogenase (SSADH, encoded by gabD1/D2) [15]. It was also pointed out [15] that M. tuberculosis has all the enzymes required for a GABA shunt capable of converting α-ketoglutarate to succinic semialdehyde (and then on to succinate) via glutamate and 4-aminobutyrate (GABA). However, neither of these SSA-based shunts accounts for the synthesis of succinyl CoA, which is an essential precursor of both heme and branched fatty acids. Recently, the enzyme encoded by Rv1248c was shown to be a carboligase with 2-hydroxy-3-oxoadipate synthase (HOA synthase) activity capable of condensing α-ketoglutarate with glyoxylate to yield 2-hydroxy-3-oxoadipate (HOA), which decomposes to 5-hydroxylevulinate (HLA) [16], undermining the evidence for an SSA shunt in M. tuberculosis. However, the enzyme does appear to form SSA in the absence of glyoxylate [16, 17], so it may be that the SSA shunt functions when levels of glyoxylate in the cell are low. Indeed, recent work demonstrated that Rv1248c appears to be a multifunctional enzyme with classical succinyl-transferring KDH activity, but also KGD and carboligase activity [17]. Additionally, an alternative route to succinyl CoA from α-ketoglutarate has recently been shown to be active, involving a CoA-dependent ferredoxin oxidoreductase (KOR) that operates preferentially under anaerobic conditions [11]. Recent evidence has also emerged that, under anaerobic conditions, M. tuberculosis operates a reverse TCA cycle involving the reduction of fumarate to succinate (which is then secreted) by fumarate reductase, possibly as a means of generating redox potential and maintaining the membrane potential in the absence of oxygen [18]. It therefore seems that M. tuberculosis encodes a number of alternative pathways that could operate around the TCA cycle, although the significance of most of them in vivo remains to be determined. Figure 4.1 illustrates the central metabolic pathways of M. tuberculosis, as understood in 2012.

Fig. 4.1

Central metabolism in M. tuberculosis. The standard TCA cycle is shown in blue with the variant (SSA) pathway in yellow and GABA pathway in green. Anaplerotic/gluconeogenic reactions are shown in purple with the glyoxylate shunt in red. Only enzymes mentioned in the text are indicated, including pyruvate kinase (PK), pyruvate phosphate dikinase (PPDK), α-ketoglutarate ferredoxin oxidoreductase (KOR), α-ketoglutarate decarboxylase (KGD), succinic semialdehyde dehydrogenase (GabD1/D2), glutamate dehydrogenase (GDH), glutamate decarboxylase (GAD), isocitrate lyase (ICL), malic enzyme (MEZ; malate dehydrogenase, decarboxylating), phosphoenolpyruvate carboxykinase (PEPCK) and pyruvate carboxylase (PYC)

3 Experimental Systems for Systems Biology

Systems biology is an iterative procedure of experimental data acquisition, model building, hypothesis generation and experimental verification. One of the constraints upon this approach concerns its experimental basis: models should be developed and validated with accurate and reproducible data. Moreover, the mathematical underpinning of many modelling approaches, such as 13C-MFA, has an absolute requirement for the cultivation of the organism under steady-state conditions, where metabolite concentrations are maintained at constant levels. This makes it very difficult to apply these approaches directly to pathogens such as M. tuberculosis growing in vivo, as such steady states are not attainable in mammalian cells. However, a standard approach in systems biology is to study systems initially in highly controlled experimental environments that allow models to be parameterised before their subsequent application in real-life situations. One of the pioneers of systems biology, Hiroaki Kitano [19], uses an example from racing car design to illustrate this approach. Cars are initially designed using a computer and then tested in a wind tunnel before being deployed in the actual race. By controlling airflow, wind tunnels transform a highly dynamic, unsolvable system into a steady state that is amenable to mathematical modelling. Kitano argues that systems biologists similarly need biological “wind tunnels” to develop their models. Here we describe the application of one of the few biological wind tunnels: the chemostat.

Traditional batch cultivation remains the standard for most microbiological investigations. Typically, the microbe is inoculated into a stirred vessel filled with rich medium. The organism will grow at close to its maximal rate (logarithmic phase) until either nutrients (including oxygen) become limiting or inhibitory products accumulate to levels that retard growth (stationary phase). Although convenient and suitable for many microbiological, genetic and functional genomic applications, this culture method is unsuitable for most systems biology applications, because (a) it is dynamic, with cells adapting to a constantly changing environment; (b) it is not usually possible to monitor rates of substrate utilisation or product accumulation; (c) the culture system is usually uncontrolled and thereby subject to wide fluctuations in parameters such as pH or oxygen concentration; and (d) several microenvironments exist in most batch culture vessels that allow microbial growth in different physiological states, so that the average value of measured parameters may not represent the actual value of those parameters in any single cell (the mode and the mean can be very different, so that no cell may exist with the parameter values obtained from measurement). This last consideration makes modelling of batch culture systems extremely problematic.

The need for maximal control of the experimental aspects of systems biology, together with attainment of steady-state conditions, has led to a resurgence in the use of continuous culture systems such as chemostats [20]. During continuous culture in a chemostat, microbes are grown at a rate set by the experimenter, and other environmental parameters such as pH and oxygen levels are also precisely controlled. Culture medium is pumped at a constant rate into the vessel whilst the volume of the culture is kept constant by an overflow system. The flow rate (f) of the medium is set by the experimenter to give a desired dilution rate (D). The dilution rate is the number of culture volumes passing through the chemostat per unit of time and equals the flow rate divided by the culture volume (V). The chemostat controls growth rate (μ) by limiting the availability of a growth substrate. The medium contains a fixed concentration of the limiting substrate, all the other nutrients being present in excess. By adjusting the feed rate, the growth rate can be set to 1–90% of the maximum growth rate for the organism. When a dilution rate is set, the cells will initially grow as in a batch culture at the maximum specific growth rate (μmax) until a substrate in the medium becomes limiting. Eventually the cells adjust to the rate of nutrient supply so that the specific growth rate equals the dilution rate, i.e. D = μ. This balanced growth is known as steady state and may be maintained indefinitely. During steady state the physiology of the cells remains constant, cellular processes being controlled by the concentration of the limiting substrate.
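As a rough numerical illustration of these relations, the sketch below computes D = f/V and integrates a simple chemostat model until the specific growth rate settles at the dilution rate. The Monod growth law, the function names and every parameter value are assumptions made for illustration; none of them is taken from the text or from any particular M. tuberculosis experiment.

```python
def dilution_rate(flow_rate, volume):
    """D = f / V: culture volumes replaced per unit time."""
    return flow_rate / volume


def simulate_chemostat(D, mu_max, ks, yield_xs, s_feed, x0, s0,
                       dt=0.1, t_end=3000.0):
    """Forward-Euler integration of a single-substrate chemostat.

    Assumed model (Monod kinetics, not stated in the text):
        mu    = mu_max * S / (ks + S)
        dX/dt = (mu - D) * X
        dS/dt = D * (s_feed - S) - mu * X / yield_xs
    Returns biomass X, limiting substrate S and mu at t_end.
    """
    x, s = x0, s0
    for _ in range(int(t_end / dt)):
        mu = mu_max * s / (ks + s)
        dx = (mu - D) * x
        ds = D * (s_feed - s) - mu * x / yield_xs
        x += dx * dt
        s = max(s + ds * dt, 0.0)  # substrate cannot become negative
    mu = mu_max * s / (ks + s)
    return x, s, mu


if __name__ == "__main__":
    # Hypothetical values: 0.03 L/h of medium into a 1 L culture.
    D = dilution_rate(0.03, 1.0)
    x, s, mu = simulate_chemostat(D, mu_max=0.04, ks=0.1, yield_xs=0.5,
                                  s_feed=2.0, x0=0.05, s0=2.0)
    print(f"D = {D:.3f}, mu at end = {mu:.4f}, S = {s:.3f}, X = {x:.3f}")
```

Under these assumed kinetics the culture starts near its maximum rate and then slows as the limiting substrate is drawn down, ending with μ equal to D, which is the behaviour described above.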

The chemostat therefore enables growth in a tightly regulated steady-state environment and thereby eliminates the inherent variability and dynamics of constantly changing batch cultures. The chemostat system is thus analogous to the aerodynamic wind tunnel. It effectively freezes the dynamics of microbial growth to attain a steady-state system that is amenable to constraint-based modelling approaches, such as flux balance analysis (FBA), which critically depend on the assumption that concentrations of internal metabolites are held constant during the experiment. Data from chemostat cultivations are therefore more precise, reproducible and statistically significant than those obtained from batch cultivations [21–23]. Moreover, because the cultures are relatively homogeneous, the mean value of measured parameters in samples removed from the chemostat is likely to correspond to the mode value of those parameters in the bulk population, allowing these values to be used for model parameterisation.
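The steady-state assumption can be stated compactly: for a stoichiometric matrix S and a flux vector v, the net production of every internal metabolite, S·v, must be zero. A minimal sketch of that check follows. The three-reaction network and the names TOY_S, net_production and is_steady_state are hypothetical and serve only to show the mass-balance condition.

```python
# Hypothetical toy network (rows = internal metabolites A, B):
#   R1: -> A      R2: A -> B      R3: B ->
TOY_S = [
    [1, -1, 0],   # A: made by R1, consumed by R2
    [0, 1, -1],   # B: made by R2, consumed by R3
]


def net_production(S, v):
    """Return S.v, the net rate of change of each internal metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if no internal metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(S, v))


if __name__ == "__main__":
    print(is_steady_state(TOY_S, [2.0, 2.0, 2.0]))  # balanced fluxes
    print(net_production(TOY_S, [2.0, 1.0, 1.0]))   # A accumulates
```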

The slow growth rate of pathogenic mycobacteria, problems associated with clumping of bacilli and safety considerations have all presented obstacles to researchers attempting continuous culture of M. tuberculosis. James and colleagues [24] were the first to cultivate M. tuberculosis successfully in a chemostat, with a doubling time of 24 h in a complex nutrient-rich medium, and have used their system to investigate the responses of M. tuberculosis to oxygen [25] and iron limitation [26], and also mutation rates at different pH [27]. These studies demonstrated that the chemostat provides a reliable and reproducible environment for culturing mycobacteria and is therefore a very useful tool for “omic” scale analysis such as DNA microarrays. It has been demonstrated that gene expression data from organisms, including M. tuberculosis, grown in the chemostat are significantly more reproducible than batch culture DNA-array data [23, 25, 28].

Our group has developed a system for growing mycobacteria in a carbon-limited, chemically defined minimal medium, which can be used as a reproducible platform for systems biology studies [12, 29–31]. Initial studies using this experimental system to grow M. bovis BCG (a non-pathogenic surrogate for M. tuberculosis) provided vital information on the biomass composition of the tubercle bacillus [29]. Earlier studies are limited and were performed in poorly defined batch cultivations. For genome-scale metabolic models, the equations defining biomass synthesis are very important and can affect the predictive accuracy of the model. For two different growth rates (D = 0.03, td = 23.1 and D = 0.01, td = 69.3), the elemental and macromolecular composition of the biomass was measured and shown to change as a function of the growth rate. This study demonstrated that more than half of the dry mass of the mycobacterial cell was composed of carbohydrates and lipids, with only a quarter of the dry weight consisting of protein and RNA, but that these proportions change depending on the growth rate. These data allowed a stoichiometric composition model for M. bovis BCG to be reconstructed, which is an important first step in the development of a metabolic network [29].
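The two doubling times quoted above follow directly from the dilution rates once steady state (μ = D) is reached. The short derivation below assumes simple exponential growth and that D is expressed per hour and the doubling time in hours; the units are an assumption here, as they are not given in the text.

```latex
% Assumption: D in h^{-1}, t_d in h.
\begin{align*}
X(t) &= X_0\, e^{\mu t}, \qquad \mu = D \ \text{(steady state)} \\
2X_0 &= X_0\, e^{\mu t_d} \;\Longrightarrow\; t_d = \frac{\ln 2}{\mu} = \frac{\ln 2}{D} \\
D = 0.03 &: \quad t_d = \frac{0.6931}{0.03} \approx 23.1 \\
D = 0.01 &: \quad t_d = \frac{0.6931}{0.01} \approx 69.3
\end{align*}
```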

4 Metabolic Model Building

The ultimate goal of systems biology approaches to studying TB is to construct a complete model of infection incorporating both the pathogen and host, but this is currently infeasible as the information about the different components to be included in the model is lacking. Studies with other organisms have demonstrated that metabolism is, by far, the best understood cellular network and is thereby an excellent starting point for a systems-based approach [32–34].

However, metabolism is complex. Even the simplest organisms synthesise many hundreds of metabolites connected by a similar number of enzyme-catalysed reactions. Each reaction is described by a set of kinetic parameters (e.g. Km, Vmax) which, in combination with substrate/product concentrations, determine its rate. Although Km values are constants (for a particular substrate/product combination) and may be determined experimentally, intracellular concentrations of substrates, products and enzyme (influencing Vmax) vary over wide ranges and are not easily measured. Even a single enzyme reaction is therefore a highly dynamic system, and systems of just a few reaction steps are usually described mathematically by a set of ordinary differential equations with a large number of parameters and variables whose values are extremely challenging to measure experimentally. Kinetic models have therefore only been applied to the dynamics of small, well-defined systems, such as glycolysis in Escherichia coli [35], that are very far from being genome scale.
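For a single irreversible, one-substrate reaction, the dependence of rate on these parameters is commonly written with the Michaelis–Menten rate law. The form below is offered as a standard textbook illustration, not as the rate law used in any model discussed in this chapter; the symbols k_cat, [E]_t and the pathway intermediates X_i are introduced only for the example.

```latex
\begin{align*}
v &= \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad V_{\max} = k_{cat}\,[E]_t \\
\frac{d[X_i]}{dt} &= v_{i-1}\bigl([X_{i-1}]\bigr) - v_i\bigl([X_i]\bigr)
\qquad \text{(intermediate } X_i \text{ in a linear pathway)}
\end{align*}
```

Each step of such a pathway contributes its own K_m, k_cat and enzyme concentration, which is why the number of parameters grows quickly with the number of reactions.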

However, it is relatively straightforward to generate a metabolic network that describes the biochemical reactions that an organism is predicted to be capable of performing, in terms of stoichiometric formulas (see Chap. 1). It is therefore possible to build a model consisting of all the stoichiometric reactions predicted by the annotation, linked into pathways and networks by the flux through each reaction. These models can be interrogated with tools such as FBA and metabolic flux analysis (MFA) to gain insight into the underlying structure of the network, identify essential genes and pathways and simulate experiments. However, because metabolic networks contain multiple branch points and parallel pathways, there is not a unique solution but a vast space of possible solutions (the system is underdetermined). It is therefore necessary to apply constraint-based approaches, which reduce the solution space and thereby predict metabolic capabilities or internal fluxes [36–41]. FBA uses optimisation to reduce the solution space (Chap. 1), optimising some parameter, which might be biomass production rate (and thereby growth rate), ATP synthesis, substrate consumption, product formation or any other parameter of the model. Clearly, there is a strong assumption in FBA that the cell applies a similar optimisation strategy and thereby grows at its optimal growth rate, ATP production rate or rate of whichever other parameter is optimised. If that assumption is correct then FBA will find the correct solution (the one that the cell finds) and the FBA solution will correspond to the biological reality. It is of course an open question how often, and in what circumstances, microbial cells such as M. tuberculosis do actually optimise simplistic parameters such as growth rate, particularly during in vivo growth. MFA takes an alternative approach to reducing the solution space: applying additional measurements as constraints [42, 43]. These may be measurements of intracellular metabolites, enzyme activities or indeed any additional measurement constraints, but the most powerful method currently available is 13C-MFA, which derives solutions for the intracellular fluxes from the distribution of 13C from a substrate into central metabolites and the amino acid products derived from central metabolites.
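The optimisation step in FBA can be illustrated on a deliberately tiny network: maximise an objective flux subject to S·v = 0 and lower and upper bounds on each flux. The sketch below is only a toy under stated assumptions. The four-reaction network, its bounds and the names solve_square and toy_fba are hypothetical, and the brute-force enumeration of vertices stands in for the linear-programming solvers that genome-scale tools would use. It also assumes that the rows of S are linearly independent and that all bounds are finite.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def toy_fba(S, bounds, objective):
    """Maximise objective.v subject to S v = 0 and lb <= v <= ub.

    Every vertex of the feasible region has n - m fluxes sitting at a bound,
    with the remaining m fluxes fixed by the mass balances, so all such
    candidates are enumerated and the best feasible one is kept.
    """
    m, n = len(S), len(S[0])
    best_value, best_flux = None, None
    for basic in combinations(range(n), m):
        fixed = [j for j in range(n) if j not in basic]
        for levels in product(*[bounds[j] for j in fixed]):
            rhs = [-sum(S[i][j] * lv for j, lv in zip(fixed, levels))
                   for i in range(m)]
            sol = solve_square([[S[i][j] for j in basic] for i in range(m)], rhs)
            if sol is None:
                continue
            v = [Fraction(0)] * n
            for j, lv in zip(fixed, levels):
                v[j] = Fraction(lv)
            for j, x in zip(basic, sol):
                v[j] = x
            if all(bounds[j][0] <= v[j] <= bounds[j][1] for j in range(n)):
                value = sum(c * x for c, x in zip(objective, v))
                if best_value is None or value > best_value:
                    best_value, best_flux = value, v
    return best_value, best_flux


# Hypothetical network: uptake -> Sub; route1: Sub -> Pre (capacity-limited);
# route2: Sub -> 0.5 Pre (wasteful bypass); biomass: Pre -> (objective).
TOY_S = [
    [1, -1, -1, 0],                 # Sub balance
    [0, 1, Fraction(1, 2), -1],     # Pre balance
]
TOY_BOUNDS = [(0, 10), (0, 6), (0, 100), (0, 100)]
TOY_OBJECTIVE = [0, 0, 0, 1]

if __name__ == "__main__":
    value, flux = toy_fba(TOY_S, TOY_BOUNDS, TOY_OBJECTIVE)
    print("optimal biomass flux:", float(value), [float(x) for x in flux])
    knockout = list(TOY_BOUNDS)
    knockout[1] = (0, 0)            # in silico deletion of route1
    print("after deleting route1:", float(toy_fba(TOY_S, knockout, TOY_OBJECTIVE)[0]))
```

Because the toy network has two parallel routes, the mass balances alone leave the fluxes underdetermined; it is the choice of objective that selects a single flux distribution. Closing one route by setting both of its bounds to zero gives the simplest form of an in silico deletion.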

There are of course limitations to these approaches, such as the requirement for steady or quasi-steady-state conditions. Also, since no account is taken of transcriptional, translational or metabolic regulation, or of enzyme kinetics, the predictive capabilities of constraint-based models are limited to situations in which these factors do not significantly influence reaction rates [34]. Nevertheless, these approaches have been successfully applied to predict the metabolic capabilities of many different cellular systems [44–49]. The application of both FBA and MFA to M. tuberculosis will be discussed below.

5 Metabolic Models of M. tuberculosis

The first M. tuberculosis constraint-based model was constructed by Raman et al. and consisted of all the reactions in mycolic acid synthesis [50]. This sub-model of metabolism was composed of 219 reactions that involved 197 metabolites, catalysed by 28 enzymes. FBA was used to simulate mycolic acid metabolism and to identify potential drug targets in these pathways. The study illustrates the importance of optimisation in FBA. As already discussed, FBA reduces the solution space by optimising a parameter, usually known as the objective function, so the choice of the parameter to be used as the objective function has considerable influence on the solutions obtained. Popular objective functions include maximisation or minimisation of ATP production, maximisation of redox potential, maximisation of the rate of synthesis of a particular product and minimisation of nutrient uptake, but the most commonly used is maximisation of growth rate, which has been successfully applied in many systems including nutrient-limited chemostat culture of E. coli [51]. Its use is more problematic for slow-growing pathogens such as M. tuberculosis, since it has not been established that these organisms do actually maximise their growth rate. The study used two objective functions that optimised the production of mycolic acids. The first, termed C1, optimised production of only the most abundant mycolate, whereas the objective function C2 maintained the known ratios of different mycolates. To test the predictive accuracy of these objective functions, in silico deletions were performed and compared with transposon site hybridisation (TraSH) mutagenesis data. The highest predictive accuracy was obtained with the objective function C2, with an 82% correlation with experimental data. FBA identified 16 essential genes in this study, and this primary list was then filtered to remove any genes encoding proteins whose function could be complemented by homologues and also those with close homologues in the human proteome. This feasibility analysis identified seven potential drug targets for anti-TB drug design (discussed below).
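The validation and filtering steps described above amount to two simple operations: scoring agreement between predicted and experimentally observed essentiality, and removing candidate targets that fail a feasibility criterion. The sketch below shows both. The gene labels and calls are invented placeholders, not data from the study, and the function names are hypothetical; the score is a plain fraction of agreeing calls, which may differ from the statistic used by the original authors.

```python
def predictive_accuracy(predicted, observed):
    """Fraction of shared genes whose predicted essentiality matches experiment.

    Both arguments map gene name -> True (essential) / False (non-essential).
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no genes in common between prediction and experiment")
    agreeing = sum(predicted[g] == observed[g] for g in shared)
    return agreeing / len(shared)


def filter_targets(essential_genes, has_bacterial_homologue, has_human_homologue):
    """Keep essential genes with no compensating homologue in the pathogen
    and no close homologue in the human proteome."""
    return [g for g in essential_genes
            if g not in has_bacterial_homologue and g not in has_human_homologue]


if __name__ == "__main__":
    # Placeholder calls for five hypothetical genes.
    predicted = {"geneA": True, "geneB": True, "geneC": False,
                 "geneD": False, "geneE": True}
    observed = {"geneA": True, "geneB": False, "geneC": False,
                "geneD": False, "geneE": True}
    print("agreement:", predictive_accuracy(predicted, observed))

    essential = [g for g, call in predicted.items() if call]
    targets = filter_targets(essential,
                             has_bacterial_homologue={"geneB"},
                             has_human_homologue={"geneE"})
    print("candidate targets:", targets)
```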

Although targeting a small sub-system such as mycolic acid synthesis can yield valuable information on specific pathways, it has limited value in elucidating the overall metabolic capability of M. tuberculosis. This latter objective is best approached by constructing a genome-scale network of metabolism [12, 52]. The first published genome-scale network was built using Streptomyces coelicolor as a starting model [12]. Orthology relationships were mapped between the related species using the KEGG databases, and this preliminary model was further supplemented with data from the BioCyc database. This automatic process, however, accounted for only 57% of the final model. The remainder of the model was reconstructed by labour-intensive manual curation based upon primary research publications, textbooks and review articles, and also by consulting experts in the field. The final model utilised two biomass formulations, which were derived from published data on cell composition obtained from a variety of sources, including our own chemostat-derived data obtained from fast- and slow-growing BCG. BIOMASS 1 reflects the actual macromolecular composition of in vitro-grown M. tuberculosis, whereas BIOMASSe consisted of only those cellular components, such as DNA, RNA, protein, co-factors and the cell wall skeleton, that were considered to be essential for in vitro growth. The advantage of having these two biomass formulations is that the model could be used to predict gene essentiality both in vitro (with the minimal BIOMASSe as the objective function) and in vivo (with the more complete BIOMASS 1, which includes many virulence factors, as the objective function).

The final functional genome-scale metabolic network of M. tuberculosis (GSMN-TB) consisted of 739 metabolites participating in 849 reactions and involved 726 genes. The model is freely available both as an Excel file and in SBML format, and is accessible via a user-friendly web tool for constraint-based simulations (http://sysbio.sbs.surrey.ac.uk/tb/). FBA-based predictions of in vitro gene essentiality using BIOMASSe as the objective function correlated well with gene essentiality determined by global transposon mutagenesis [53], with an overall predictive accuracy of 78% [12]. Quantitative validation of the model was also performed using data from continuous culture chemostat experiments [29]. The model predicted a lower rate of glycerol consumption than the experimentally determined values. A plausible explanation for the discrepancy was that, in addition to consuming glycerol, the tubercle bacillus also utilised oleic acid released by hydrolysis of the Tween 80 dispersal agent present in the medium. Opening an additional oleic acid transport flux corrected this discrepancy, and we have recently confirmed that Tween 80 is indeed consumed in these experiments [13]. A second genome-scale reconstruction of M. tuberculosis, iNJ661, was published by Jamshidi and Palsson [52], as described in Chap. 1.

The mycolic acid synthesis sub-model and the two genome-scale network reconstructions available for M. tuberculosis illustrate the different approaches that can be applied to reconstructing, validating and applying metabolic models. They also provide a reference for future metabolic reconstructions. The next challenge is to combine these three models and build upon them by integrating new experimental data in order to expand and refine the reconstructions in an iterative cycle. In this way the model can serve as an up-to-date representation of the cumulative knowledge of the metabolic capabilities of M. tuberculosis. For comparison, the E. coli genome-scale network has undergone six successive reconstructions over the last 18 years, each one contributing positively to a large number of different studies [54]. A well-curated reconstruction is a prerequisite for all systems biology approaches to studying M. tuberculosis.

6 Metabolic Models of Host–Pathogen Systems

M. tuberculosis is an intracellular pathogen that replicates primarily in the phagosome compartment of macrophages, so its biology is intimately connected to that of its host cell. To simulate the combined system, Bordbar et al. [57] integrated the iNJ661 model of M. tuberculosis with a cell-specific alveolar macrophage model, iAB-AMØ-1410 (based on the global human metabolic reconstruction, Recon 1 [37]), to build an integrated host–pathogen genome-scale reconstruction, iAB-AMØ-1410-Mt-661 (Chap. 1). The combined model was essentially composed of three compartments representing the macrophage, the phagosome and the pathogen residing within the phagosome. These were connected via metabolite and gas exchange reactions that allowed the M. tuberculosis compartment to take up substrates from, and excrete waste products into, the phagosome compartment (Fig. 4.2). The exchanges do of course embody several assumptions regarding the infectious state. The macrophage was assumed to be consuming glucose, glutamine and essential amino acids and excreting lactate. The phagosome environment that provided resources for M. tuberculosis replication was assumed to be depleted in glucose and rich in glycerol and fatty acids. A key aspect of the reconstruction was the biomass composition of both the macrophage and M. tuberculosis compartments, since biomass composition is often used as the objective function in FBA and thereby has a very substantial influence on the flux solutions. Macrophages do not readily multiply, so the iAB-AMØ-1410 biomass reflected only maintenance functions, such as lipid, protein and mRNA turnover, DNA repair and ATP maintenance. With this objective function the macrophage model successfully predicted experimentally observed rates of glucose oxidation and lactate production. To provide the biomass equation for intracellular M. tuberculosis, the authors examined gene expression data derived from in vivo mouse model studies as well as from in vitro studies that aimed to mimic the infection environment. They then adjusted the M. tuberculosis biomass composition to optimise the fit to the gene expression data. This involved increasing the amount of amino acids, mycolic acids, mycobactin, mycocerosates and sugars in the biomass equation; reducing ATP maintenance, DNA, fatty acids and phospholipids; and removing peptidoglycans and phenolic glycolipids entirely from the biomass equation to construct a new objective function. It should be emphasised that the resulting behaviour of the reconstructed model depends on the precise composition of this adjusted biomass.

Fig. 4.2

Results obtained by integration of the alveolar macrophage (iAB-AMØ-1410) and M. tuberculosis (iNJ661) reconstructions. (a) Metabolic links between the extracellular space (e), alveolar macrophage (am), phagosome (ph) and M. tuberculosis (Mtb) in iAB-AMØ-1410-Mt-661. The model is compartmentalised using the abbreviations shown. In the model, the major carbon sources of the alveolar macrophage were glucose and glutamine. The macrophage compartment was also aerobic and required the essential amino acids. Despite its use of oxygen, the macrophage exhibited anaerobic respiration and produced excess lactate. In the M. tuberculosis compartment of the model, the major carbon sources available in the phagosome environment were glycerol and fatty acids. The phagosome environment was also functionally hypoxic. (b) The flux span of iAB-AMØ-1410-Mt-661 was significantly reduced (51%) compared with its progenitor macrophage model, iAB-AMØ-1410. This shows a stricter definition of the alveolar macrophage solution space without additional constraints on the alveolar macrophage portion of the network. (c) Reaction, metabolite and gene properties of the three reconstructions. Maximum production rates of ATP, nitric oxide, redox potential (NADH) and biomass are shown. From [57]

The authors then examined changes in the flux state of the M. tuberculosis compartment as a consequence of its simulated replication in the macrophage (compared to the iNJ661 model). The simulation predicted a shift in carbon uptake with suppression of glycolysis and up-regulation of gluconeogenesis, together with production of acetyl-CoA coming from macrophage-derived fatty acids via the glyoxylate shunt. Concomitant with the utilisation of fatty acids as carbon source was up-regulation of fatty acid oxidation pathways. There was a shift toward mycobactin and mycolic acid synthesis with reduced flux through nucleotide, peptidoglycan and phenolic glycolipid pathways. Many of these changes are likely to be a consequence of the altered biomass composition.

The accuracy of the model was tested by comparing gene essentiality predictions for the M. tuberculosis component of the model with genes identified by TraSH as conditionally essential for infection in a mouse lung model (but not essential in vitro) [55]. A total of 374 genes investigated by TraSH were in the model. Of these, the in silico analysis predicted that only nine genes were conditionally essential in the M. tuberculosis compartment of the integrated model. Of those nine in silico-predicted essential genes, only two were also found to be essential experimentally by TraSH. Many of the discrepant results are likely to be due to differences between the simulated macrophage system and the mouse model that was used to obtain the TraSH data. Four of the nine genes were components of nitrate reductase, which has been shown to play a role in in vitro models of infection [56] and was thereby incorporated into the model when the gene expression data was used in the fitting exercise. However, even discrepant model predictions can be informative. Systems biology models are essentially a mathematical instantiation of biological hypotheses. In this case, one of the hypotheses embedded in the iAB-AMØ-1410-Mt-661 model was that nitrate reductase was required for survival of M. tuberculosis in the mouse. This hypothesis was tested by comparison of model predictions with TraSH data [55] and shown to be discrepant [57]. However, the situation is of course more complicated as nitrate reductase, although apparently non-essential in the mouse lung model [55], has been shown to contribute to the virulence of BCG in immunodeficient mice [58]. Therefore, alternative models will have to be constructed for different host–pathogen combinations. Testing in silico predictions experimentally is an essential step in the refinement and improvement of systems biology models in an iterative cycle of model → prediction → experimental test → model.

6.1 Applications of the Models

6.1.1 Using Models to Interrogate Genome Annotation

Genome-scale networks are usually constructed initially from genome annotation and are thereby subject to errors in that annotation. However, the metabolic model scrutinises the metabolic component of genome annotation at a system level for functionality and can thereby be used to find pathway holes or inconsistencies in the annotation. There are several “orphan reactions” in GSMN-TB, that is, reactions that are required for network functionality but for which there is no annotated M. tuberculosis gene predicted to perform that function. For example, sulfolipid synthesis in M. tuberculosis generates the metabolite adenosine 3′,5′-bisphosphate (PAP in the model) which will accumulate and thereby become toxic (unbalanced in the model) if it is not catabolised. The model is therefore infeasible unless the reaction catalysed by the enzyme 3′,5′-bisphosphate nucleotidase (which converts the metabolite to AMP and inorganic phosphate) is included in the network as an orphan reaction. Examining model feasibility thereby generates clues to incomplete or incorrect genome annotation and may even provide novel drug targets that are not apparent in the genome annotation.
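The feasibility argument above can be illustrated with a toy network. The Python sketch below is not taken from GSMN-TB: the reaction names, the simplified stoichiometries and the choice of which metabolites are treated as external are all assumptions made for illustration. It flags internal metabolites that are produced but never consumed, which is the situation that makes a steady-state model infeasible when the producing reaction has to carry flux.

```python
# Toy illustration of dead-end (unbalanced) metabolite detection.
# Reaction names and stoichiometries are simplified assumptions, not GSMN-TB content.

def dead_end_metabolites(reactions, external):
    """Return internal metabolites that are produced but never consumed.

    reactions: dict mapping reaction name -> dict of metabolite -> coefficient
               (negative = consumed, positive = produced); all irreversible.
    external:  set of metabolites allowed to enter or leave the system.
    """
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0:
                produced.add(met)
            elif coeff < 0:
                consumed.add(met)
    return sorted((produced - consumed) - external)


network = {
    # Sulfotransferase step: the sulfate donor PAPS is converted to PAP.
    "sulfolipid_synthesis": {"PAPS": -1, "lipid_precursor": -1,
                             "sulfolipid": 1, "PAP": 1},
}
# Hypothetical boundary: these metabolites are supplied or drained elsewhere.
external = {"PAPS", "lipid_precursor", "sulfolipid", "AMP", "Pi"}

without_orphan = dead_end_metabolites(network, external)

# Add the orphan reaction: PAP -> AMP + Pi (water omitted for brevity).
network_with_orphan = dict(network)
network_with_orphan["bisphosphate_nucleotidase"] = {"PAP": -1, "AMP": 1, "Pi": 1}
with_orphan = dead_end_metabolites(network_with_orphan, external)

print(without_orphan)  # ['PAP']
print(with_orphan)     # []
```

A check of this kind only detects structural dead ends; confirming that the whole model can grow still requires a flux balance calculation.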

In silico models also allow genome annotation to be scrutinised by systems-based experimental data. For example, the route for glycerol utilisation is generally assumed to proceed via glycerol kinase (encoded by glpK) followed by dehydrogenation; however, the genome annotation of M. tuberculosis includes several alcohol dehydrogenases that could be involved in an alternative uptake pathway whereby glycerol is first oxidised by glycerol dehydrogenase before being phosphorylated (this pathway is annotated in the KEGG M. tuberculosis pathway map). However, incorporation of this pathway into the initial GSMN-TB model led to the prediction that the gene glpK is dispensable for growth on media with glycerol as sole carbon source. Global mutagenesis data demonstrated that glpK was in fact essential for growth on glycerol, which was confirmed by construction of a single glpK knock-out mutant [30]. This information was incorporated into a refined GSMN-TB model in which the annotated alcohol dehydrogenases do not provide an alternative glycerol uptake pathway.

Other systems-based insights into the metabolism of M. tuberculosis can be obtained by simply performing growth simulations with the model. For instance, it is often claimed that M. tuberculosis requires operation of the glyoxylate shunt for growth on lipids. However, FBA-based simulation of GSMN-TB indicated that although the isocitrate lyase reaction of the glyoxylate shunt is predicted to be essential for growth on simple fatty acids such as acetate, it was not predicted to be essential for growth on complex lipids, such as phospholipids. The reason is apparent on examination of the flux solution: catabolism of phospholipids yields glycerol as well as acetate, and the glycerol can be used for gluconeogenesis without operation of the shunt.

6.1.2 Interpretation of Experimental Data

The functional genomics revolution has provided the means to generate high-throughput datasets, but integration and interpretation of vast numbers of data points to generate new hypotheses remains a formidable challenge. Computational models can serve to bridge the gap between data-driven and hypothesis-driven research, providing a framework for integration of high-throughput data that can lead to model revisions (to resolve discrepancies between model predictions and experiment) and semi-automatic generation of new hypotheses [59, 60]. Even simply overlaying “omic” data onto genome-scale metabolic models provides a metabolic context in which to interpret this data and can also highlight incomplete or incorrect knowledge [59].

6.1.2.1 Gene Essentiality Data

One of the most straightforward applications of genome-scale models is to predict essential genes that can then be compared to experimental data. For example, using a high-throughput TraSH screen we identified the genes essential for the growth of M. bovis BCG on glycerol and compared this with gene essentiality predictions using GSMN-TB [30]. Whilst there was a good correlation between the GSMN-TB predictions and the experimentally observed gene essentialities (76.66%), the analysis demonstrated how the model could be used to highlight gaps in our knowledge of TB’s metabolism. Some of the discrepancies can be attributed to an undefined level of inaccuracy in global mutagenesis assays, but others may be due to gene regulation of isoenzymes. For instance, both menaquinol oxidase systems (the aa3-type and bd-type) are predicted to be non-essential as they are functionally redundant in the model. This contradicts the global mutagenesis data, which indicated that the aa3-type cytochrome c oxidase is in fact essential and likely to be the main electron transport system operating under aerobic conditions.
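The comparison described above amounts to tabulating agreement between two sets of essentiality calls and listing the genes on which they disagree. The Python sketch below shows the bookkeeping under stated assumptions: the gene labels and the calls are hypothetical placeholders, not data from [30], and the agreement measure is a plain fraction of matching calls, which may differ from the statistic used in the original analysis.

```python
def compare_essentiality(predicted, observed):
    """Compare in silico and experimental essentiality calls.

    predicted, observed: dicts mapping gene -> True (essential) / False.
    Only genes present in both dicts are compared.
    Returns (fraction of matching calls, sorted list of discrepant genes).
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no genes in common")
    discrepant = [g for g in shared if predicted[g] != observed[g]]
    agreement = (len(shared) - len(discrepant)) / len(shared)
    return agreement, discrepant


# Hypothetical calls for five placeholder genes.
predicted = {"gene1": True, "gene2": True, "gene3": False,
             "gene4": False, "gene5": False}
# gene4 mimics the oxidase case above: redundant in the model, essential in the screen.
observed = {"gene1": True, "gene2": False, "gene3": False,
            "gene4": True, "gene5": False}

agreement, discrepant = compare_essentiality(predicted, observed)
print(agreement)   # 0.6
print(discrepant)  # ['gene2', 'gene4']
```

The discrepant list is the informative output: each entry points either to experimental noise or to a gap in the model, such as an unmodelled regulatory constraint on isoenzymes.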

6.1.2.2 Transcriptome Data

Whereas it is relatively simple to obtain multiple measurements to define the physiological or metabolic state of bacteria in vitro, only limited information can be obtained for bacteria in vivo. In particular, it is very challenging to perform metabolomic, proteomic, biochemical, physiological or structural studies with the small numbers of organisms obtained from infected animals. However, it is possible to perform transcriptomic studies on organisms grown in vivo or ex vivo, and these methods have been applied to the TB bacillus to obtain transcriptomic profiles of bacteria growing in cultured macrophages, mouse models and in human lesions [61–65]. The transcriptional profile of a cell (via translation, enzyme activity, etc.) determines most aspects of the physiological state; therefore, it should be possible to predict a physiological state from knowledge of the complete cellular transcriptome. However, the mapping between messenger RNA levels and physiological state is highly complex and non-linear, depending on many unknown factors such as mRNA stability, translation efficiency and post-translational modification of proteins. Traditional approaches to defining metabolic responses from transcriptome data have generally relied on examining expression levels of key (rate-controlling) genes in metabolic pathways (for instance, [66]). However, metabolic control analysis has demonstrated that control is distributed throughout the entire metabolic network, such that the flux through any particular pathway is controlled globally [67, 68] rather than by a particular enzymatic step. This makes a simple mapping of differentially expressed genes onto metabolic pathways an unrealistic strategy for successful predictions of global metabolic state changes.

Several system-level approaches have been proposed to extract metabolic information from gene expression profiles. In the reporter metabolites approach [69], the local connectivity of a metabolite in the bipartite substance/reaction graph is used to identify a set of genes associated with each metabolite. Subsequently, for each of the metabolites, the distribution of the microarray-derived signal of genes associated with the metabolite is compared with the background distribution of the microarray-derived signal for all genes, resulting in the identification of the transcription regulation focal points of metabolism: network nodes that are directly affected by clusters of differentially expressed genes. In another approach, Shlomi et al. [70] used mixed integer linear programming to minimise the discrepancy between the internal metabolic flux distribution and the transcriptional profile of genes encoding metabolic enzymes. Their approach identifies flux distributions that are consistent with the stoichiometric constraints of the genome-scale metabolic reaction network and at the same time maximise the number of active metabolic fluxes associated with up-regulated genes and the number of non-active metabolic fluxes associated with down-regulated genes. Yet another approach, E-flux, was recently developed and used to examine M. tuberculosis microarray data in the context of the genome-scale metabolic reaction network by constraining the upper bounds of metabolic reactions to values proportional to the microarray signals of genes associated with these reactions [71]. Two models were used for the analysis: the Raman et al. model of mycolic acid pathways [50] and the GSMN-TB genome-scale metabolic model [12]. E-flux was applied to microarray data obtained from a large study that investigated the response of M. tuberculosis to 75 different drugs, drug combinations and nutrient conditions [72]. Eight of the tested drugs target mycolic acid synthesis, and this was correctly predicted by E-flux analysis of the microarray data, indicating that the method may be useful for target identification of novel inhibitors.
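The bound-setting step at the heart of E-flux, as summarised above, can be sketched in a few lines of Python. This is a simplified illustration rather than the published implementation [71]: it assumes a single expression value per reaction (a real model would first combine gene signals through gene-reaction associations), and the reaction names and signal values are hypothetical.

```python
def expression_scaled_bounds(reaction_expression, max_flux=1.0):
    """Set each reaction's upper flux bound proportional to its expression signal.

    reaction_expression: dict mapping reaction name -> non-negative signal.
    The most highly expressed reaction receives max_flux; others scale linearly.
    """
    peak = max(reaction_expression.values())
    if peak <= 0:
        raise ValueError("at least one reaction must have a positive signal")
    return {rxn: max_flux * signal / peak
            for rxn, signal in reaction_expression.items()}


# Hypothetical microarray-derived signals for three placeholder reactions.
signals = {"R_mycolate_synthesis": 200.0, "R_isocitrate_lyase": 50.0,
           "R_citrate_synthase": 100.0}
bounds = expression_scaled_bounds(signals)
print(bounds)
# {'R_mycolate_synthesis': 1.0, 'R_isocitrate_lyase': 0.25, 'R_citrate_synthase': 0.5}
```

In an E-flux style analysis these bounds would replace the default upper bounds before the flux optimisation is run, so that reactions associated with weakly expressed genes have less capacity to carry flux.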

Shi et al. [73] applied a similar in silico method to interrogate quantitative PCR (qPCR) transcriptome data obtained from a model of respiratory infection of mice in which M. tuberculosis replicates in the lung for approximately 20 days, followed by stabilisation of bacterial numbers due to expression of acquired cell-mediated immunity. The qPCR data was not genome scale but focused on a set of genes predominantly involved in the pathways of central metabolism and lipid synthesis. This data was first interpreted qualitatively. Observed changes in mRNA abundance suggested that, as tubercle bacilli stop replicating in the mouse lung, they respond to the decreased demand for energy and biosynthetic precursors by down-regulating glycolysis, the pentose phosphate pathway (PPP) and the TCA cycle. The main function of central metabolism appears to shift from providing energy and biosynthetic precursors for bacterial growth to accumulating storage compounds such as triacylglycerides (TAG) and glutamate. To gain a genome-scale insight into the underlying metabolic changes, two in silico cells were constructed by adjusting the biomass composition of the GSMN-TB model. One cell represented growing M. tuberculosis, while the other represented the more minimal cell composition predicted for non-growing M. tuberculosis in the mouse lung. FBA was then used to predict the consequent changes in flux distribution in the cell. The resulting flux distributions were broadly consistent with the gene expression data and the hypothesis that growth arrest in the mouse lung is associated with a re-routing of carbon flow in central metabolism from metabolic pathways generating energy and biosynthetic precursors to pathways for storage compounds, such as TAG and glutamate.

Our own laboratory developed an alternative method, differential producibility analysis (DPA). The method [74] relies on FBA to link genes with metabolites on a system-wide level. A gene essentiality scan is first performed on every gene but, instead of using biomass as the objective function, each metabolite is used in turn as the objective function. It is thereby possible to generate a mapping that identifies, for each metabolite, the genes that are required for its synthesis (the producibility plot). In the next step of DPA, the experimental data is interrogated. Gene expression signals for a particular experiment are assigned to each gene and, using the producibility plot, mapped onto each metabolite. Metabolites are then ranked to identify those that are most affected by genes that are up-regulated in the target experiment and (separately) ranked to identify those that are most affected by genes that are down-regulated in the same experiment. The whole procedure of DPA effectively transforms a gene-based transcriptome signal into a metabolite-based metabolome signal.
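The second stage of DPA, mapping expression signals onto metabolites and ranking them, can be sketched as follows. The sketch assumes that the producibility map has already been computed by the FBA-based scan; the map, the gene labels and the log-ratio values below are hypothetical, and the scoring rule (the summed signal of genes changing in the chosen direction, divided by the number of required genes) is an assumption made for illustration rather than the statistic used in [74].

```python
def rank_metabolites(producibility, log_ratio, direction="up"):
    """Rank metabolites by the expression change of the genes they depend on.

    producibility: dict metabolite -> set of genes required for its synthesis.
    log_ratio:     dict gene -> log expression ratio (positive = up-regulated).
    direction:     "up" or "down"; the two rankings are computed separately.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    sign = 1.0 if direction == "up" else -1.0
    scores = {}
    for metabolite, genes in producibility.items():
        # Keep only genes that changed in the requested direction.
        changed = [sign * log_ratio.get(g, 0.0) for g in genes]
        changed = [value for value in changed if value > 0]
        scores[metabolite] = sum(changed) / len(genes) if genes else 0.0
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical producibility map and expression data.
producibility = {
    "mycolate": {"g1", "g2"},
    "citrate": {"g3"},
    "sulfolipid": {"g2", "g4"},
}
log_ratio = {"g1": 2.0, "g2": 1.0, "g3": -1.5, "g4": 0.0}

up_ranking = rank_metabolites(producibility, log_ratio, "up")
down_ranking = rank_metabolites(producibility, log_ratio, "down")
print(up_ranking)    # ['mycolate', 'sulfolipid', 'citrate']
print(down_ranking)  # ['citrate', 'mycolate', 'sulfolipid']
```

In this toy case the cell wall lipids head the up-regulated ranking and the central metabolite heads the down-regulated ranking, which is the kind of metabolite-level signal the method is designed to expose.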

The DPA method was first tested with E. coli transcriptome data and shown to successfully identify metabolic responses to environmental perturbation (shift to anaerobic growth) and gene knock-out. This method was then applied to several M. tuberculosis in vitro transcriptome datasets and was able to identify metabolic responses. Applying DPA to transcriptomic data obtained from M. tuberculosis replicating in mouse-derived macrophages [63] revealed a previously unrecognised feature of the response of M. tuberculosis to the macrophage environment [74]: a down-regulation of genes influencing metabolites in central metabolism and concomitant up-regulation of genes that influence synthesis of cell wall components and virulence factors (Fig. 4.3). DPA suggests that a significant feature of the response of the tubercle bacillus to the intracellular environment is a channelling of resources towards remodelling of its cell envelope, possibly in preparation for attack by host defences. Interestingly, application of DPA [74] to transcriptome data obtained from M. tuberculosis bacilli recovered from human sputum [75] generated a very different metabolic signature to the mouse macrophage data. DPA can thereby be used to unravel the mechanisms of virulence and persistence of M. tuberculosis and other pathogens and may have general application for extracting metabolic signals from other “-omics” data.

Fig. 4.3

Pie chart illustrating the role of M. tuberculosis metabolites in macrophages. Pie charts illustrating the role of metabolites associated by DPA with (a) up-regulated or (b) down-regulated genes in the mouse macrophage [63]

6.1.2.3 Stable Isotope Metabolite Profiling

Seminal studies performed many decades ago, mostly in E. coli, established the major pathways for carbon substrate utilisation in bacteria through metabolite tracer analysis. More recently, stable isotope studies are being increasingly used to monitor metabolism. The usual approach is to feed the microbe a 13C-labelled carbon substrate (uniformly and/or positionally labelled) and then measure the labelling profile using nuclear magnetic resonance (NMR) and/or mass spectrometry (MS). NMR can provide positional information but it is less sensitive than MS and, crucially, it is often difficult to identify the metabolites responsible for particular NMR signals. MS is being increasingly used for metabolite profiling since it combines high mass accuracy (providing accurate metabolite identification) with high sensitivity. However, it measures only mass so (unlike NMR) it does not distinguish between isotopomers labelled at different positions but with the same mass.
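The loss of positional information in MS can be made concrete by enumerating the labelling patterns of a small metabolite. The Python sketch below is purely illustrative: it considers a hypothetical three-carbon compound, writes each positional isotopomer as a string of 0 (12C) and 1 (13C), and groups the patterns by the number of 13C atoms, which is all that a mass measurement of the intact molecule reports.

```python
from collections import defaultdict
from itertools import product


def group_isotopomers_by_mass(n_carbons):
    """Group positional isotopomers by their number of 13C atoms.

    Each isotopomer is a string such as '010' (13C at the second carbon).
    The key M+k is the mass shift relative to the unlabelled molecule.
    """
    groups = defaultdict(list)
    for labelling in product("01", repeat=n_carbons):
        pattern = "".join(labelling)
        groups[pattern.count("1")].append(pattern)
    return dict(groups)


groups = group_isotopomers_by_mass(3)
for mass_shift, patterns in sorted(groups.items()):
    print(f"M+{mass_shift}: {patterns}")
# M+0: ['000']
# M+1: ['001', '010', '100']
# M+2: ['011', '101', '110']
# M+3: ['111']
```

For three carbons there are eight positional isotopomers but only four distinguishable masses, so the three M+1 species (and the three M+2 species) are indistinguishable by mass alone.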

Stable isotope profiling has been applied in several studies of M. tuberculosis metabolism [76–78]. For example, 13C-labelled substrates were used to demonstrate that phosphoenolpyruvate carboxykinase (PCK) predominantly catalyses the conversion of oxaloacetate to phosphoenolpyruvate (PEP) when M. tuberculosis is growing on acetate [78]. Carvalho et al. [76] used 13C-labelled substrates to confirm that M. tuberculosis is able to co-catabolise multiple substrates simultaneously and in the same study demonstrated that a form of compartmentalised metabolism was occurring whereby individual substrates had defined metabolic fates. More recently, stable isotopes were used in a study investigating M. tuberculosis metabolism of 13C-labelled glucose, aspartate and CO2 under anaerobic conditions [18]. By monitoring the 13C labelling profile of secreted succinate from these cultures the authors demonstrated that M. tuberculosis is able to operate a TCA cycle in the reductive direction under anaerobic conditions and that this pathway drives succinate secretion (see also 4.2).

6.1.2.4 13C Metabolic Flux Analysis

The goal of metabolic analysis is to understand the metabolic pathways that are being utilised under particular conditions. As described earlier, it is possible to obtain estimates of the ranges of fluxes that are compatible with substrate inputs and outputs in a system using FBA. However, there are usually a great number of possible flux solutions that are compatible with the data, so FBA uses optimisation to determine the flux distribution that optimises some parameter, such as growth rate. This method has been very successfully applied in fast-growing organisms but its applicability to the slow-growing pathogen M. tuberculosis is uncertain. A more direct means of establishing the intracellular fluxes is through 13C-MFA. This powerful technique has been successfully applied to identify functional flux states in various microbes ([79] provides a recent review of the technique and its application) and has enormous potential for studying the metabolism of M. tuberculosis.
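For reference, the optimisation that FBA performs is usually written as a linear programme. The notation below is the conventional one and is an assumption here, since the symbols are not defined in this section: S is the stoichiometric matrix, v the vector of reaction fluxes, c a weight vector that selects the objective (for example, the biomass reaction), and v^min and v^max the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The steady-state constraint Sv = 0 generally leaves many feasible flux vectors, which is why an objective is needed to select one. 13C-MFA instead narrows the feasible set with measured labelling data, and so does not depend on the assumption that the organism optimises growth.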

During a 13C-MFA experiment an organism in metabolic steady state (usually cultivated in a chemostat) is grown in the presence of a 13C-labelled carbon substrate (uniformly and/or positionally labelled). For isotopic steady-state experiments, mixtures of unlabelled and 13C-labelled substrates are used, as otherwise all the metabolites would eventually become labelled to the same degree as the substrate and the experiment would be uninformative. The positional labelling patterns (which carbon atoms are labelled) of the amino acids and/or metabolites (as determined by MS and/or NMR) are then used as additional constraints in MFA to solve the internal fluxes and thereby reconstruct the paths through central metabolism that the carbon took inside the cells.
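How labelling data constrain fluxes can be seen in the simplest possible case: a single metabolite pool fed by two converging pathways that deliver different 13C enrichments at a measured carbon position. The Python sketch below is a toy two-pathway mixing model with hypothetical enrichment values, not a full 13C-MFA, which fits all network fluxes simultaneously to many labelling measurements. It also shows why a substrate that labels everything to the same degree is uninformative.

```python
def pathway_fraction(observed, enrichment_a, enrichment_b):
    """Fraction of a product pool supplied by pathway A at isotopic steady state.

    Solves observed = f * enrichment_a + (1 - f) * enrichment_b for f, where
    each enrichment is the fractional 13C content at the measured position.
    """
    if enrichment_a == enrichment_b:
        raise ValueError("both pathways deliver identical labelling: "
                         "the split cannot be identified")
    fraction = (observed - enrichment_b) / (enrichment_a - enrichment_b)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("observed enrichment lies outside the range "
                         "spanned by the two pathways")
    return fraction


# Hypothetical values: pathway A delivers 50% 13C at this position, pathway B none.
split = pathway_fraction(observed=0.25, enrichment_a=0.5, enrichment_b=0.0)
print(split)  # 0.5

# With a fully labelled sole substrate both routes deliver 100% 13C,
# so the measurement carries no information about the split.
try:
    pathway_fraction(observed=1.0, enrichment_a=1.0, enrichment_b=1.0)
    identifiable = True
except ValueError:
    identifiable = False
print(identifiable)  # False
```

In a real experiment relations of this kind are written for every measured labelling pattern and solved together with the stoichiometric balances, which is what allows the internal fluxes to be resolved.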

Our earlier work used FBA to predict flux distribution in fast and slow-growing M. tuberculosis with glycerol as sole carbon source and predicted an increased flux through the glyoxylate shunt at slow growth rate [12]. This was surprising as the shunt is usually considered to be used solely for growth on two-carbon compounds such as acetate, and the essentiality of this enzyme for in vivo growth has been widely interpreted to indicate that the pathogen consumes host lipids (yielding acetate on beta oxidation) during growth inside the macrophage [1, 2, 80]. We confirmed that activity of the enzyme isocitrate lyase (ICL, the key enzyme of the glyoxylate shunt) was indeed induced during slow growth rate, but how and why the organism was utilising the glyoxylate shunt during slow growth on glycerol was a mystery. The mystery was compounded when we constructed an ICL mutant of M. tuberculosis and demonstrated that the mutant was unable to grow at slow growth rate in a glycerol-limited chemostat [13]. To discover the metabolic pathways involved in slow (and fast) growth we recently, for the first time, performed 13C-MFA on M. bovis BCG and M. tuberculosis at fast and slow growth rate. Tracer experiments were performed with steady-state chemostat cultures supplied with 13C-labelled glycerol. Through measurements of the 13C isotopomer labelling patterns in protein-derived amino acids and enzymatic activity assays we identified the activity of a novel pathway (termed the GAS pathway) that is used for pyruvate dissimilation in M. tuberculosis. This pathway is characterised by significant flux through the glyoxylate shunt and also through the carbon-fixing anaplerotic reactions at the PEP-pyruvate-oxaloacetate node, combined with very low flux through the succinate–oxaloacetate segment of the TCA cycle (Fig. 4.4). The flux through the GAS pathway is increased at slow growth rate, accounting for the essentiality of ICL at slow growth rate. An interesting feature of the GAS pathway is that it includes a significant fraction of flux (far more than required for anaplerosis) going through one or more of the anaplerotic reactions between phosphoenolpyruvate/pyruvate and malate/oxaloacetate in the carbon-fixing direction. This prediction of 13C-MFA was confirmed by feeding M. tuberculosis 13C-labelled sodium bicarbonate and confirming that the pathogen is indeed able to incorporate this unusual carbon source into amino acids [13]. As the human host is abundant in CO2, this finding and the operation of the GAS pathway require further investigation in vivo, as carbon dioxide fixation may provide a point of vulnerability that could be targeted with novel drugs.

Fig. 4.4

Schematic of the GAS pathway, which is characterised by flux through the glyoxylate shunt and anaplerotic reactions for oxidation of pyruvate and succinyl CoA synthetase for the generation of succinyl CoA. Metabolite abbreviations: PEP/PYR phosphoenolpyruvate/pyruvate, Ac acetate, CHO, ICIT isocitrate/citrate, MALOAA L-malate-oxaloacetate, OXG 2-oxoglutarate, SUC succinate, SUCCOA succinyl-CoA, GLX glyoxylate

7 Future Challenges

The application of systems biology to the study of TB is a science that is still in its infancy. Nevertheless, significant progress has already been made. Several in silico models of M. tuberculosis have been constructed and a reconstruction of the M. tuberculosis-macrophage system has been described. The model building process itself is a highly informative exercise that not only defines a minimal metabolic capacity necessary for making an M. tuberculosis cell but also provides clues to gene annotation and generates novel insights into the metabolic capability of this pathogen. The models have been shown to be useful tools for drug target prediction. One of the most powerful applications of these approaches has been to use the in silico models to interrogate experimental data to provide system-level insight into underlying metabolic responses associated with the response of M. tuberculosis to stress, drugs and growth in host cells. Each of these models is available online, giving researchers across the world access to systems biology tools that can be used not only to investigate the biology of the tubercle bacillus but also to interrogate both published and new datasets.

There is no question that existing models are not yet realistic reconstructions of the M. tuberculosis cell. However, interrogating the models with experimental data in an iterative cycle of model refinements and experimentation will ensure that the current models will become more accurate descriptions of the metabolism of M. tuberculosis. A limitation of current FBA-type models is that they can only strictly be applied to steady-state systems. A future goal will be to extend these models to simulate dynamic states, such as during in vivo growth. This will require integration of metabolic networks with gene regulatory networks and kinetic models of enzyme action together with realistic models of host–pathogen interactions. Such multi-scale models may eventually be used to build an in silico M. tuberculosis cell. Such a model may be used for drug discovery and optimisation of treatment regimes but will also be an enormously powerful tool to investigate the fundamental biology of this important pathogen.