Introduction

A genome-scale metabolic network model (GEM) is a mathematical formulation that summarizes the available knowledge about the genes, proteins, and reactions involved in the metabolism of a specific cell. With reliable metabolic models, one can perform virtual (in silico) experiments rapidly and inexpensively (Fouladiha and Marashi 2017; Gu et al. 2019). GEMs can therefore be helpful tools in cell biology and metabolic engineering, because they predict the metabolic state of cells under given growth conditions (Zhang and Hua 2016).

Chinese hamster ovary (CHO) cells are the main workhorse of the biopharmaceutical industry for producing recombinant proteins, such as humanized monoclonal antibodies. These cells were originally derived from a Chinese hamster (Cricetulus griseus) in 1957. Several studies have focused on improving the productivity of CHO cells using cellular and metabolic engineering methods (Wells and Robinson 2017). Experimental manipulation and maintenance of CHO cells, as of many other mammalian cell lines, are costly and time-consuming. A reliable metabolic model of CHO cells can serve as a platform for computational analyses of cell metabolism that aid experimental design. Such model-driven analysis may predict the outcome of experiments and reduce the risk of misleading experimental results. Moreover, a CHO metabolic model can help suggest genetic engineering and media-design strategies for improving recombinant protein production (Calmels et al. 2019; Fouladiha et al. 2020; Traustason et al. 2019). Another valuable application of metabolic models is the interpretation of “omics” data (Hyduke et al. 2013; Kildegaard et al. 2013; Lakshmanan et al. 2019; Richelle et al. 2019a). For example, transcriptomic and proteomic data can be mapped onto the models to infer new knowledge about the physiological characteristics of cells (Richelle et al. 2019b; Schaub et al. 2011).

One major challenge in developing genome-scale metabolic network models is our limited knowledge of a cell’s metabolism. Specifically, genome-scale metabolic network reconstructions must be iteratively expanded as new data emerge on the enzymes and reactions that occur in the cell of interest. For example, several updates of the GEMs of Saccharomyces cerevisiae have been published (Castillo et al. 2019), from iND750 (Duarte et al. 2004) and iIN800 (Nookaew et al. 2008) to Yeast 5 (Heavner et al. 2012) and ecYeast7 (Sánchez et al. 2017). A variety of algorithms have also been developed to predict missing reactions and the genes that could catalyze them (Karlsen et al. 2018), and machine-learning methods have proven helpful for this task (Medlock et al. 2020; Medlock and Papin 2020). These algorithms are particularly useful for expanding the metabolic networks of non-model organisms (Biggs and Papin 2017).

The previous version of the CHO model, iCHO1766, has been used in several studies. For example, iCHO1766 was used to predict the lethality of CHO genes (Ley et al. 2019), to improve the predictive power of the model by modifying flux analysis (Chen et al. 2019; Lularevic et al. 2019), to assess heterogeneity in cell culture (Fernandez-de-Cossio-Diaz and Mulet 2019), and to improve the bioproduction capability of CHO cells by designing cell feeds (Fouladiha et al. 2020; Schinn et al. 2020). iCHO1766 has also been a helpful tool for studying CHO cell metabolism together with fluxomics (Hong et al. 2020), transcriptomics, and proteomics (Zhuangrong and Seongkyu 2020). To obtain more reliable and accurate results, especially when integrating “omics” data, the metabolic model needs to be updated regularly to reflect the latest molecular and biochemical knowledge (Schinn et al. 2020; Yeo et al. 2020).

Here, we have conducted an in-depth gap-filling of the genome-scale metabolic network reconstruction of the Chinese hamster, iCHO1766 (Hefzi et al. 2016), and introduce iCHO2101, an updated version for enhanced genome-scale modeling of CHO cell metabolism. Compared with the previous version, the numbers of genes and reactions have increased, and the percentages of blocked reactions and dead-end metabolites have been reduced by about 10 and 15 percentage points, respectively. In other words, more parts of the metabolic model can be active, and more reactions can carry flux in the new version. These improvements are expected to increase the accuracy and precision of predictions made with the model.

Methods

Analysis of iCHO1766

The COBRA toolbox (Becker et al. 2007) was used for the constraint-based analysis of the metabolic model of CHO cells (iCHO1766). Flux variability analysis (FVA) (Burgard et al. 2001) was used to find the feasible range of every flux at steady state, with no constraints on the flux bounds. A reaction whose lower and upper flux bounds were both zero was considered blocked. Similarly, a metabolite whose exchange flux had both lower and upper bounds equal to zero was considered a “non-producible and non-consumable”, or “dead-end”, metabolite.
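FVA requires a linear-programming solver for each reaction. As a solver-free illustration of what “blocked” and “dead-end” mean, the sketch below iteratively prunes a small, hypothetical toy network: a metabolite touched by a single remaining reaction, or one that the remaining reactions can only produce or only consume, forces every reaction touching it to carry zero flux at steady state. This topological test is a swapped-in approximation, not FVA: every reaction it flags is genuinely blocked, but it can miss reactions that FVA would flag because of stoichiometric coupling.

```python
"""Solver-free sketch: flag blocked reactions by iterative dead-end pruning.

Assumption: this is a topological approximation of FVA, not FVA itself.
Every reaction it flags has zero flux at steady state (S v = 0), but it can
miss reactions that are blocked only through stoichiometric coupling.
"""

# Hypothetical toy network: reaction -> ({metabolite: coefficient}, reversible)
NETWORK = {
    "EX_A": ({"A": -1}, True),          # exchange: uptake or secretion of A
    "R1": ({"A": -1, "B": 1}, False),   # A -> B
    "R2": ({"B": -1, "C": 1}, False),   # B -> C
    "EX_C": ({"C": -1}, True),          # exchange of C
    "R3": ({"B": -1, "D": 1}, False),   # B -> D
    "R5": ({"D": -1, "F": 1}, False),   # D -> F (F is never consumed)
    "R4": ({"E": -1, "C": 1}, False),   # E -> C (E is never produced)
}


def find_blocked(network):
    """Return (blocked_reactions, dead_end_metabolites) for the toy network."""
    active = set(network)
    changed = True
    while changed:
        changed = False
        # Index the still-active reactions by the metabolites they touch.
        touching = {}
        for rxn in active:
            for met in network[rxn][0]:
                touching.setdefault(met, []).append(rxn)
        for met, rxns in touching.items():
            # A reversible reaction can both produce and consume a metabolite.
            can_produce = any(network[r][0][met] > 0 or network[r][1] for r in rxns)
            can_consume = any(network[r][0][met] < 0 or network[r][1] for r in rxns)
            # A lone reaction, or a metabolite that can only be produced or only
            # be consumed, forces every touching flux to zero at steady state.
            if len(rxns) == 1 or not (can_produce and can_consume):
                active -= set(rxns)
                changed = True
                break  # rebuild the index after each pruning step
    blocked = set(network) - active
    all_mets = {m for stoich, _ in network.values() for m in stoich}
    live_mets = {m for r in active for m in network[r][0]}
    return blocked, all_mets - live_mets


blocked, dead_ends = find_blocked(NETWORK)
print(sorted(blocked))    # ['R3', 'R4', 'R5']
print(sorted(dead_ends))  # ['D', 'E', 'F']
```

Note that R3 is only found on the second pass, after pruning R5 leaves D with a producer but no consumer; this cascading behavior is why the pruning is repeated until nothing changes.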

Filling the gaps and validation of the results

In the present study, four independent approaches were used to gap-fill iCHO1766. The first two were based on automatic gap-filling tools, namely GapFind/GapFill (Kumar et al. 2007) and GAUGE (Hosseini and Marashi 2017). The GapFind algorithm uses mixed integer linear programming (MILP) to identify all metabolites that cannot be produced at steady state. Among these, “root” gaps are the non-producible metabolites whose filling also unblocks other (“downstream”) non-producible metabolites. The GapFill algorithm then selects a minimal subset of reactions from a universal reaction database that must be added to the model to make a non-producible metabolite producible.
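GapFill itself solves an MILP over a universal reaction database. To show the idea on a scale that can be checked by hand, the sketch below swaps the MILP for exhaustive search over a tiny hypothetical database and replaces steady-state producibility with a simpler reachability (network-expansion) test starting from the medium components. The toy reactions reuse the 4-coumarate example from the Results: filling the root gap (4-coumarate) with R00737 also fills the downstream gap (caffeate). The `R_caf` step and the seed set are hypothetical.

```python
from itertools import combinations

# Hypothetical irreversible reactions: id -> (substrates, products)
SEEDS = {"tyr", "o2"}                                  # assumed medium components
MODEL = {"R_caf": ({"coumarate", "o2"}, {"caffeate"})}  # hypothetical 4-coumarate -> caffeate step
DATABASE = {
    "R00737": ({"tyr"}, {"coumarate", "nh3"}),  # tyrosine ammonia-lyase (from the text)
    "R_dummy": ({"phe"}, {"cinnamate"}),        # hypothetical, irrelevant reaction
}


def reachable(reactions, seeds):
    """Network expansion: fire any reaction whose substrates are all present."""
    present = set(seeds)
    grew = True
    while grew:
        grew = False
        for subs, prods in reactions.values():
            if subs <= present and not prods <= present:
                present |= prods
                grew = True
    return present


def gap_fill(model, database, seeds, target, max_size=3):
    """Smallest subset of database reactions that makes `target` reachable.

    Exhaustive search stands in for GapFill's MILP (an assumption for
    illustration); returns None if no subset up to `max_size` works.
    """
    for k in range(max_size + 1):
        for subset in combinations(sorted(database), k):
            trial = dict(model)
            trial.update({rid: database[rid] for rid in subset})
            if target in reachable(trial, seeds):
                return set(subset)
    return None


print("caffeate" in reachable(MODEL, SEEDS))           # False: a downstream gap
print(gap_fill(MODEL, DATABASE, SEEDS, "caffeate"))    # {'R00737'}
```

Exhaustive search grows combinatorially with database size, which is why GapFill formulates the same minimal-subset question as an MILP.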

In the second approach, we used GAUGE as our computational tool. GAUGE uses transcriptomic data to identify inconsistencies between gene co-expression and flux coupling in a metabolic model. It then finds a minimal subset of reactions in the KEGG database whose addition resolves these inconsistencies.
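GAUGE’s actual formulation is not reproduced here. The sketch below only illustrates the kind of inconsistency it looks for, under the assumption that an inconsistency is a gene pair that is strongly co-expressed across samples while the reactions of the two genes are not flux-coupled. The correlation threshold (0.9), gene names, expression values, and the precomputed set of coupled pairs are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def inconsistent_pairs(expression, coupled_pairs, threshold=0.9):
    """Flag gene pairs that are strongly co-expressed but not flux-coupled.

    expression: gene -> expression values across samples
    coupled_pairs: set of frozensets {gene1, gene2} whose reactions are
        flux-coupled (assumed to be precomputed from the model)
    threshold: hypothetical correlation cut-off, not GAUGE's own setting
    """
    flagged = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold and frozenset((g1, g2)) not in coupled_pairs:
            flagged.append((g1, g2, round(r, 3)))
    return flagged


# Hypothetical data: geneA and geneB rise together; geneC does not.
EXPRESSION = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 1, 4, 2, 3],
}
print(inconsistent_pairs(EXPRESSION, coupled_pairs=set()))  # [('geneA', 'geneB', 1.0)]
```

If the model already coupled the reactions of geneA and geneB, the same pair would be consistent and nothing would be flagged.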

Reactions suggested by GapFind/GapFill and GAUGE (and their associated genes/proteins) were validated as follows before being added to iCHO1766. If the gene ID of the new reaction, or the gene ID attributed to its enzyme, was found for Chinese hamster in the KEGG database, the new reaction was confirmed. Otherwise, validation was based on BLASTp searches of the enzyme of the new reaction against the Cricetulus griseus (Chinese hamster) transcriptome, using CHO cell transcribed genomic sequences. For each enzyme, the amino acid sequences from different species in the KEGG database were examined, and the best BLASTp hit was reported. A gene/protein was assumed to be present in Chinese hamster metabolism if a BLAST hit was found with an e-value < 1 × 10⁻¹⁰. To apply a stricter standard, we further required each hit to have either a query coverage > 70% or a sequence similarity > 30%.
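The acceptance rule above can be written as a small filter. In this sketch the `BlastHit` record and its field names are hypothetical stand-ins for parsed BLASTp output, and “best hit” is assumed to mean the hit with the lowest e-value for each enzyme.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLASTp hit (hypothetical record, not BLAST's own column names)."""
    enzyme: str            # query enzyme identifier
    evalue: float
    query_coverage: float  # % of the query covered by the alignment
    similarity: float      # % sequence similarity


def best_hits(hits):
    """Keep one hit per enzyme; 'best' is assumed to mean lowest e-value."""
    best = {}
    for hit in hits:
        if hit.enzyme not in best or hit.evalue < best[hit.enzyme].evalue:
            best[hit.enzyme] = hit
    return best


def passes(hit, max_evalue=1e-10, min_coverage=70.0, min_similarity=30.0):
    """Thresholds from the text: e-value < 1e-10 and
    (query coverage > 70% or sequence similarity > 30%)."""
    return hit.evalue < max_evalue and (
        hit.query_coverage > min_coverage or hit.similarity > min_similarity)


def validated_enzymes(hits):
    """Enzymes whose best hit satisfies the acceptance rule."""
    return {e for e, hit in best_hits(hits).items() if passes(hit)}


# Hypothetical hits
HITS = [
    BlastHit("enzyme_1", 1e-40, 85.0, 45.0),
    BlastHit("enzyme_1", 1e-12, 60.0, 25.0),  # weaker duplicate, ignored
    BlastHit("enzyme_2", 1e-5, 90.0, 50.0),   # e-value too high
    BlastHit("enzyme_3", 1e-20, 50.0, 35.0),  # accepted via similarity
    BlastHit("enzyme_4", 1e-20, 40.0, 20.0),  # fails both extra criteria
]
print(sorted(validated_enzymes(HITS)))  # ['enzyme_1', 'enzyme_3']
```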

Our third gap-filling approach was based on manual assessment of the blocked reactions in iCHO1766. In several cases, a reaction was blocked because an exchange or transport reaction was missing from the model. In such cases, we checked whether each non-producible or non-consumable metabolite was reported in the Human Metabolome Database (HMDB) (Wishart et al. 2017). If the metabolite was reported to be present in any human biofluid (including blood, saliva, and urine), we assumed that it can be transported across the plasma membrane of a typical mammalian cell, and an exchange reaction for that metabolite was added to the model with a high confidence score. If HMDB listed a metabolite only as “expected” to be present in biofluids, its exchange reaction was added to the model with a low confidence score.
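The confidence rule above amounts to a lookup. In this minimal sketch, the status strings `"detected"` and `"expected"` are simplified, hypothetical stand-ins for HMDB’s annotations, and the `EX_<metabolite>_e` naming of exchange reactions is an assumed BiGG-style convention.

```python
# Hypothetical HMDB lookup results: metabolite -> simplified status (or None).
HMDB_STATUS = {
    "nonanoate": "detected",  # reported in human biofluids (HMDB0000847 in the text)
    "met_x": "expected",      # hypothetical metabolite only expected in biofluids
    "met_y": None,            # hypothetical metabolite without biofluid annotation
}


def propose_exchanges(dead_ends, hmdb_status, compartment="e"):
    """Return {exchange reaction id: confidence} for dead-end metabolites.

    'detected' -> high confidence, 'expected' -> low confidence; metabolites
    without an HMDB biofluid annotation get no exchange reaction.
    """
    proposals = {}
    for met in sorted(dead_ends):
        status = hmdb_status.get(met)
        if status == "detected":
            proposals[f"EX_{met}_{compartment}"] = "high"
        elif status == "expected":
            proposals[f"EX_{met}_{compartment}"] = "low"
    return proposals


print(propose_exchanges({"nonanoate", "met_x", "met_y"}, HMDB_STATUS))
# {'met_x' low, 'nonanoate' high}: {'EX_met_x_e': 'low', 'EX_nonanoate_e': 'high'}
```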

In the fourth approach, the BiGG database (King et al. 2015) was used to retrieve all known biochemical reactions and their corresponding enzymes. The KEGG database was then queried to extract the full list of Chinese hamster genes and their associations with enzymes. The intersection of these two lists was taken as the list of potential CHO reactions. Finally, the 1766 genes already present in iCHO1766 were excluded from this list to identify CHO genes, and their reactions, that have counterparts in BiGG but are not present in iCHO1766.
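The intersect-then-subtract step can be sketched with plain sets. Joining BiGG reactions and Chinese hamster genes through shared EC numbers is an assumption about how “association with enzymes” was matched; all reaction and gene identifiers below are hypothetical toy values.

```python
# Hypothetical inputs: reaction -> EC numbers (from BiGG) and
# CHO gene -> EC numbers (from KEGG). Identifiers are made up.
BIGG_REACTION_EC = {
    "RXN_A": {"1.1.1.1"},
    "RXN_B": {"2.7.1.1"},
    "RXN_C": {"4.2.1.2"},
}
CHO_GENE_EC = {
    "gene_100": {"1.1.1.1"},
    "gene_200": {"2.7.1.1"},
    "gene_300": {"4.2.1.2", "1.1.1.1"},
}
MODEL_GENES = {"gene_100"}  # genes already in the model


def candidate_additions(reaction_ec, gene_ec, model_genes):
    """Map each CHO gene absent from the model to BiGG reactions sharing an EC number."""
    candidates = {}
    for gene, ecs in gene_ec.items():
        if gene in model_genes:
            continue  # subtract genes the model already contains
        rxns = {r for r, r_ecs in reaction_ec.items() if ecs & r_ecs}
        if rxns:  # intersection: keep genes with a BiGG counterpart
            candidates[gene] = rxns
    return candidates


print(candidate_additions(BIGG_REACTION_EC, CHO_GENE_EC, MODEL_GENES))
```

Each candidate gene either adds a gene association to an existing model reaction or brings in a new reaction, which matches the two outcomes reported for this approach in the Results.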

Analysis of iCHO2101

The COBRA toolbox (Becker et al. 2007) was used to perform flux balance analysis (FBA) and FVA of the updated CHO model with unconstrained and with constrained uptake fluxes. In the unconstrained state, no restrictions were applied to the flux bounds. In the constrained state, only the metabolites of the cell culture medium were allowed to be imported into the model, with limited fluxes as defined in iCHO1766 (Hefzi et al. 2016). FBA was used to predict the maximum growth rate, and FVA was used to calculate the feasible flux range of each reaction while maintaining the maximum growth rate. Reactions with a non-zero FVA range (i.e., a non-zero minimum or maximum flux) were considered “active”.
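The two ingredients of the constrained analysis, closing uptake for everything except medium components and calling a reaction active when its FVA range is non-zero, can be sketched without a solver. The bounds dictionary, the sign convention (negative exchange flux means uptake, as in COBRA models), and the uptake rate below are assumptions for illustration.

```python
def apply_medium(bounds, exchange_ids, medium_uptake):
    """Return a copy of `bounds` in which only medium components can be imported.

    bounds: reaction id -> (lower, upper)
    exchange_ids: ids of the exchange reactions
    medium_uptake: exchange id -> maximal uptake rate (a positive number)
    Assumes the COBRA sign convention: a negative exchange flux is uptake.
    """
    constrained = dict(bounds)  # leave the caller's dict untouched
    for ex in exchange_ids:
        _, upper = bounds[ex]
        # Medium components keep a limited uptake; everything else is export-only.
        lower = -medium_uptake[ex] if ex in medium_uptake else 0.0
        constrained[ex] = (lower, upper)
    return constrained


def active_reactions(fva_ranges, tol=1e-9):
    """Reactions whose FVA minimum or maximum flux is non-zero (beyond `tol`)."""
    return {rxn for rxn, (fmin, fmax) in fva_ranges.items()
            if abs(fmin) > tol or abs(fmax) > tol}


# Hypothetical example; the uptake rate of 0.5 is made up for illustration.
bounds = {"EX_glc__D_e": (-1000.0, 1000.0), "EX_lac__L_e": (-1000.0, 1000.0),
          "R1": (0.0, 1000.0)}
medium = apply_medium(bounds, ["EX_glc__D_e", "EX_lac__L_e"], {"EX_glc__D_e": 0.5})
print(medium["EX_glc__D_e"], medium["EX_lac__L_e"])  # (-0.5, 1000.0) (0.0, 1000.0)
print(sorted(active_reactions({"R1": (0.0, 0.0), "R2": (0.0, 2.5), "R3": (-1.0, 0.0)})))
```

The FVA ranges passed to `active_reactions` would come from a solver; in COBRApy, for instance, `cobra.flux_analysis.flux_variability_analysis(model, fraction_of_optimum=1.0)` returns a table with `minimum` and `maximum` columns at maximal growth, whereas the analysis described here used the MATLAB COBRA toolbox.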

Gene expression analysis

To evaluate the new version of the model and compare it with iCHO1766, transcriptomic data of CHO cells were used. These normalized data include the expression levels of more than 23,000 genes of CHO-S and CHO-K1 cells across 191 samples, drawn from published (Hefzi et al. 2016; Van Wijk et al. 2017) and unpublished data sets. The data were processed as follows: FastQC v0.11.1 (Andrews 2010) was used to assess read quality; Trimmomatic v0.33 (Bolger et al. 2014) was used to trim adapter sequences and low-quality bases; STAR v2.4.2a (Dobin et al. 2013) was used to align the trimmed reads to the CHO-K1 genome (Xu et al. 2011); and FPKM values were then calculated using Cufflinks v2.2.1.

To represent the expression of each gene, its average expression across all 191 samples was computed. The expression of a single-gene reaction was assumed to be proportional to the expression of its gene. For reactions associated with multiple genes, we restricted our analysis to reactions whose genes were linked exclusively by “OR” or exclusively by “AND”. If all genes of a reaction were linked by “OR” (isozymes), the maximum of their expression levels was attributed to the reaction; if they were linked by “AND” (subunits of a complex), the minimum was used. We then compared the expression of the reactions in each pathway with the percentage of blocked reactions in that pathway.
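The gene-to-reaction mapping described above can be sketched as follows. The parser handles only the rule shapes the text retains (a single gene, or genes joined only by OR or only by AND) and returns `None` for anything else; gene names and expression values are hypothetical.

```python
from statistics import mean


def average_expression(samples):
    """Average each gene's expression across samples (gene -> list of values)."""
    return {gene: mean(values) for gene, values in samples.items()}


def reaction_expression(gpr, gene_expr):
    """Map a GPR rule to a reaction expression level, following the text:
    one gene -> its expression; genes joined only by OR -> maximum;
    genes joined only by AND -> minimum. Nested or mixed rules, and rules
    with genes lacking expression data, return None (excluded)."""
    if "(" in gpr or ")" in gpr:
        return None  # nested rules are outside the scope of the analysis
    tokens = gpr.split()
    genes, ops = tokens[0::2], {t.lower() for t in tokens[1::2]}
    if not genes or any(g not in gene_expr for g in genes):
        return None
    values = [gene_expr[g] for g in genes]
    if not ops:
        return values[0]
    if ops == {"or"}:
        return max(values)   # isozymes: the best-expressed one sets the level
    if ops == {"and"}:
        return min(values)   # complex: the scarcest subunit limits the level
    return None              # mixed AND/OR rule


# Hypothetical toy data: three genes measured in two samples.
expr = average_expression({"g1": [2.0, 4.0], "g2": [10.0, 6.0], "g3": [1.0, 3.0]})
for rule in ["g1", "g1 or g2", "g1 and g2 and g3", "g1 and (g2 or g3)"]:
    print(rule, "->", reaction_expression(rule, expr))
# g1 -> 3.0 / g1 or g2 -> 8.0 / g1 and g2 and g3 -> 2.0 / g1 and (g2 or g3) -> None
```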

Results

Nearly a quarter of the reactions in iCHO1766 are blocked

The community-consensus genome-scale metabolic model of CHO cells, iCHO1766, includes 1766 genes, 6663 reactions, and 4455 metabolites. Using constraint-based modeling (see Methods), we found that about 23% of the reactions in iCHO1766 (1503 of 6663) are blocked, i.e., they cannot carry a non-zero flux at steady state. The reactions of iCHO1766 are categorized into 125 metabolic pathways, 83 of which include ten or more reactions. Among these, there are 16 pathways in which at least 50% of the reactions are blocked (Table 1). The distribution of blocked reactions across all metabolic pathways is shown in Supplementary Table 1. In addition, about 21% of the metabolites in iCHO1766 (955 of 4455) are dead-end metabolites, i.e., they can be neither produced nor consumed at steady state. These metabolites are distributed across different subcellular compartments of the model (Table 2).

Table 1 Metabolic pathways of iCHO1766 in which more than 50% of the reactions are blocked
Table 2 The distribution of dead-end metabolites of iCHO1766 in each subcellular part

These blocked reactions and dead-end metabolites indicate that iCHO1766 contains metabolic gaps (Orth and Palsson 2010), which are common in genome-scale metabolic models. Other gaps may also exist in the model, and all of them may cause inconsistencies between model predictions and experimental results; in other words, gaps can reduce the reliability of the phenotypic predictions of a metabolic model. Several gap-filling methods have been designed to find gaps and propose ways to remove them (Pan and Reed 2018). Most of these methods use a comprehensive dataset of known biochemical reactions, often obtained from the KEGG database (Kanehisa et al. 2016), and search for a subset of reactions whose addition to the model fills the gaps and improves model predictions. Gap-filling methods can be classified into three groups. The first group consists of purely computational methods, which use linear programming or mixed integer linear programming (MILP) to fill the gaps of a model. GapFind/GapFill (Kumar et al. 2007), BNICE (Hatzimanikatis et al. 2005), FBA-Gap (Brooks et al. 2012), MetaFlux (Latendresse et al. 2012), FastGapFill (Thiele et al. 2014), and FastGapFilling (Thiele et al. 2014) are examples of this group. The second group comprises phenotype-based methods, which use phenotypic data of the cells, such as viability on different carbon or nitrogen sources, to infer new biochemical reactions and fill the gaps in the cell's metabolic model. SMILEY (Reed et al. 2006), GrowMatch (Kumar and Maranas 2009), OMNI (Herrgård et al. 2006), and MinimalExtension (Christian et al. 2009) belong to this group.
The third group comprises methods that use various kinds of “omics” data to fill the gaps of a metabolic model, e.g., sequence-based (Krumholz and Libourel 2015) and likelihood-based (Benedict et al. 2014) methods, MIRAGE (Vitkin and Shlomi 2012), and GAUGE (Hosseini and Marashi 2017).

In the present study, we used the GapFind/GapFill and GAUGE methods to fill the gaps of iCHO1766, manually validated their results, and added the validated reactions to the model. In addition, two manual gap-filling approaches were used (see Methods). Finally, the statistics of the new model and the mapping of gene expression data indicate substantial improvements in the CHO metabolic model.

Gap filling approaches

Two automatic approaches, GapFind/GapFill and GAUGE, and two manual approaches were used to fill the gaps of iCHO1766. GapFill suggested adding 121 reactions to the model to make 123 metabolites producible (listed in Supplementary Table 2); some of these reactions make more than one metabolite producible. We validated the predictions by manually searching the KEGG database and by using BLASTp. For example, 4-coumarate (C00811) was a ‘root’ gap in iCHO1766 (a metabolite that could not be produced at steady state). In the model, caffeate (C01197) can only be produced from 4-coumarate, so caffeate was a ‘downstream’ gap. A single reaction (R00737), catalyzed by tyrosine ammonia-lyase, can fill both gaps by converting tyrosine to 4-coumarate and ammonia. BLASTp supported the presence of tyrosine ammonia-lyase in CHO cells, and R00737 was therefore added to the model. In total, 56 reactions were validated and added, making 87 metabolites producible in iCHO1766 (Table 3). These new reactions were associated with 30 new genes, which were added to the new version of the model.

Table 3 Validated new reactions predicted by the GapFill method and added to the model

Using the GAUGE method, we found inconsistencies between gene co-expression and flux coupling for 146 gene pairs. GAUGE suggested solutions for 64 of these pairs (listed in Supplementary Table 3), but only 37 of the 64 had validated reactions as solutions. In total, 29 reactions were added to iCHO1766 using GAUGE (Table 4). These new reactions were associated with 3 new genes, which were added to the new version of the model.

Table 4 Validated new reactions predicted by the GAUGE method and added to the model

In the third gap-filling approach, all non-producible and non-consumable metabolites were searched in the HMDB database, and their equivalent IDs were retrieved. If a metabolite had been detected in human biofluids, its exchange reaction was added to the model with a high level of confidence. This approach added 257 new reactions to the model (a full list of reactions and HMDB IDs is available in Supplementary Table 4). For example, nonanoate was a dead-end metabolite that has been detected in blood, feces, saliva, and sweat (HMDB0000847); adding its extracellular export enabled a blocked reaction in the linoleate metabolic pathway to carry flux. Another group of metabolites was labeled by HMDB as “expected to be detected in human biofluids”. Exchange reactions for 196 metabolites of this group were added to the model with a low level of confidence (a full list of reactions and HMDB IDs is available in Supplementary Table 5).

Through manual assessment of the blocked reactions in iCHO1766, we found that many reactions were repeated in different subcellular compartments of the model; that is, these reactions have the same reactants and products with exactly the same stoichiometric coefficients, but occur in different compartments. In such cases, the absence of appropriate transport reactions caused many reactions to be blocked. The iCHO1766 model contained 178 such blocked, repeated reactions without gene associations, which we suggest deleting in future curation efforts (all are listed in Supplementary Table 6). Furthermore, if iCHO1766 already contained a transport reaction without gene association for a metabolite in one subcellular compartment, adding a transport reaction for that metabolite between other compartments was assigned a high confidence score. In total, 139 such reactions were added to the new model (Supplementary Table 7).
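Reactions repeated across compartments can be detected by comparing stoichiometries after removing compartment tags. This sketch assumes BiGG-style metabolite IDs ending in `_<compartment>` (for example `_c` or `_m`) and skips transport reactions, whose metabolites span several compartments; all reaction identifiers are hypothetical.

```python
def split_id(met_id):
    """Split a BiGG-style metabolite ID into (base, compartment).

    Assumes every ID ends in '_<compartment>', e.g. 'glc__D_c' -> ('glc__D', 'c').
    """
    base, comp = met_id.rsplit("_", 1)
    return base, comp


def find_compartment_duplicates(reactions):
    """Group reactions whose stoichiometry is identical once compartments are ignored.

    reactions: reaction id -> {metabolite id: coefficient}
    Returns sorted groups (lists of reaction ids) with more than one member.
    """
    groups = {}
    for rid, stoich in reactions.items():
        parts = {m: split_id(m) for m in stoich}
        if len({comp for _, comp in parts.values()}) != 1:
            continue  # transport reaction: spans compartments, not a repeat
        key = frozenset((parts[m][0], coef) for m, coef in stoich.items())
        groups.setdefault(key, []).append(rid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy reactions
REACTIONS = {
    "R_dup_c": {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1},
    "R_dup_m": {"glc__D_m": -1, "atp_m": -1, "g6p_m": 1, "adp_m": 1},
    "T_glc_ec": {"glc__D_e": -1, "glc__D_c": 1},
    "R_iso_c": {"g6p_c": -1, "f6p_c": 1},
}
print(find_compartment_duplicates(REACTIONS))  # [['R_dup_c', 'R_dup_m']]
```

Flagged groups would then still need to be intersected with the blocked reactions that lack gene associations before any of them is proposed for deletion.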

In the fourth approach, searching the BiGG and KEGG databases identified 314 new genes (see Supplementary Table 8). Twelve of these genes were also predicted by GapFind/GapFill, and one was also predicted by GAUGE. Adding these genes updated the gene associations of 30 existing reactions of iCHO1766 and led to the addition of 42 new reactions to the new model.

Analysis of iCHO2101

Using the four gap-filling approaches, a total of 773 new reactions, 335 new genes, and 72 new metabolites were added to iCHO1766. In addition, we reviewed the names of the metabolites and reactions of the model and renamed unknown IDs based on the BiGG database. The updated model, named iCHO2101, has 2101 genes, 7436 reactions, and 4527 metabolites (see Supplementary Table 10). In iCHO2101, 58 pathways contain no blocked reactions, and only 5 pathways have more than 50% blocked reactions (Table 5). In addition, the percentage of dead-end metabolites in each subcellular compartment of iCHO2101 has been reduced to less than 10% (Table 6). Figure 1 summarizes the improvements made in this study by visually comparing the model statistics, blocked reactions, and dead-end metabolites of iCHO1766 and iCHO2101.

Fig. 1

A visual comparison between the statistics of iCHO1766 and iCHO2101. Part a (upper left) shows the numbers of genes, reactions, and metabolites. Part b (lower left) shows the distribution of dead-end metabolites across subcellular compartments. Part c (right) shows the percentage of blocked reactions in the selected pathways reported in Table 1

Table 5 Metabolic pathways of iCHO2101 in which more than 50% of the reactions are blocked
Table 6 The distribution of dead-end metabolites of iCHO2101 in each subcellular part

Using FBA with our previously published uptake and secretion constraints (Hefzi et al. 2016), we found that the maximum growth rate in the constrained state was similar for iCHO1766 and iCHO2101 (0.03 h⁻¹). FVA in the constrained state showed that the number of “active” reactions increased substantially in many metabolic pathways of the gap-filled model. Figure 2 shows the percentage of active reactions in the 14 metabolic pathways that contain more than five reactions and in which this percentage differs by more than 30% between iCHO1766 and iCHO2101. For example, all reactions of ‘sphingolipid metabolism’ are active when modeling growth with iCHO2101, which enables analysis of this process, previously reported to be important for the growth of CHO cells (Hanada et al. 1992).
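The pathway comparison behind the figure can be sketched as follows, assuming that the 30% threshold refers to a difference in percentage points and that pathway size is counted in the new model; the toy pathways and active sets are hypothetical.

```python
from collections import Counter


def percent_active(pathway_of, active):
    """Return ({pathway: % active reactions}, {pathway: number of reactions})."""
    sizes = Counter(pathway_of.values())
    n_active = Counter(pw for rxn, pw in pathway_of.items() if rxn in active)
    return {pw: 100.0 * n_active[pw] / n for pw, n in sizes.items()}, sizes


def changed_pathways(old_model, new_model, min_size=5, min_change=30.0):
    """Pathways with more than `min_size` reactions in the new model whose
    percentage of active reactions changed by more than `min_change` points.

    Each model is a (pathway_of, active_reactions) pair.
    """
    old_pct, _ = percent_active(*old_model)
    new_pct, new_sizes = percent_active(*new_model)
    return sorted(pw for pw, pct in new_pct.items()
                  if new_sizes[pw] > min_size
                  and abs(pct - old_pct.get(pw, 0.0)) > min_change)


# Hypothetical toy models: two six-reaction pathways and one tiny pathway.
PATHWAYS = {**{f"s{i}": "sphingolipid metabolism" for i in range(1, 7)},
            **{f"g{i}": "glycolysis" for i in range(1, 7)},
            "t1": "tiny pathway", "t2": "tiny pathway"}
OLD = (PATHWAYS, {"s1", "s2", "g1", "g2", "g3", "g4", "g5", "g6"})
NEW = (PATHWAYS, set(PATHWAYS))  # every reaction active after gap-filling
print(changed_pathways(OLD, NEW))  # ['sphingolipid metabolism']
```

The tiny pathway also jumps from 0% to 100% active but is excluded by the size filter, which is the role the “more than five reactions” condition plays.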

Fig. 2

A visual comparison of flux activities in iCHO1766 and iCHO2101 for the 14 metabolic pathways in which the percentage of active reactions changed by more than 30%

Gene expression analysis

We subsequently analyzed the expression of the model genes across 191 RNA-Seq samples and computed the expression levels of reactions (see Methods). Comparing the reaction expression levels of the metabolic pathways of iCHO1766 with their percentages of blocked reactions revealed that some highly expressed pathways were heavily blocked. For example, ‘androgen and estrogen synthesis and metabolism’ had the highest expression level among blocked pathways, with 98% of its reactions blocked; in the new model, only 56% of the reactions in this pathway remain blocked. Similarly, ‘glyoxylate and dicarboxylate metabolism,’ ‘methionine and cysteine metabolism,’ and ‘galactose metabolism’ are among the ten most highly expressed pathways, yet about 30% of their reactions are blocked in iCHO1766. In iCHO2101, the percentages of blocked reactions in these three pathways have been reduced to 11%, 15%, and 7%, respectively. A full list of the pathways and their expression levels is available in Supplementary Table 9.
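The pathway-level comparison described above can be sketched by averaging reaction expression within each pathway and pairing it with the percentage of blocked reactions. Using the mean as the pathway-level summary is an assumption, and the data are hypothetical.

```python
from statistics import mean


def pathway_summary(pathway_of, rxn_expr, blocked):
    """Return {pathway: (mean reaction expression or None, % blocked reactions)}."""
    members = {}
    for rxn, pw in pathway_of.items():
        members.setdefault(pw, []).append(rxn)
    summary = {}
    for pw, rxns in members.items():
        # Reactions without a computable expression (None) are left out of the
        # mean but still count toward the blocked percentage.
        levels = [rxn_expr[r] for r in rxns if rxn_expr.get(r) is not None]
        pct_blocked = 100.0 * sum(r in blocked for r in rxns) / len(rxns)
        summary[pw] = (mean(levels) if levels else None, pct_blocked)
    return summary


def top_expressed(summary, n=10):
    """The n pathways with the highest mean expression, with their % blocked."""
    ranked = [(pw, vals) for pw, vals in summary.items() if vals[0] is not None]
    ranked.sort(key=lambda item: item[1][0], reverse=True)
    return ranked[:n]


# Hypothetical toy data
PATHWAY_OF = {"r1": "P1", "r2": "P1", "r3": "P2", "r4": "P2"}
RXN_EXPR = {"r1": 50.0, "r2": 30.0, "r3": 5.0, "r4": None}
summary = pathway_summary(PATHWAY_OF, RXN_EXPR, blocked={"r2"})
print(top_expressed(summary, n=1))  # [('P1', (40.0, 50.0))]
```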

Discussion

In the present study, four approaches were used to fill the gaps of iCHO1766. First, GapFill filled 12% (124 of 1049) of the non-producible metabolites. Then, GAUGE resolved 40% (28 of 71) of the inconsistencies between gene co-expression and the flux coupling relations of reaction pairs. Next, the exchange and transport reactions of the model were revised using the HMDB database. Finally, new genes were added to the model based on the KEGG and BiGG databases. All newly predicted reactions and metabolites were then added to the model to generate a new version of the CHO metabolic model, named iCHO2101. Overall, the percentage of blocked reactions was reduced from 21.6% (1441 of 6663) in iCHO1766 to 11.3% (837 of 7436) in iCHO2101, and the percentage of dead-end metabolites was reduced from 21.4% (955 of 4455) to 6.6% (298 of 4527). These new reactions, metabolites, and genes broaden the range of pathways that can be simulated in CHO cells and should improve the overall reliability of model predictions for CHO cells.

CHO cells are clearly important to the pharmaceutical industry for producing recombinant protein drugs. Given the notable drawbacks of current kinetic models (Carinhas et al. 2012), a constraint-based metabolic model offers a useful in silico platform for mechanistically modeling CHO cell metabolism. For example, the limiting factors of cell culture can be readily modeled by constraining the exchange fluxes of the model. In addition, integrating “omics” data with a constraint-based metabolic model can shed light on the metabolism of CHO cells.

Bioprocess optimization of CHO cells has been a major research topic, including studies focused on designing cell culture media compositions (Galbraith et al. 2018; Ritacco et al. 2018). Amino acids are major components of mammalian cell culture media, and amino acid metabolism strongly influences the viability and productivity of CHO cells (Salazar et al. 2016). The average percentage of blocked reactions in the metabolic pathways of the different amino acids was reduced from 34.10% in iCHO1766 to 13.56% in iCHO2101, which suggests that refining the metabolic model can increase its applicability in bioprocess studies. Recently, an extended version of the CHO GEM was released in which new constraints based on the enzyme capacity of the reactions were added (Yeo et al. 2020). In contrast, the focus of our study was to fill the gaps in, and manually curate, the previous model (Hefzi et al. 2016). In conclusion, a CHO GEM with more active metabolic pathways and more precise gene-protein-reaction associations allows more accurate cell line-specific models to be inferred. Such models can capture the metabolic signatures of different cell lines and better predict their biopharmaceutical production capabilities (Carinhas et al. 2013).