Keywords

AAV :

Adeno-associated viral

ACE :

Artificial chromosome expression

AMPK :

AMP-activated protein kinase

CFL1 :

Cofilin

CHO :

Chinese hamster ovary

CLD :

Cell line development

CQA :

Critical quality attribute

CRISPR/Cas9 :

Clustered regularly interspaced short palindromic repeats/RNA guided Cas9 nuclease

CRISPRa :

CRISPR-based gene activation

dCas9 :

Cas9 endonuclease dead/Dead Cas9

DIGE :

Difference gel electrophoresis

DMSO :

Dimethyl sulphoxide

EPO :

Erythropoietin

ER :

Endoplasmic reticulum

Erp27 :

Endoplasmic reticulum protein 27 kDa

Erp57 :

Endoplasmic reticulum protein 57 kDa

FBA :

Flux balance analysis

HAC :

Human Artificial Chromosome

HCP :

Host cell protein

HEK293 :

Human embryonic kidney-293

Hs3st1 :

Heparan sulphate 3-O-sulfotransferase 1

HSP60 :

Heat shock protein 60 kDa

HSC70 :

Heat shock protein 70 kDa

HT :

High titre

IFN-γ :

Interferon gamma

IgG :

Immunoglobulin G

iPSC :

Induced pluripotent stem cells

LC-MS/MS :

Liquid chromatography with tandem mass spectrometry

lncRNA :

Long non-coding RNA

LT :

Low titre

MAARGE :

Multiplexable activation of artificially repressed genes

mAbs :

Monoclonal antibodies

MDH :

Malate dehydrogenase

MFA :

Metabolic flux analysis

miRNA/miR :

microRNA

MPC1/2 :

Mitochondrial pyruvate carrier 1 and 2

MS :

Mass spectrometry

mTOR :

Mechanistic target of rapamycin

NDST2 :

N-deacetylase/N-sulfotransferase

NeoR :

Aminoglycoside phosphotransferase, neomycin resistance protein

NSD :

Nucleotide sugar donor

PAGE :

Polyacrylamide gel electrophoresis

PC :

Pyruvate carboxylase

PCA :

Principal component analysis

RI :

Random integration

RNA-seq :

RNA sequencing

SEAP :

Secreted alkaline phosphatase

siRNA :

Small-interference RNA

SSI :

Site-specific integration

ST6GAL :

α-2,6-sialyltransferase

TALEN :

Transcription activator-like effector nuclease

TCA :

Tricarboxylic acid cycle

TFRE :

Transcription factor regulatory elements

UTR :

Untranslated region

VCP :

Valosin-containing protein

ZFN :

Zinc-finger nuclease

1 Introduction

Mammalian cells have successfully served as industrial platforms for manufacturing different types of biopharmaceuticals that are critical therapies for the treatment of complex chronic diseases (e.g., cancer, autoimmune disorders). These biopharmaceuticals, including recombinant proteins (e.g., monoclonal antibodies) and viral particles (e.g., adeno-associated virus vectors, AAV), are largely produced using mammalian cell factories, with Chinese hamster ovary (CHO) and human embryonic kidney-293 (HEK293) cells as the predominant platforms. While the sector has substantially improved cell densities, product yields and quality through trial-and-error (“brute force”) approaches, the capacity of these cell lines to turn recombinant genes into life-changing drugs is limited by intracellular processes (transcription, translation, processing, secretion). These intrinsic constraints are exacerbated by the development of a new generation of biopharmaceuticals, with novel designs and increased complexity for industrial manufacturing (e.g., multi-specific mAbs, complex fusion proteins, functional AAV vectors) (Fig. 1). The emergence of coronavirus disease 2019 (COVID-19) has highlighted the urgent need for production platforms that enable the rapid manufacturing of biotherapeutics on demand.

Fig. 1

Impact of the evolution of biopharmaceuticals on the manufacturability. This figure illustrates the exponential growth in the complexity (X-axis) of biopharmaceuticals, where complex molecular design, novel action mechanisms and manufacturing difficulties result in significantly increased product price (Y-axis)

Even though we (as a community of scientists and industrialists) are aware of the cellular limitations on productivity and product quality, we are less aware of the specific molecular events occurring within these cell factories that drive ‘good’ or ‘bad’ outcomes. Today, high-throughput omics technologies provide vast amounts of information about cellular events, which has increased our fundamental understanding of these biological systems (illuminating the ‘black box’) and will change the paradigm by which cell factories can be engineered (Fig. 2). This Chapter focuses on the current state-of-the-art technologies that may be applied for designing engineering strategies for mammalian cell lines. These include the key lessons learnt from omics analysis on factors that correlate with productivity and product quality, and how the combination of molecular and computational tools with omics data can rationalise intervention with mammalian cell factories and bioprocesses for enhanced production of biopharmaceuticals.

Fig. 2

Overview of the potential of omics technologies to increase the fundamental understanding of biological systems and to optimise cell factories and bioprocesses. This figure shows how omics technologies offer the potential to elucidate the intricate network of processes occurring within biological systems and to provide the rationale for cellular and process interventions that improve product yields and quality

2 Molecular Approaches for Engineering Mammalian Cell Factories

2.1 Overexpression of Target Regulatory Genes

With the identification of gene targets for improving biotherapeutic production (Sect. 3), the overexpression of these genes is one of the most exploited approaches to improve mammalian cell lines. Similar to the method used to express recombinant proteins, the overexpression of gene targets is achieved through the delivery of an expression vector containing a codon-optimised version of a gene’s cDNA under the control of a potent viral/cellular promoter and the presence of enhancer sequences. Incorporation of a selection marker (i.e., a gene coding for antibiotic resistance or for a metabolic enzyme that the host lacks) under the control of a weak promoter allows the genomic integration of multiple copies of plasmid DNA and the generation of cell lines (pools) with heterogeneous expression of the recombinant gene (extensively reviewed by Gupta et al. [1]). The use of synthetic promoters to mediate gene expression in the host cell has also been presented as an alternative engineering strategy [2]. While these approaches have been extensively used to overexpress several genes with beneficial consequences for the performance of mammalian cells, a single cell cloning process is needed to obtain cell lines with a desired phenotype and a homogeneous expression of the specific target. To overcome these limitations and to target insertion for homogeneous expression, the use of semi-targeted transposase-based integration systems, which allow transposition at specific sites of the genome, or of other expression systems has been proposed [3, 4]. Several transposon systems have been designed for use in mammalian cell lines, including piggyBac, Tol2 and Sleeping Beauty (Table 1).

Table 1 Technologies for engineering mammalian cell lines

An alternative to delivery of potential regulatory genes in standard DNA vectors is presented by the use of artificial chromosome expression (ACE), a technology that provides the potential to deliver a large genetic payload that is stably maintained in the host cell without the need for genomic integration [10]. A series of studies have successfully used ACE systems for expressing high levels of recombinant genes [11, 30, 31]. The capacity of ACE systems to incorporate multiple gene sequences (i.e., whole metabolic or signalling pathways) has opened the possibility of tailor-made cell factories, with desirable characteristics and phenotypes that allow for enhanced cellular performance [32]. While the generation of custom-built microbial factories has proven effective [33,34,35], this technology still needs to be evaluated in mammalian cells.

The emergence of CRISPR/Cas9 systems has provided a powerful and flexible tool for interventions in the cellular genome. While often used as a genome-editing tool (Sect. 2.2), this system can be adapted for the genomic incorporation and expression of specific target genes. The generation of nuclease-null Cas9 (dCas9) combined with transcriptional activators (also known as CRISPR-based gene activation or CRISPRa) has been used to increase the endogenous expression of specific genes in mammalian cell hosts [36, 37]. Applications of CRISPRa in mammalian cell factories have focused on the upregulation of silenced glycosyltransferases [5] and of unfolded protein response (UPR) markers and anti-apoptotic genes [38]. Recently, Eisenhut et al. (2018) introduced a promising technology known as multiplexable activation of artificially repressed genes (or MAARGE), a sophisticated CRISPR/Cas9-based targeted integration system that enables the incorporation of multiple genes into the genome without a laborious single cell cloning and screening process. However, additional efforts need to be undertaken to increase the expression of transgenes in order to make this novel mammalian expression system more robust.

An alternative method for expressing transgenes in mammalian cell hosts is via the direct transfection of in vitro transcribed mRNA, a technology that allows gene target delivery without the undesired effects of plasmid integration into the target genome. Despite the lower structural stability of mRNA compared to plasmid DNA [39], this delivery system brings forward several advantages, such as considerably higher molar amounts of mRNA per transfection [40], no load on the transcriptional machinery (thereby bypassing genetic and epigenetic controls) and no requirement for nuclear translocation [41]. However, it may impose a significant load on the translational and post-translational machinery within cells. While this technology has been extensively used for the generation of induced pluripotent stem cells (iPSCs) [42], transient modification of cell phenotypes [43] and the development of mRNA vaccines [44], it has only recently been used in mammalian cell hosts for biopharmaceutical production [45,46,47,48]. Its large-scale implementation to enhance biopharmaceutical production by mammalian cells needs exploration [49].

2.2 Knock-Out of Gene Targets

In contrast to overexpressing genes, gene knock-out offers the possibility to delete disadvantageous targets from the host genome. The technologies for deleting specific genes have evolved from chemically or physically induced random mutagenesis to precise genome editing systems. The migration to targeted genome engineering has been promoted by the development of highly specific technologies, such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the CRISPR/Cas9 system [16,17,18]. These technologies have been used in mammalian cells to delete metabolic enzymes [50,51,52,53], glycosylation enzymes [54,55,56,57] and signalling molecules [58,59,60]. Amongst these techniques, the CRISPR/Cas9 system has dominated investigations of the re-design of mammalian cell lines due to its rapid, cost-effective and easy-to-apply methodology, thus increasing the possibilities for alternative engineering strategies (Sect. 2.1).
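
As a rough illustration of how target selection for a CRISPR/Cas9 knock-out might begin, the sketch below scans a DNA sequence for 20-nt protospacers that are immediately followed by an NGG motif. The NGG protospacer-adjacent motif (the one recognised by the widely used Streptococcus pyogenes Cas9), the 20-nt length, the function name find_cas9_sites and the example sequence are all assumptions introduced here for illustration; they are not taken from the studies cited above. A practical design workflow would also scan the reverse strand and score off-target risk.

```python
import re


def find_cas9_sites(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for forward-strand sites followed by NGG.

    Assumption: an SpCas9-style NGG PAM directly 3' of the protospacer.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical exon fragment, for illustration only.
example = "ACGTACGTACGTACGTACGTAGGTT"
for start, protospacer, pam in find_cas9_sites(example):
    print(start, protospacer, pam)
```

Only the forward strand is scanned to keep the sketch short; applying the same function to the reverse complement of the sequence would cover the other strand.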

2.3 Knock-Down of Gene Targets

An alternative to gene knock-outs is gene silencing (or knock-down), an approach that decreases the amount of specific mRNA species without affecting genome integrity. The most common technology used for gene silencing involves small-interference RNAs (siRNAs), small double-stranded RNA molecules (~20–24 bp in length) that carry a sequence complementary to the target mRNA. The interaction between the small non-coding RNA and the specific mRNA target leads to degradation of the mRNA [19, 61]. Specific siRNAs have been used successfully to decrease the expression of metabolic enzymes or signalling proteins, resulting in enhanced culture performance of mammalian cells [20, 62,63,64]. However, due to their high specificity, the application of siRNAs is limited to single targets [65].

Another technology based on RNA involves the use of microRNAs (miRNAs), small RNA molecules (~19–25 nt in length) that bind to the 3′ untranslated region (3′UTR) of target gene transcripts by imperfect base-pairing interactions (only a section of the miRNA molecule binds to the target) and can thereby inhibit translation of the target mRNA [65]. The lack of specificity in miRNAs enables their interaction with multiple gene targets (via common 3′UTRs). This opens up the modulation of entire metabolic/signalling pathways, bringing forward the hypothesis that miRNAs act as cellular tools to maintain homeostasis of, and integrate, multiple processes within cells [61]. Introduction of miRNAs into mammalian cells has led to improvements in cell growth, apoptosis and recombinant protein production [66,67,68]. However, given that the resulting interactions remain difficult to predict, the use of this technology may lead to undesired phenotypes beyond the original intention. Along the same line, several studies have proposed the application of long non-coding RNAs (lncRNAs, >200 nt) as regulatory tools to modulate multiple cellular events in mammalian cells. While lncRNAs have been correlated with growth and productivity in mammalian cells [22], there is a need for a broader understanding of the mechanisms that underpin the potential for lncRNAs to be used as cell engineering tools.
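
The partial base-pairing described above, and the reason one miRNA can reach several transcripts, can be sketched in a few lines. The example below assumes the common convention that nucleotides 2–8 from the 5′ end of the miRNA (the "seed") pair with the 3′UTR; that convention, the function names and all sequences are illustrative assumptions, not details given in the text. It looks only for perfect seed complementarity and ignores the other determinants of miRNA targeting.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based positions in `utr` that are complementary to the miRNA seed.

    Assumption: the seed is miRNA nucleotides 2-8 (slice [1:8] in 0-based terms).
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    target = reverse_complement(mirna[seed_start:seed_end])
    width = len(target)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == target]


# Hypothetical miRNA and 3'UTR fragments, for illustration only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utrs = {
    "transcript_1": "AAACUACCUCAAA",
    "transcript_2": "GGGCUACCUCGGGCUACCUC",
    "transcript_3": "AAAAAAAAAAAAA",
}
hits = {name: seed_sites(mirna, utr) for name, utr in utrs.items()}
print(hits)
```

Because only seven nucleotides need to match, unrelated transcripts can share a site, which is the behaviour that makes miRNA effects broad and hard to predict; an siRNA, by contrast, would be matched over its full length.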

2.4 Directed Evolution of Cellular Phenotype

An alternative method to develop mammalian cell hosts is through ‘directed evolution’ methodologies. The aim of directed evolution is to reprogram cellular characteristics by altering genetic circuits and metabolic/signalling pathways underpinning a complex functional phenotype in a rapid and cost-effective manner. Applications of this methodology in mammalian cells include adaptation to suspension growth in serum-free media [26, 69, 70], optimised nutrient feeding [71, 72], rapid proliferation [73], decreased production of metabolic waste products (e.g., lactate and ammonia) [27, 74, 75], tolerance to hyperosmolarity [28] and cold adaptation [76, 77].

3 Lessons Learnt from ‘Omics’ Characterisation of CHO Cell Bioprocessing

3.1 Lesson I: Phenotype Instability Is an Inherent Feature of Mammalian Cells

Mammalian cells present an inherent genomic plasticity that may lead to unfavourable phenotypes and loss of recombinant gene expression, with negative effects on cellular productivities (known as production instability) [78,79,80,81]. As a tool to evaluate these phenotypic changes, extended cell line stability studies (2–3 months) are performed during recombinant CHO cell line development (CLD) processes to ensure that the selected clone maintains overall product yields and quality throughout the manufacturing process [82]. This long and laborious procedure has promoted the identification of markers and molecular events (from genome and epigenome dynamics) correlated with production instability in mammalian cell factories. Initial genome drafts revealed that CHO cells undergo continuous changes in genome structure (e.g., chromosomal rearrangement, other karyotype variations) and sequence alterations (i.e., copy number variation) during routine cultivations [83, 84]. In particular, inherent genomic plasticity and multiple cell lineages with unique genomic landscapes have prompted researchers to continuously update the CHO-K1 and Chinese hamster genomes [85, 86], and to sequence multiple industrially-relevant CHO cell variants, such as CHO-DG44, CHO-S and CHO-DXB11 [87,88,89]. Researchers have also taken cell line- and organelle-specific approaches to develop insights into the genetic diversity amongst CHO cell lines and their response to, and interaction with, the culture environment [90,91,92,93,94,95]. These data indicate that there is no single cause of production instability; rather, it arises as a consequence of a series of DNA sequence and genomic structure mutations driven by continuous selection pressure (i.e., enriched medium for rapid cell growth, antibiotic/metabolic selection, different cultivation environment/scales).
Omics studies have also evaluated epigenetic modifications (e.g., post-translational histone modifications, or DNA methylation) that have implications for DNA arrangement and gene accessibility. CHO cell lines are prone to large epigenetic changes during continuous cultivation (both in routine maintenance culture and in production batches) and exposure to different environmental conditions, affecting the stability of recombinant gene expression, metabolism and cellular phenotype [73, 93, 96,97,98,99]. Specific changes in the DNA methylation profile can switch gene expression on or off, and the extent (and positioning) of methylation correlates with transcriptional activity [100], thus providing an explanation for the diverse production phenotypes of CHO cells. There is considerable interest in the application of genomic and epigenomic information to identify highly stable expression loci for the generation of new CHO cell hosts.

3.2 Lesson II: Post-translational Events Within Secretory Pathways Limit Cellular Productivity

Secretion of functional proteins with complete processing requires the coordinated action of multiple chaperones/enzymes within the secretory pathways, a long road with several compartments and regulatory checkpoints that have been proposed as limiting factors for the production of recombinant proteins [101, 102]. In this context, transcriptomic and proteomic studies have been powerful tools for developing a fundamental understanding of the biology underpinning cellular productivity. A large number of studies have used different gene (e.g., microarrays, RNA-seq) and protein (e.g., 2D-DIGE, LC-MS/MS) profiling technologies to gain insights into mammalian cell factories during production of biopharmaceuticals. Comparison between high and low producer cell lines (often at a single time point during the exponential phase) has been used frequently to identify molecular components correlated with high productivities [103,104,105,106,107]. Other studies have focused on the effects of productivity enhancers, either molecules (e.g., sodium butyrate, DMSO) or environmental conditions (e.g., media/feeds, low temperature, hyperosmolarity), on the transcriptome/proteome profile [76, 108,109,110,111]. These studies have identified protein folding/secretion and cytoskeletal architecture as key biological functions correlated with highly productive phenotypes across a diverse range of mammalian cell lines. Desirable qualities (in terms of cell growth, productivity and product quality) have been shown (via transcriptome and proteome profiling) to be correlated with the status of the secretory pathway, influencing the assembly, folding and processing of recombinant proteins.
Therefore, many genes and proteins associated with post-translational events (from translocation to secretion) have provided the focus for potential targets for cell line engineering, with the caveat that these potential strategies remain cell- and protein-specific, making them difficult to extrapolate to other cell factories.
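
The high-versus-low producer comparisons described above usually start from a per-gene fold change. The sketch below is a minimal illustration under stated assumptions: the gene names, expression values, pseudocount and the twofold cut-off are hypothetical, and a real analysis would add normalisation, replicate-aware statistics and multiple-testing correction before naming any engineering target.

```python
import math
from statistics import mean


def log2_fold_changes(high, low, pseudocount=1.0):
    """Per-gene log2(mean high-producer / mean low-producer expression).

    `high` and `low` map gene name -> list of expression values (one per clone).
    The pseudocount avoids division by zero for unexpressed genes.
    """
    changes = {}
    for gene in high.keys() & low.keys():
        h = mean(high[gene]) + pseudocount
        l = mean(low[gene]) + pseudocount
        changes[gene] = math.log2(h / l)
    return changes


def candidate_genes(changes, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold, largest first."""
    selected = [g for g, value in changes.items() if abs(value) >= threshold]
    return sorted(selected, key=lambda g: -abs(changes[g]))


# Hypothetical expression values for three clones per group.
high_producers = {"gene_a": [7.0, 7.0, 7.0], "gene_b": [3.0, 3.0, 3.0]}
low_producers = {"gene_a": [1.0, 1.0, 1.0], "gene_b": [3.0, 3.0, 3.0]}

changes = log2_fold_changes(high_producers, low_producers)
print(candidate_genes(changes))
```

A ranking of this kind only flags correlation with productivity; as the text notes, candidates tend to be cell- and protein-specific and need functional validation.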

Whilst transcriptome and proteome data have been used to provide mechanistic understanding of the productivity of CHO cells, these profiling approaches have limitations for holistic explanation of the systems-level molecular events that set desirable cellular performance. This is particularly relevant for the study of the secretory pathways, orchestrated by specific transcriptional regulators and multiple enzymes and chaperones, whose expression varies dynamically according to the cell line and the surroundings [112, 113]. This limitation has been addressed with the development of computational tools and genome-scale models that are designed to allow the integration of multiple data sets of different nature (e.g., transcriptomic, proteomic, metabolomic) [114, 115]. The use of multi-omics approaches aims to provide a more robust molecular description that takes into consideration the dynamic interplay between different levels of cellular integration. Recently, a genome-scale reconstruction of the secretory pathway highlighted the relevance of post-translational events to cellular productivity, delineating the metabolic costs and cellular machinery burden of each secreted protein in CHO, mouse and human cell lines [116]. The authors showed that highly secretory cells undergo a global adaptation that results in the decreased expression and secretion of energy-expensive host cell proteins, and provided a platform for simulating cellular interventions (knock-out/down) with the aim of enhancing the performance of mammalian cells. In the context of rational design of mammalian cell factories, the combination of high-throughput omics data with computational tools has the potential to revolutionise cell line development and opens possibilities for defining multiplex cell engineering strategies that target the secretory pathway (and essential ancillary reactions and cellular components), producing a systems-level shift towards a desired cellular phenotype.

3.3 Lesson III: Efficient Metabolism Makes Mammalian Cell Factories More Effective

Cellular metabolism encompasses the summation of all the biochemical reactions occurring within cells that support biological processes, by supplying biosynthetic building blocks (for biomass or protein production) and energy currency (in the form of ATP) as well as acting as regulatory elements in signalling pathways (e.g., mTOR, AMPK) [117]. While there is a general consensus about the relevance of cellular metabolism to the performance of mammalian cell factories, we are still challenged by the metabolic definitions/signatures of a ‘good’ or ‘optimal’ bioprocess. Mass spectrometry (MS)-based metabolomics has become the tool of choice to analyse changes in the concentration of (specific) metabolites both within cells and in the surrounding medium, thus providing a substantial amount of information regarding the interaction of mammalian cells with their environment (culture medium) and the characterisation of cell metabolism [118, 119]. Extracellular metabolite profiling monitors the main components of the culture medium (i.e., main carbon and energy sources, metabolic by-products, vitamins) that are critical for maintenance of growth and productivity [79, 118, 120,121,122,123,124,125,126]. The combination of these data with multivariate statistical analysis (e.g., principal component analysis [PCA] or partial least squares [PLS] variants) and/or stoichiometric metabolic modelling (e.g., metabolic flux analysis [MFA] or flux balance analysis [FBA]) has led to the identification of metabolic signatures reflective of bioprocess status [127,128,129,130,131,132,133,134]. Particular interest has focused on the analysis of (specific) by-products of glucose metabolism (e.g., lactate, glycerol) and/or amino acid metabolism (e.g., ammonia, phenyl lactic acid, 2-hydroxybutyric acid, indole-3-carboxylate), which reveal catabolic imbalances that can impair the performance of mammalian cell cultures (extensively reviewed by Pereira et al. [135]).
These valuable insights have promoted diverse process and cell engineering strategies focused on optimising cell metabolism using customised media/feeds or targeting the expression of metabolic enzymes.
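
To make the multivariate step concrete, the sketch below performs a minimal principal component analysis on a small matrix of extracellular metabolite concentrations, using only the Python standard library and power iteration to find the first component. The sample values, the choice of glucose and lactate as columns and the function names are hypothetical and serve only as an illustration; published workflows typically scale the variables, extract several components and rely on established numerical libraries.

```python
import math


def center(rows):
    """Subtract the column mean from every metabolite column."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (p x p) of already-centred data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Unit-length leading eigenvector of `cov`, found by power iteration."""
    p = len(cov)
    v = [float(j + 1) for j in range(p)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Score of each sample along direction `v`."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


# Hypothetical culture samples: columns are glucose and lactate (mM).
samples = [[20.0, 2.0], [15.0, 8.0], [10.0, 14.0], [5.0, 20.0]]
centered = center(samples)
pc1 = first_component(covariance(centered))
pc1_scores = project(centered, pc1)
print(pc1, pc1_scores)
```

In this toy data set glucose falls as lactate rises, so the first component captures that single trade-off and the scores order the samples along it; the sign of the component is arbitrary.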

Intracellular metabolite profiling offers insights into the physiological state of cells in culture via profiling metabolites indicative of energy metabolism, redox state, nucleotide synthesis and regulatory aspects of metabolic pathways [27, 70, 124, 136,137,138]. Additionally, the specific focus on lipids (lipidomics) has provided insights into regulatory control of cell growth, robustness and morphological status [139]. Some ‘good’ metabolic features (in terms of bioprocessing effectiveness) are emerging. For example, oxidative metabolic signatures (i.e., increased TCA cycle flux, favourable NADPH/NADP and GSH/GSSG ratios) underpin a high energy supply that leads to enhanced cell-specific productivity and prolonged culture lifespan. In contrast, a glycolytic metabolic signature sustains rapid growth of mammalian cells, but with an associated “cost” of high production of metabolic by-products (e.g., lactate) that may be detrimental to cell viability and product quality [124, 127, 128]. Extension of this type of knowledge will be crucial for identifying ‘good’ performance of clones during cell line development and scale-up processes. However, there remains a gulf between defining metabolic signatures and using them for clone selection or for setting environmental conditions that support the desired equilibrium between high cell growth and productivity. We require further improvements in metabolism monitoring technologies to increase our capacity to precisely assess and control metabolic processes and to design new cell lines with optimal metabolic signatures for biopharmaceutical production.

3.4 Lesson IV: Product Quality Depends on an Intricate Network of Enzymes

Glycosylation (particularly N- and O-linked) is a hallmark of all secreted proteins and is a critical quality attribute (CQA) for biopharmaceuticals, affecting their function and immunogenicity as therapeutics. The emergence of MS-based technologies has allowed the precise characterisation of glycan composition, indicating that CHO cells generate heterogeneous profiles of N- and O-linked glycosylation, with the precise profile being dependent on the CHO cell variant [140], cell clone [141], culture medium/feed (e.g., glucose/glutamine levels, galactose/mannose supplementation) [142,143,144] and culture environment (e.g., pH, temperature, ammonia concentration) [143, 145,146,147,148,149]. Although glycan characterisation provides valuable information about the structure and quality of an expressed protein, the resultant heterogeneous glycosylation profile does not in itself provide information about the cellular pathways or culture conditions that lead to specific glycosylation profiles. Attachment and maturation of glycans are part of a complex biosynthesis/processing pathway involving a series of organelle-specific reactions that start within the endoplasmic reticulum (ER) and are completed in the Golgi apparatus, via the action of multiple sequential and/or parallel processing enzyme pathways [150]. This complexity of potential reactions imposes a significant challenge for prediction of the glycosylation profile or the extent of heterogeneity in the final product during bioprocessing. Therefore, global approaches will be necessary for a deeper understanding of the glycosylation machinery, an understanding that will need to integrate different layers of information (i.e., transcriptomics, proteomics, metabolomics, labelled microscopy).
Multi-omic approaches have been used to analyse the expression, activity and specificity of glycosyl transferases and hydrolases [151,152,153], to develop an understanding of the co-localisation of enzymes and substrates (precursors) within compartments of the ER and the Golgi [154, 155] and to combine metabolic data into computational algorithms [146, 156]. In particular, the use of genome-scale and/or kinetic models in combination with process and cellular information has greatly contributed to a better understanding of the glycosylation machinery and has established correlations and predictive systems that enable the association of inputs and outputs [116, 157,158,159,160]. With these resources, the intricacy of the glycoprotein processing network is being unravelled and, in turn, this is providing direction towards cellular/process engineering strategies to generate specific glycan structural profiles on the surface of recombinant proteins (an outcome that is particularly relevant for the manufacture of biosimilar and biobetter therapeutics) [161].

4 The Application of Data-Driven Approaches to Mammalian Cell Engineering

Omics technologies have been used to understand the biology of mammalian cells and to gain a better understanding of host cells and bioprocesses used to manufacture biopharmaceuticals. Earlier Sections have highlighted the types of engineering technologies and omics studies; this Section focuses on examples where the lessons learnt from omics studies have been translated into improved host cell systems and/or increased biopharmaceutical production through engineering approaches (summarised in Table 2).

Table 2 Summary of studies that have applied omics data in the engineering of mammalian cells

4.1 Approach I: Increasing Stability of the Recombinant Gene Expression

The availability of detailed genome sequences and transcriptomic/epigenomic profiling has enabled production instability of industrial mammalian cell lines to be addressed by the use of site-specific integration (SSI), which targets recombinant gene incorporation to specific genomic loci with high expression, stability and desired epigenetic properties (also called hotspots or safe-harbours) [170]. Initial attempts to use SSI relied on specific loci identified from genome and gene expression data through screening of mammalian cell lines generated by random integration (RI) protocols. These approaches have generated cell lines with high product yields using a single-copy SSI, with results comparable to those of traditional RI protocols [171, 172]. However, many potential genomic hotspots still need to be experimentally validated in industrial settings [173]. Recent studies have searched for novel safe-harbours through systematic evaluation of the epigenetic signatures of mammalian cells, an approach that provides a clearer overview of gene transcription control within the context of the “living nucleus” and a potential map for identifying integration sites with maximum transgene production [93, 100]. Hilliard and Lee (2020) combined epigenomics and transcriptomics to analyse changes in the epigenome that occur during CHO cell line development and which can be related to different gene expression profiles in both host and recombinant CHO cell lines. The authors found that only 10% of the CHO genome contained transcriptionally permissive 3D chromatin structures with the enhanced genetic and epigenetic stabilities required for a desirable SSI [174]. These results provide a critical step towards further cellular interventions that increase the potential of SSI systems for the generation of cell lines with high stability and expression of transgenes.

A further application of omics technologies to increase recombinant gene expression utilises transcriptional information to design novel promoters. Johari et al. (2019) identified genes within the CHO genome that displayed high transcriptional activity under different bioprocess environmental conditions. From these data, transcription factor regulatory elements (TFREs) were identified in the upstream regions of differentially expressed genes, and a specific subset of TFREs was functionally screened and shown to support enhanced recombinant gene transcription in response to a switch to mild hypothermic growth conditions. Using such elements, the study generated novel synthetic promoters that were able to drive increased expression of recombinant genes in CHO cells, with an overall increase in cell productivity (up to 2.5-fold) [2]. This study exemplifies how omics enables re-designed or tailored expression systems to be used in developing improved manufacturing systems and processes.

4.2 Approach II: Enhancing Productivity by Targeting Secretory Pathways

Secretory pathways that convey recombinant proteins towards the extracellular environment consist of multiple intracellular compartments and are integrated by the action of several chaperones/enzymes that may limit post-translational events and overall production (Sect. 3.2). Transcriptomics and proteomics have provided great understanding of these events and suggest potential targets for enhancement of the overall process. For instance, Baik et al. (2011) investigated the intracellular proteome of recombinant CHO cells expressing erythropoietin (EPO) in serum-supplemented and serum-free media. Proteomic profiling via 2D-PAGE and mass spectrometry analysis identified two chaperones, heat shock protein 70 kDa (HSC70) and 60 kDa (HSP60), as more highly expressed under serum-free conditions than in serum-containing medium, thereby highlighting them as potential cell engineering targets. Subsequent overexpression of HSC70 and HSP60, separately or together, led to an increased cell density (between 10% and 15%) and a decreased time for CHO cell adaptation to serum-free conditions [69]. Another study, using proteomic analysis of recombinant CHO cells, identified the actin cytoskeleton regulator cofilin (CFL1) as a limiting factor for cell-specific productivity of recombinant SEAP [62]. In a later study, the authors used siRNA to knock down CFL1 in CHO cells, resulting in an increase (80%) in recombinant protein specific productivity [63].

Comparative transcriptome analysis of recombinant CHO cells showing differential specific productivities has also suggested potential targets for cell engineering. For instance, transcriptomic analysis of CHO-K1 derived cell lines identified 32 potential target genes that were up-regulated in high-producing clones; these candidates were involved in a variety of cell functions including signalling, protein folding, cytoskeleton organization, and cell survival [168]. Directed overexpression of two of the potential target genes in the ER (Erp27 and Erp57, which are chaperones that bind to unfolded proteins or are involved in disulphide bond formation, respectively) increased the cell density and culture viability. In addition, the production of a ‘difficult-to-express’ recombinant protein (interferon-β) was increased significantly, interpreted as a result of enhanced folding activity during processing and secretion [168]. An alternative strategy from the same study was overexpression of Foxa1, which was able to induce multiple metabolic changes to improve protein yields, decrease oxidative stress and improve cell growth.
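The comparison of high- and low-producing clones described above can be illustrated with a minimal sketch. It ranks genes by log2 fold change between two groups of normalised expression values. The function names, the pseudocount, the threshold and all expression values are hypothetical and are not taken from the cited study; a real analysis would also apply a statistical test with multiple-testing correction (for example with a dedicated package such as DESeq2).

```python
import math
from statistics import mean


def log2_fold_changes(high, low, pseudocount=1.0):
    """Return {gene: log2(mean_high / mean_low)} for genes found in both groups.

    A pseudocount is added to each mean to avoid division by zero.
    """
    result = {}
    for gene in high.keys() & low.keys():
        h = mean(high[gene]) + pseudocount
        l = mean(low[gene]) + pseudocount
        result[gene] = math.log2(h / l)
    return result


def upregulated(high, low, min_log2fc=1.0):
    """List genes whose log2 fold change meets the threshold, strongest first."""
    fc = log2_fold_changes(high, low)
    hits = [gene for gene, value in fc.items() if value >= min_log2fc]
    return sorted(hits, key=lambda gene: -fc[gene])


# Hypothetical normalised expression values (three replicate clones per group).
high_producers = {
    "Erp27": [120.0, 140.0, 130.0],
    "Erp57": [300.0, 320.0, 310.0],
    "Actb": [1000.0, 980.0, 1020.0],
}
low_producers = {
    "Erp27": [40.0, 50.0, 45.0],
    "Erp57": [140.0, 150.0, 145.0],
    "Actb": [990.0, 1010.0, 1000.0],
}

candidates = upregulated(high_producers, low_producers)
print(candidates)
```

With these made-up values the two ER chaperone genes pass the two-fold threshold, while the housekeeping gene does not. A fold-change filter of this kind only nominates candidates; functional validation by overexpression or knockdown, as in the studies above, is still required.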

At the translational level, recombinant gene transcripts will compete with endogenous cellular transcripts at the level of the ribosome and this represents a potential molecular site for control of both recombinant gene expression and normal cellular regulatory events. Kallehauge et al. (2017) described a genome-wide study of protein translation (translatome) using ribosome profiling and an associated transcriptomic study (RNA-seq) for an antibody-producing CHO cell line. Whilst other studies have focused on global changes in translation, analysis of the translation of recombinant targets remains largely unexplored. This study showed that the recombinant mRNA sequestered up to 15% of actively translating ribosomes. Combined with transcriptomic analysis, the authors showed that the amount of transcript of the recombinant target influenced the cell-specific productivity. Using the associated datasets, the study examined the effects of limiting the expression of the NeoR resistance marker, defining how much of a load an associated selection marker gene could place on recombinant gene expression. Knockdown of the NeoR gene via siRNA increased cell growth and antibody production (18% increase in antibody yield). This work has generated an important paradigm surrounding the balance of translation for yields of the desired protein and provides an exemplar study illustrating that shifting the transcriptional and translational capacity away from ‘unnecessary’ transcripts can increase the cellular ability to channel resources towards recombinant protein production [64]. Such studies combining genome-wide screening and multi-omics analysis provide a global view of the cell status and a powerful tool to identify how best to engineer the balance in cellular profiles to increase the capacity to produce biopharmaceuticals.

With the application of multi-omics analysis coupled with genome-scale modelling, researchers have shown the cellular burden that secreted proteins impose on CHO cells and, specifically, have identified host cell proteins with increased metabolic costs [116]. This knowledge led to the design of a multiplex cell engineering strategy that created cell systems that are better producers and contain fewer process-related impurities such as host cell proteins. Kol et al. (2020) proposed that eliminating host cell proteins would allow cellular resources to be channelled towards protein secretion, in particular recombinant protein production, whilst decreasing host cell protein contamination in downstream processes. Their study generated a series of 6, 11 and 14 protein knock-out clones via CRISPR/Cas9-mediated multiplex gene disruption, which resulted in a decrease in HCP content of between 40% and 70%. Consequently, an improvement in antibody titre, quality and purity was observed [53]. This concept of combining modelling and gene editing is potentially a powerful approach to make better CHO cell factories, with desirable consequences for the production and quality of biotherapeutics.

4.3 Approach III: Making Cell Metabolism More Efficient Through Metabolic Engineering

With the aim of improving the efficiency of cell metabolism to generate an enhanced manufacturing system, strategies have been suggested around channelling more metabolic intermediates into the mitochondria (for the TCA cycle) to enhance carbon utilisation and energy production. For instance, Chong et al. (2010) identified a potential bottleneck in the CHO cell TCA cycle from the accumulation of malate in culture medium, indicating a limitation in the conversion of malate to oxaloacetate (a reaction catalysed by malate dehydrogenase II [MDHII]). Overexpression of MDHII in CHO cells led to increased viable cell density, antibody production, higher amounts of intracellular ATP and NADH and decreased lactate production per cell [27]. Other approaches to increase TCA cycle activity have targeted the handling of pyruvate, a recognised metabolic bottleneck in CHO cells. Overexpression of pyruvate carboxylase (PC), which catalyses the conversion of pyruvate to oxaloacetate, significantly improved recombinant protein production and decreased lactate formation [175,176,177]. As an alternative, Bulté et al. [178] overexpressed mitochondrial pyruvate carriers (MPC1 and MPC2) in CHO cells, a strategy that resulted in an increased TCA cycle flux, decreased lactate production and increased recombinant protein production.

Metabolomics studies have identified several metabolic by-products that indicate loss of carbon from the inefficient catabolism of glucose and amino acids [135]. This knowledge has opened up the possibility of targeting specific enzymes to increase the efficiency of CHO cell metabolism. The best-described exemplar by-products are lactate and ammonia, metabolites that have toxic effects during bioprocesses [179, 180]. Different process and metabolic engineering strategies have been applied successfully, with resultant decreases in the production of these metabolites and improved culture performance of CHO cells [64, 74, 181,182,183,184,185,186,187,188,189]. Additionally, a series of intermediates or by-products of amino acid metabolism have been identified as inhibitors of CHO cell growth [190]. The identification of these growth-inhibiting metabolites has promoted the development of a metabolic engineering strategy that completely eliminated the production of these compounds and enhanced cell growth and productivities in fed-batch cultures [191].

Analysis of culture limitations has led to the rational design of medium supplementation strategies or the design of nutrient feeds to overcome bottlenecks and boost cell culture performance [70, 72, 164]. Such strategies have proven to be effective and more easily employed than genetic engineering approaches. Sellick et al. (2011) developed a nutrient feed based around four key amino acids (Table 2) which were observed to be depleted prior to the onset of the stationary phase. Use of the feed led to an increase in cell biomass and antibody titre [70]. Other studies have used multiple omics technologies to study cell culture processes, identify metabolic bottlenecks and design feeds/supplements as a result. Blondeel et al. (2016) utilised both metabolomic and proteomic analysis to assess nutrient depletion and waste product accumulation following stable expression of a monoclonal antibody in CHO cell cultures. Subsequently, a nutrient feed was tailored to their specific process with 8 metabolites that were observed to be depleted in culture, and use of this feed regime resulted in increased cell growth (∼75% increase in peak cell density) [72].
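The feed-design logic described above (flagging nutrients that run out before the stationary phase) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the cited studies: the function name, the 10% depletion threshold, the day of stationary-phase onset and all concentrations are hypothetical, and the metabolites shown are placeholders rather than the amino acids listed in Table 2.

```python
def depleted_before(timecourse, stationary_day, threshold=0.1):
    """Flag metabolites that fall to <= threshold * initial level before stationary phase.

    timecourse: {metabolite: [(day, concentration), ...]}
    Returns {metabolite: first day on which depletion was observed}.
    """
    flagged = {}
    for name, series in timecourse.items():
        points = sorted(series)          # order measurements by day
        initial = points[0][1]
        if initial <= 0:
            continue                     # cannot define relative depletion
        for day, conc in points:
            if day >= stationary_day:
                break                    # only consider the growth phase
            if conc <= threshold * initial:
                flagged[name] = day
                break
    return flagged


# Hypothetical extracellular concentrations (mM) measured every two days.
culture = {
    "asparagine": [(0, 4.0), (2, 2.5), (4, 0.3), (6, 0.0)],
    "glutamine": [(0, 6.0), (2, 4.0), (4, 2.0), (6, 0.5)],
    "serine": [(0, 2.0), (2, 1.0), (4, 0.1), (6, 0.0)],
}

feed_candidates = depleted_before(culture, stationary_day=6)
print(feed_candidates)
```

In this made-up data set, asparagine and serine drop below 10% of their starting level on day 4 and would be candidates for supplementation, whereas glutamine is still available when growth stops. In practice the threshold and sampling frequency would need to be chosen for the specific process.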

Schaub et al. (2010) compared two fed-batch processes with an IgG-producing CHO cell line; one process was labelled as high titre (HT) and the other as low titre (LT). Transcriptomic analysis showed differences in gene expression between the two processes over the time course of the fed-batch culture. In particular, gene expression of lipid metabolism pathways was upregulated in the HT process. In their study, the transcriptomic dataset led to the design of a medium with increased lipid concentration, which, when added, resulted in a 20% increase in antibody titre [162]. Further, metabolomics studies using NMR measured and monitored intracellular and extracellular CHO cell metabolites, and the data from that study directed the development of a proprietary growth medium which supported increased cell productivity and protein quality [71]. The authors also showed a link between the depletion of histidine and decreased cell productivity.

Another study [163] applied a model to the metabolic data to identify limiting amino acids or those that significantly impact the recombinant target. The metabolic model was integrated and validated with transcriptomic analyses. Huang et al. (2020) described a genome-scale metabolic model to further understand their bioprocesses alongside transcriptomic analysis via RNA-sequencing. Using these models, strategies to optimise the culture medium were employed and verified experimentally. Such modelling and simulations aid in understanding processes and enable the development of new strategies to overcome limitations and minimise experimental testing of different combinations. In this case, the feed design strategies led to increased cell productivity [164]. Together these studies report an increase in cell growth and/or protein yields through medium design and targeted nutrient feeds. Some overlap in the identified amino acids was seen between studies. However, no universal strategy was apparent, potentially because different cell systems and recombinant proteins impose different metabolic demands and consequently require specific feeding regimes to achieve an efficient bioprocess.
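The idea of using a model to identify a limiting amino acid can be illustrated with a deliberately simple stoichiometric calculation. This is far simpler than the genome-scale metabolic models used in the cited studies and is not a flux balance analysis: it only asks which nutrient would be exhausted first, given a fixed per-cell demand. The function name, units and all numbers are hypothetical.

```python
def limiting_nutrient(available, demand_per_cell):
    """Return (nutrient, supported_cells) for the nutrient that would run out first.

    available: {nutrient: amount in the medium}
    demand_per_cell: {nutrient: amount consumed per unit of biomass}
    """
    supported = {
        name: available[name] / demand
        for name, demand in demand_per_cell.items()
        if demand > 0
    }
    name = min(supported, key=supported.get)
    return name, supported[name]


# Hypothetical medium composition (mmol/L) and demand (mmol per 10^9 cells).
medium = {"asparagine": 4.0, "serine": 1.0, "histidine": 1.0}
demand = {"asparagine": 0.5, "serine": 0.25, "histidine": 0.125}

nutrient, max_cells = limiting_nutrient(medium, demand)
print(nutrient, max_cells)

# Doubling the limiting nutrient shows which constraint becomes binding next.
supplemented = dict(medium, serine=2.0)
print(limiting_nutrient(supplemented, demand))
```

Because demand coefficients differ between cell lines and products, the limiting nutrient in such a calculation changes with the inputs, which is consistent with the observation above that no single feeding strategy suits every process.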

4.4 Approach IV: Developing Specific Glycosylation Profiles Through Glycoengineering and Medium Design

Protein glycosylation status has proven to be dynamic during cell cultures, where the amounts of mannose and galactose species vary in a time-dependent manner [157]. Sumit et al. (2019) undertook an integrative approach that employed multi-dimensional omics analyses (transcriptomics, metabolomics and glycomics) to analyse the glycosylation dynamics in recombinant CHO cells. The authors showed that changes to cellular metabolism (including central carbon metabolism and nucleotide sugar donor, NSD, biosynthesis) led to temporal bottlenecks in the addition of galactose and sialic acid. This knowledge enabled improvements in glycosylation heterogeneity by use of feeds with customised compositions of galactose, ManNAc and GlcNAc to bypass the impairment to biosynthetic pathways for NSDs [192]. For a refined control of glycosylation profiles, a synthetic biology approach has been used to redesign the glycosylation machinery. Chang et al. (2019) knocked out two glycosyltransferase genes and reintroduced synthetic glycosyltransferase genes under constitutive or inducible promoters. This allowed the production of antibodies with defined fucosylation (0–97%) and galactosylation (0–87%) contents [193]. These examples illustrate the possibilities for precise modification of N-glycosylation through process and genetic approaches that generate protein therapeutics with customised critical quality attributes.

CHO-based transcriptome analysis is a powerful approach for identification of the relationship between undesired product quality and the expression of specific metabolic/glycosylation enzymes. For instance, the lack of expression of α-2,6-sialyltransferase (ST6GAL) in CHO cells limited the production of recombinant protein with an appropriate human glycosylation profile (sialic acid content) (Jenkins et al., 1996). Several groups have addressed this limitation by overexpressing ST6GAL in CHO cells, which allowed these cell lines to generate the α-2,6-sialylated glycan residue [194,195,196,197]. Another example is associated with the lack or minimal expression of the metabolic enzymes N-deacetylase/N-sulfotransferase (NDST2) and heparan sulphate 3-O-sulfotransferase 1 (Hs3st1), which are critical for the synthesis and function of the anti-coagulant heparin [167]. These data led to engineered CHO cell systems overexpressing both enzymes, a strategy that resulted in the production of heparin with improved quality compared to previous efforts [167, 198].

4.5 Approach V: Improvement of Cell Growth Characteristics by Targeted Engineering of Apoptotic Pathways

Many studies have sought to improve cell growth and culture viability. One such study used a combination of transcriptomic and proteomic analysis of cell cultures with high and low growth rates to identify a panel of potential engineering candidates [20]. siRNA knockdown of these targets identified Valosin-containing Protein (VCP) as having the biggest effect on cell culture growth and viability. Transient VCP overexpression resulted in increased cell growth and no impact on viability; further knockdown of this target had an adverse effect on culture growth, validating the earlier observations that VCP was a key gene target to influence cell growth [20]. Stable overexpression of this target could potentially result in a better host cell phenotype with increased cell growth. Further, Wong et al. (2006a) reported that engineering of anti-apoptotic genes in CHO cell cultures delayed the onset of cell death and increased recombinant protein titres. Transcriptomic analysis of batch and fed-batch CHO cell cultures identified four differentially expressed genes (Fadd, Faim, Alg-2, and Requiem) [199]. In a later study by the same group, the consequences of overexpression or knockdown of these genes were examined in a recombinant CHO cell line expressing human interferon gamma (IFN-γ) [165]. The data showed that targeting these anti-apoptotic genes conferred apoptosis resistance and enabled prolonged cell cultures, improved cell viability and increased IFN-γ production and quality. However, the applicability of this strategy to other cell types and recombinant targets remains to be seen.

CHO-S cells grown in either fresh chemically-defined medium or nutrient-depleted medium were profiled by transcriptomic analysis [169]. In the depleted medium, cells showed increased caspase-3/7 activity, lowered culture viability and active apoptosis. Transcriptomic profiling via microarrays identified 70 miRNAs that were differentially expressed between medium types. In particular, mmu-miR-466h was identified as highly up-regulated in the depleted medium conditions, and overexpression of mmu-miR-466h decreased the amount of mRNA encoding several anti-apoptotic genes. Following analysis of the omics data set, the cells were transfected with an anti-miR-466h, which led to ~15% higher cell viability and decreased activation of caspase-3/7.

Another example was the identification of microRNA-7 (miR-7) as a potential target, which promoted the development of a cell engineering strategy based on miR-7 overexpression that increased cell productivities [166]. However, using this strategy a decrease in cell growth was observed in response to miR-7 overexpression, suggesting that miR-7 expression may impact other processes such as protein translation or secretion in a temperature-dependent manner (Sect. 3.2). The use of stable miRNA expression has gained attention as a means to improve cell growth and productivity of CHO cell lines. Through a genome-wide miRNA screen, Strotbek et al. (2013) identified 9 miRNAs which correlated with an increase in IgG1 production. Expression of two of these identified miRNAs, miR-557 and miR-1287, increased viable cell density, protein titre and cell-specific productivity [68]. Application of array analysis identified potentially unannotated miRNA sequences or those with unknown function. These examples show how omics can be used to identify the roles of miRNAs and how these can be engineered to overcome limitations on cell growth phenotypes and cell productivity.

5 Future Perspectives

‘Omics datasets, coupled with engineering technologies presented here, are powerful tools for better understanding of cellular profiles and can be used to improve bioprocesses and increase biopharmaceutical production and quality. The use of omics has allowed researchers to gain large data sets that report on different CHO host cell backgrounds, distinct industrial bioprocesses, the impact of recombinant protein production on cell cultures and/or to better understand specific molecular events on a larger-scale at multiple cellular levels compared to previous technologies.

The methodologies used for omics analytics have improved vastly over recent years and the generation of data is no longer limiting. Currently, the ‘bottleneck’ mainly lies in the data interpretation using the databases and bioinformatic tools available and in the identification of the extent to which observations, amongst the very sizable amounts of data acquired, have functional significance or present viable targets for rewarding engineering. This step is often time-consuming, requiring follow-up studies to screen and validate potential markers, and may well not bring the hoped-for rewards. The genomic databases used for bioinformatic analysis of CHO cells were initially limited and/or poorly annotated compared to those for other cell types, e.g. human cell lines. This restricted the certainty of interpretations. Technologies have advanced and CHO genome tools continue to be updated, leading to a significant increase in our understanding and in the predictability of engineering efforts.

Despite these limitations, many have sought to analyse and interpret omics data in CHO cells in response to different stimuli, but few studies have translated these observations into process development changes and/or increased biopharmaceutical production. To date the translation of omics analysis remains restricted and little is published in this area, compared to the significant number of papers that report omics data from CHO cells. Whilst there may be proprietary challenges in reporting the application of information to direct manufacturing outcome, it is also clear that we are moving to an era where the increased robustness of data and improved interpretation, along with technology to allow selective gene manipulation, is likely to generate improved translation of data to outcome.

The emergence of systems biology approaches to generate genome-wide models is proving to be a powerful resource to study cellular changes, identify potential limitations and guide engineering strategies to push expression host cells further to make ‘super-producers’. With advancing technologies, the use of omics is becoming more accessible to researchers. The field will therefore benefit from bioinformatic tools that seamlessly compare and integrate multi-omic analyses such as genomic, transcriptomic, proteomic and metabolomic data. This will provide a holistic view of the cell and/or culture status and allow modelling/predictions of scenarios that would result in a more efficient process without the need for rigorous empirical screening and validation.

As we move towards increased innovation in biopharmaceutical production, therapies are becoming more complex and modes of manufacturing are changing, which will challenge existing expression systems in different ways. The use of omics will prove to be a dominant force in characterising and understanding these specific processes and in developing methodologies with greater efficiency. With greater accessibility to such technologies and bioinformatic tools, we anticipate further expansion in this area and greater application of these data to bioprocesses to create better and smarter platforms for biopharmaceutical production.