1 Introduction

Biopharmaceuticals, also known as biologics or biotech drugs, are substances produced in organisms such as microorganisms and plant and animal cells. These large, complex molecular drugs have high therapeutic value [1]. The global biopharmaceutical market is forecast to reach US$526,008 million by 2025 [2], so strategies to improve the yield and lower the final cost of these products are being actively pursued. Chinese hamster ovary (CHO) cells are the predominant mammalian expression system for therapeutic protein production [3]. Since 1982, one-third of all Food and Drug Administration-approved biopharmaceuticals have been manufactured in CHO cells used as cell factories [4]. To meet market demand for these biologics, the production yield of CHO cells needs to be improved using novel technologies.

The word “Omics” refers to the extensive functional and relational characterization of numerous biological molecules in a given system. Omics analysis is a powerful technique for understanding phenotypes and for discovering gene targets for engineering. A key goal of omics research is to identify new gene targets and to alter the expression of genes and proteins associated with biological processes such as transcription, the cell cycle, and apoptosis [5].

Omics approaches provide a comprehensive view of the molecules in cells and organisms, identifying the genes (genomics), mRNAs (transcriptomics), metabolites (metabolomics), and proteins (proteomics) involved in biological processes [6].

Omics-based approaches can help develop CHO cells whose characteristics are more precisely tailored to biopharmaceutical production than those of current lines. Web-based databases such as CHOgenome.org are an important resource for CHO genomics, providing access to CHO gene sequences [7]. The availability of the entire CHO genome sequence and recent advances in genomic approaches pave the way for discovering, at the genome-wide level, genes that can improve the yield of CHO bioproducts [8]. For example, genome-wide association studies have identified proliferation-related genes, including 77 genes that enhance CHO cell survival [9]. Proteomic studies have also linked product stability and clone-specific productivity to changes in the CHO proteome [10]. Since the publication of the CHO-K1 genome, data-driven omics research on CHO cells has accelerated rapidly, including transcriptomic and proteomic studies that characterize the composition of higher-producing cells [11].

Genome-editing technology comprises a set of methods that can accurately modify the sequences of cellular DNA at specific genomic locations by inducing site-specific DNA breaks. These breaks can then be repaired by DNA repair mechanisms, resulting in various types of modifications, such as insertion, deletion, and replacement of genetic material. These modifications in the genomic DNA can be inherited by future generations. Genome-editing technology can significantly contribute to the study of molecular mechanisms underlying desired phenotypes in biotherapeutic production [12].

There are three main methods for targeted genome editing: transcription activator-like effector nucleases (TALENs), zinc finger nucleases (ZFNs), and clustered regularly interspaced short palindromic repeats (CRISPR). Regarded as a breakthrough of this century, CRISPR-Cas9 is effectively used for site-specific gene targeting [12]. This simple, rapid system has been used in bacteria, plants, and animals [13]. CRISPR-Cas9 also serves as an innovative toolbox for gene knockout and knock-in, high-throughput gene screening, live-cell labeling, regulation of gene expression, and single-stranded RNA (ssRNA) editing [14, 15].

Although the CRISPR system has faced various bottlenecks and challenges, it facilitates the production of genetically engineered cells, such as CHO lines with improved protein yield and quality. For example, simultaneous disruption of fucosyltransferase 8 (FUT8) and the pro-apoptotic genes BAX and BAK in CHO cells, using a single vector containing three individual guide RNAs, reduced fucosylation and improved resistance to apoptosis [3]. Furthermore, CRISPR-induced knockout of B4GALT1 showed that this gene contributes significantly to galactosylation in both CHO-K1 and CHO-S cell lines [16]. As a molecular toolkit, CRISPR has also been applied to transgene overexpression in CHO cells through precise, efficient, site-specific genomic integration [17].

The development of new technologies to modify CHO genomes and the collection of sufficient cellular and molecular data are essential for improving commercially valuable traits in CHO cells. Our ability to uncover molecular insights into recombinant protein production depends heavily on omics and CRISPR technologies.

By combining genome-editing techniques with higher-quality omics data, we expect CHO cell lines with higher protein yields and significantly improved quality to be created in the near future. Omics technologies will also pave the way for studying mammalian cells for both academic and industrial purposes, yielding more detailed datasets and deeper insights into cell engineering avenues [18].

Although the generation of stably transfected clones expressing biopharmaceutically relevant proteins has improved with the switch to high-throughput component screening in systems that replicate large-scale bioproduction processes, there is still a financial incentive to shorten development time and reduce cost. Future markets will demand shorter development times and greater yields in defined systems, a goal that omics-based approaches could help achieve as tools for bioprocess optimization.

To demonstrate how omics techniques can be used to increase recombinant protein production in CHO cells, this paper provides a comprehensive overview of the use of omics and CRISPR technologies to improve CHO cell productivity. It discusses the latest advances in CRISPR technology, such as the use of CRISPR/Cas9 for genome-wide screening and the control or alteration of specific genes. It highlights the potential of omics and CRISPR technologies to identify genes involved in increasing recombinant protein production and provide molecular insights (Table 1). Finally, it provides a roadmap for the future development of these technologies for the optimization of recombinant protein yield in CHO cells (Table 2).

Table 1 Target genes for improving CHO cells productivity
Table 2 CHO-omics technologies, web resources and online software


2 Overview of omics

Modern rational cell design relies on omics strategies to find new genomic targets [19]. Numerous omics-based studies have been conducted in CHO cells to identify properties that enhance the yield of biopharmaceutical products [20]. These studies have shown that altering the expression of genes and proteins involved in various biological functions, including transcription, protein synthesis, folding and secretion, the cytoskeleton, apoptosis, and the cell cycle, can promote protein production.

Three of the most popular branches of omics are genomics, transcriptomics, and proteomics. Genomics studies the entire genome, including the full set of genes, their sequences, and their interactions within organisms [20, 21]. Transcriptomics uses high-throughput methods to study the transcriptome, the complete set of RNA transcripts, under different circumstances [22]. Omics technologies can reveal profiles that point to the most efficient production techniques and enhance our understanding of cellular functions, and leveraging them can aid every stage of biotherapeutic development. To revolutionize manufacturing yield, process development scientists are combining omics approaches with advanced analytical capabilities and novel informatics approaches. This should enable them to produce cell lines with enhanced culture features, leading to higher manufacturing yields in the future. Overall, omics highlights targets for genetic engineering to increase productivity in CHO cells (Fig. 1).

Fig. 1

Overview of omics. Important branches of omics, whose major components are used in different integrated approaches, include genomics, transcriptomics, epigenomics, proteomics, and metabolomics

3 Overview of CRISPR technology

The genetic engineering of biological systems and organisms offers enormous potential for basic research, biotechnology, and medicine. Site-specific programmable endonucleases facilitate precise editing of endogenous genomic loci. Various genome-editing strategies have recently been introduced, including ZFNs [23], TALENs [24], and the RNA-guided CRISPR-Cas nuclease system [12]. ZFNs are artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain. They have been used successfully to create site-specific double-strand breaks and thereby stimulate gene targeting by several thousandfold. However, ZFNs have drawbacks: their limited specificity can lead to off-target effects, and they may be cytotoxic, causing cell death or reduced growth [25].

TALENs are a type of engineered nuclease that enables targeted alteration of any DNA sequence in a wide range of cell types and organisms. TALENs are based on DNA-protein interactions and exhibit high specificity and efficiency for gene editing. However, TALENs have some bottlenecks: they often result in mosaicism and are limited to simple mutations [26].

Compared with ZFNs, TALENs have been found to have improved specificity and lower toxicity. However, because of their size and highly repetitive structure, TALENs are difficult to deliver into cells effectively using lentivirus or a single adeno-associated virus particle. Strategies for overcoming these restrictions have been developed; adenoviral vectors, for example, have been shown to be particularly efficient at delivering TALENs to cell types that are difficult to transfect. TALENs can also be delivered as mRNA or even protein, which simplifies delivery, and mRNA delivery of TALENs has enabled efficient knockout of the HIV co-receptor CCR5 [27, 28].

To address the bottlenecks of these two DNA-binding protein platforms, scientists have developed a robust and powerful technique, the CRISPR system. As a microbial adaptive immune system, CRISPR-Cas cleaves foreign genetic elements using a CRISPR-associated (Cas) protein, a non-coding guide RNA, and a specific array of repeating elements (direct repeats). These characteristic repeats are interspersed with short variable spacer sequences acquired from invading DNA; the matching sequences in the target DNA are called protospacers. Together, the repeats and spacers form the CRISPR array, which is transcribed into CRISPR RNAs (crRNAs). Within the target DNA, each protospacer is flanked by a protospacer adjacent motif (PAM), whose sequence differs depending on the Cas protein [29]. In mammalian cells, heterologous expression of codon-optimized Cas9 and the RNA components targets DNA in three steps: recognition, cleavage, and repair.
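To make the PAM requirement concrete, the sketch below scans a DNA sequence on both strands for candidate Streptococcus pyogenes Cas9 target sites, assuming the canonical 5′-NGG-3′ PAM immediately 3′ of a 20-nt protospacer. The function names are illustrative and not taken from any specific tool; real guide-design software additionally scores specificity and cutting efficiency.

```python
import re

# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """List (strand, start, protospacer, pam) for every NGG PAM on both strands.

    Assumes the SpCas9 rule: the PAM sits directly 3' of the protospacer.
    `start` is the 0-based protospacer position on the strand it was found on.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so overlapping PAMs (e.g. in 'GGG') are all found.
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= protospacer_len:
                protospacer = s[pam_start - protospacer_len:pam_start]
                sites.append((strand, pam_start - protospacer_len, protospacer, m.group(1)))
    return sites
```

For example, `find_spcas9_sites("A" * 20 + "TGG")` reports a single forward-strand site whose protospacer is twenty A's and whose PAM is TGG.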

The CRISPR system is more than a gene-editing tool: catalytically inactive Cas9 is also applied in gene regulation, epigenetic editing, imaging, and chromatin engineering [30]. Dead Cas9 (dCas9) can modify the expression of target genes without altering the genome sequence. dCas9-based systems, including CRISPRa (activation) and CRISPRi (interference), were developed to modulate gene regulation. CRISPRa is constructed by fusing transcription activator domains such as VP64 or p65 to dCas9, whereas CRISPRi is constructed by fusing the Kruppel-associated box (KRAB) to dCas9. For epigenomic modifications, dCas9 has been linked to lysine-specific demethylase 1 (LSD-1), acetyltransferase, DNA methyltransferase 3a (DNMT3a), and ten-eleven translocation (TET) dioxygenase 1 (TET1) [15].

Genome-editing strategies, particularly the CRISPR system, have become a significant part of CHO cell line engineering to improve recombinant protein production, with an emphasis on product diversity, quality, and yield [31]. Among these techniques, CRISPR-associated (Cas) RNA-guided nucleases have gained widespread acceptance. Although prone to off-target mutations, this technology offers a targeted and precise way to produce insertion/deletion (indel) mutations or site-specific sequence integration in CHO cells, and it can be used to modify a particular sequence or to control gene expression [32].

A holistic approach, in which CRISPR-mediated genome editing is guided by omics studies, paves the way for CHO cell engineering to meet the demand for the highest-quality, lowest-cost therapeutics. Such an approach is expected to yield CHO production cell lines with stable, high expression levels in the near future.

4 Genomics and CRISPR in CHO cell improvement

Genomics is the study of the genome, all of the genetic information that a living organism needs to survive. The CHO cell genome is estimated at 2.45 Gb, with 24,383 predicted genes [11]. In 2011, the full genomic sequence of CHO-K1 cells became available on CHOgenome.org [33], and subsequent efforts focused on publishing additional genomic sequence data from the Chinese hamster and various CHO cell lines [34]. The availability of CHO genomes allows in-depth interpretation of various omics datasets [35].

Because CHO cells are continuous cell lines that have undergone a large number of divisions, they often carry numerous genomic variants [36]. CHO cells also vary in the number of chromosomes per cell, even within a clonal population [37]. Owing to this inherent genomic instability, and to optimization steps such as adaptation to suspension growth in different media compositions, CHO cell lines have accumulated many genomic modifications, including single-nucleotide variants (SNVs), indels, and other structural variations (SVs). SNVs and SVs located in non-coding regions are less likely to change protein sequences than those in coding regions [11].

The first and oldest approach used in genomics is Sanger sequencing. The second is the DNA microarray, in which oligonucleotide probes distributed across specific genomic regions identify sequences by hybridization. The third well-known strategy is next-generation sequencing (NGS), which uses DNA chips for massively parallel sequencing [38].

Limited access to genomic sequence data for cell lines other than CHO-K1, such as CHO-S and DG44, prompted scientists to examine the genomes of these cell lines. For example, characterization of the chromosomal status of CHO DG44 showed that this cell line contains 20 chromosomes, only seven of which are normal, in contrast to the 22 chromosomes of diploid Cricetulus griseus (Chinese hamster) cells [39]. These findings revealed the aneuploidy of CHO cells compared with euploid Chinese hamster chromosomes, which was confirmed by the findings of Deaven and Petersen [35]. The plasticity of CHO cells enables them to develop distinct phenotypes in response to diverse culture conditions and during the production of various products [40].

Recent research has provided insights into the genomic profiles of CHO cell lines with diverse phenotypes, revealing a high number of single-nucleotide polymorphisms (SNPs) and translocations [11]. NGS-based screening of six common CHO cell lines identified more than 3.7 million SNPs and 551,240 insertions or deletions (indels) shorter than 5 bp. Variable gene copy numbers were also reported, with 4,241 genes located in duplicated regions, and 17 genes were completely absent in at least one of the cell lines [41]. An investigation of genomic variation in the mitochondrial genome and in nuclear genes associated with mitochondria and energy metabolism across 14 cell lines revealed variants that enable reliable lineage tracing. Cell lines adapted to grow in protein-free media carry unique sequence variants enriched in signaling pathways, including mitogen-activated protein kinase 3. High-producing cell lines bear unique mutations in the NADH dehydrogenase subunits ND2 and ND4 and in peroxisomal acyl-CoA synthetase (ACSL4), which is involved in lipid metabolism [42].
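Variant counts like those above are tallied from variant calls. The sketch below is a simplified illustration, assuming VCF-style REF/ALT allele strings in which indels share an anchor base; it classifies each call as an SNV, a short indel (net size below 5 bp, the threshold quoted above), a multi-nucleotide substitution, or a larger variant. Real pipelines also handle multi-allelic sites and symbolic structural-variant alleles, which this sketch ignores.

```python
from collections import Counter


def classify_variant(ref: str, alt: str, max_indel_bp: int = 5) -> str:
    """Classify one REF/ALT pair (VCF-style; indels share an anchor base)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(ref) - len(alt))  # net number of inserted or deleted bases
    if size == 0:
        return "MNV"  # multi-nucleotide substitution, not an indel
    if size < max_indel_bp:
        return "short_indel"
    return "larger_variant"


def tally_variants(calls):
    """Count variant classes over an iterable of (ref, alt) tuples."""
    return Counter(classify_variant(ref, alt) for ref, alt in calls)


# Hypothetical calls: one SNV, a 1-bp insertion, a 1-bp deletion, a 6-bp insertion.
calls = [("A", "G"), ("A", "AT"), ("AT", "A"), ("A", "ATTTTTT")]
print(tally_variants(calls))
```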

Conventional genetic engineering methods mostly integrate the genetic material of interest into the host genome at random, which can disrupt or amplify other genes. Newer technologies instead insert the gene of interest at specific genomic locations, and genome-editing strategies have recently brought a crucial advance in CHO cell engineering. The publication of well-annotated CHO genome sequences in 2011 opened the way for highly efficient genome-editing tools and revolutionized CHO cell engineering. This revolution has continued with online genome-editing tools such as CRISPy, which aids gRNA design for CRISPR-based editing in CHO cells [43]. This software was used to improve cell growth rate and culture longevity by targeting functional genes involved in proliferation, apoptosis, and lactate production. However, the tool was built on CHOgenome.org data derived from a serum-dependent, adherent CHO-K1 cell line, and this genetic reference may differ significantly from other CHO-derived cell lines [7].
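Guide-design tools such as CRISPy must check whether a candidate gRNA also matches other genomic sites. The sketch below shows only the simplest form of that idea: it counts mismatches between a 20-nt guide and equal-length candidate sites and flags those within a mismatch budget. It is not CRISPy's actual scoring scheme; it ignores PAM context and position-dependent mismatch tolerance, and the default threshold of three mismatches is an assumption.

```python
def count_mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def potential_off_targets(guide: str, candidate_sites, max_mismatches: int = 3):
    """Return (site, mismatches) for sites differing from the guide by 1..max_mismatches."""
    hits = []
    for site in candidate_sites:
        mm = count_mismatches(guide.upper(), site.upper())
        if 0 < mm <= max_mismatches:  # mm == 0 is the intended on-target site
            hits.append((site, mm))
    return hits
```

In practice the candidate sites would come from a genome-wide search of the relevant CHO reference, which is exactly where the choice of reference (for example, CHO-K1 versus other CHO-derived lines) matters.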

Genomics has become a central and cohesive discipline of biomedical research that will provide a valuable service to humankind in the not-too-distant future. Its major benefits are expected in health, including genetic tests, gene therapy, pharmacogenomics, and disease prevention, diagnosis, and treatment. For CHO cell engineering, up-to-date sequence information on the various CHO variants is necessary, but no dedicated database is available. A few commercial packages exist, but their main drawbacks are that they are typically expensive, less adaptable, and dependent on proprietary databases.

5 Transcriptomics and CRISPR in CHO cell development

Transcriptomics is the study of the transcriptome, the complete set of RNA molecules expressed in a cell, tissue, or organism [44, 45]. Transcriptomics encompasses everything related to RNAs, including their expression levels, locations, functions, trafficking, and degradation. It also covers transcript structure and the genes from which transcripts are derived, including their transcription start sites, 5’ and 3’ regulatory sequences, splice sites, and patterns of post-transcriptional modification [46]. All types of transcripts, such as messenger RNAs (mRNAs), long non-coding RNAs (lncRNAs), and micro-RNAs (miRNAs), fall within the scope of transcriptomics.

The availability of the CHO-K1 genome, sequenced by NGS, has expanded transcriptomic studies of CHO cell lines [41]. RNA-seq is a highly efficient sequencing method for transcriptome assessment that clearly outperforms hybridization-based approaches such as microarrays. RNA-seq requires no prior knowledge of the genome sequence and offers high sequencing depth and sensitivity for detecting sequence variants [47]. It has been used to characterize alternative splicing of CHO cell mRNAs [48] and to annotate non-coding RNA molecules such as miRNAs [49] and lncRNAs [50]. RNA-seq analysis of the transcriptome of recombinant CHO (rCHO) cells enables comparison of gene expression profiles between cell lines with different productivities and identifies target genes that may confer higher specific productivity [51].
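As a minimal illustration of how RNA-seq profiles from high- and low-producing cell lines can be compared, the sketch below normalizes raw read counts to counts per million (CPM) and computes a per-gene log2 fold change with a pseudocount. Gene names and counts are hypothetical, and the result is only a descriptive summary; published analyses typically use dedicated statistical packages such as DESeq2 or edgeR, which model count dispersion and test for significance.

```python
import math


def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts so the library sums to one million."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_changes(high: dict, low: dict, pseudocount: float = 1.0) -> dict:
    """log2(high/low) of CPM values for genes measured in both libraries."""
    high_cpm, low_cpm = counts_per_million(high), counts_per_million(low)
    shared = sorted(set(high_cpm) & set(low_cpm))
    return {
        g: math.log2((high_cpm[g] + pseudocount) / (low_cpm[g] + pseudocount))
        for g in shared
    }


# Hypothetical raw counts for a high- and a low-producing clone.
high_producer = {"geneA": 300, "geneB": 700}
low_producer = {"geneA": 100, "geneB": 900}
print(log2_fold_changes(high_producer, low_producer))
```

Positive values mark genes relatively more abundant in the high producer; the pseudocount keeps genes with zero counts from producing infinite ratios.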

Single-cell RNA sequencing (scRNA-seq) was developed to resolve the inherent heterogeneity of biological systems, which can help improve the production of consistent and safe therapeutics. This technology was used to obtain more than 3800 single-cell gene expression profiles from a clonally derived CHO cell line that underwent production instability. The results showed lower levels of antibody heavy-chain transcripts than light-chain transcripts across the population [52].

Because CHO transcriptome databases are now available (first released in May 2016 [53]), productivity studies can be performed using RNA sequencing and specialized PCR arrays based on next-generation transcriptomes.

The Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) is an international public repository that provides robust, diverse data [54]. It includes high-throughput functional genomic data, well-annotated data stores from the research community, and easy-to-use mechanisms that allow users to query, find, review, and download studies and gene expression profiles of interest. Many studies related to CHO transcriptomics are available through this database: a search for the keyword “Chinese hamster ovary” returned 656 results, which provide access to valuable research on all aspects of CHO gene expression and regulation. For example, one of these studies identified the transcription start sites (TSSs) of 15,308 CHO genes and 4478 non-coding RNAs using GRO-seq, 5GRO-seq, csRNA-seq, ribosomal RNA-depleted RNA-seq, and ATAC-seq [55]. To demonstrate the precision and functionality of the revised TSS annotation, the authors activated the silent Mgat3 gene (β-1,4-mannosyl-glycoprotein 4-β-N-acetylglucosaminyltransferase) in CHO cells using CRISPRa targeted to the revised TSS [55].
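GEO can also be queried programmatically through NCBI's E-utilities. The sketch below builds an ESearch request against the GEO DataSets database (`db=gds`) and parses the hit count from the JSON response. The search term mirrors the keyword search described above, but hit counts change over time and need not match the figure reported here; heavy use should follow NCBI's E-utilities usage guidelines (rate limits, API key).

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities ESearch endpoint; db="gds" selects GEO DataSets.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns JSON for a GEO DataSets query."""
    params = {"db": "gds", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_hit_count(payload: str) -> int:
    """Extract the total hit count from an ESearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])


def count_geo_hits(term: str) -> int:
    """Run the query over the network (requires internet access)."""
    with urllib.request.urlopen(build_geo_search_url(term), timeout=30) as resp:
        return parse_hit_count(resp.read().decode("utf-8"))
```

Calling `count_geo_hits("Chinese hamster ovary")` returns the current number of matching GEO records; the `idlist` field of the same response holds record identifiers for follow-up ESummary or EFetch calls.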

Recently, small non-coding RNAs such as miRNAs have emerged as attractive new tools for CHO cell engineering [56]. This interest stems from their ability to affect multiple critical cellular phenotypes simultaneously without adding translational burden to the engineered host cell [57]. miRNAs are endogenous post-transcriptional regulators of gene expression in eukaryotic cells and are highly conserved across species [58]. They also play critical roles in regulating nearly all cellular pathways, including growth, apoptosis, differentiation, and development [59]. These traits make miRNAs promising targets for engineering biopharmaceutical cell factories, and overexpression or silencing of specific miRNAs is a robust strategy for manipulating CHO cell phenotypes to achieve higher yield and productivity. Profiling miRNA expression in CHO cell lines with high versus low monoclonal antibody (mAb) productivity under controlled fed-batch conditions, compared with DG44 profiles, identified 89 miRNAs that were differentially expressed across conditions. Functional experiments were then performed on 19 successfully validated miRNAs, and overexpression of these miRNAs affected at least one process related to proliferation, apoptosis, necrosis, or specific productivity [60].

In this context, the findings of Nadja Raab and colleagues [61] show that miR-744 can reduce productivity: CRISPR/Cas9-mediated targeting of miR-744 significantly increased productivity (190–311 mg/L) compared with non-targeting control CHO cells (156 mg/L) [61]. Furthermore, disruption of miR-27b improved viability in the late stages of batch and fed-batch cultures. These results indicate the potential of miRNA silencing for cell line engineering [62].
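As a quick check on the scale of this improvement, the reported titers correspond to the following fold changes relative to the non-targeting control (156 mg/L):

```latex
\frac{190\ \text{mg/L}}{156\ \text{mg/L}} \approx 1.22,
\qquad
\frac{311\ \text{mg/L}}{156\ \text{mg/L}} \approx 1.99
```

That is, the miR-744-targeted cells reached roughly 22% to 99% higher productivity than the control.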

Non-coding sequences, which make up about 97–98% of the CHO genome and give rise to untranslated transcripts such as lncRNAs [63], can have a massive impact on phenotype. This highlights the need for a better understanding of the function of the non-protein-coding genome in recombinant protein production and bioprocessing. For this purpose, CRISPR/AsCpf1 was used to generate a library targeting process-related coding genes and lncRNAs [64].

Analyzing transcription patterns can improve our understanding of the transcriptome as a whole. Transcriptomic analysis reveals how gene products are made, how gene expression is controlled, and how the transcriptome functions. Various innovative transcriptomic methods can help us better understand transcription and productivity in CHO cells. With the development of high-throughput technologies, transcriptome data sets are being produced at an unprecedented rate. With a mapped genome and transcriptome data available, genome-editing tools have opened new prospects for cell line modification, and these methods can facilitate rational genome engineering design.

6 Proteomics and the CRISPR system in CHO cell improvement

Because mRNA and protein levels are not always linearly related, genomic and transcriptomic findings have been supplemented by complementary proteomic studies. The proteome is the full set of proteins in cells, tissues, or biological samples at a specific developmental stage. Proteomics identifies thousands of proteins involved in cell growth, cell death, protein processing, glycosylation, and cell metabolism. Molecular proteomics, structural proteomics, and protein-protein interaction research are different ways to study the proteome [65].

Proteins and post-translational modifications can be rapidly identified using mass spectrometry (MS). Operating at high speed and sensitivity, MS can now separate, identify, and relatively quantify proteins in complex mixtures, aiding the study of protein expression [66]. Through MS, proteomics has contributed significantly to the efficient isolation of proteomes from clones, the study of protein pathways, the evaluation of secreted proteins, and the analysis of important enzymatic pathways in biological protein production. Commonly used database search engines include MASCOT (www.matrixscience.com), SEQUEST (www.thermo.com), Phenyx (www.phenyx-ms.com), Spectrum Mill (www.agilent.com), and Tandem (www.thegpm.org/TANDEM). Alongside multiple other proteomics technologies, TagRecon and MyriMatch have also been used to elucidate the CHO proteome [67].
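Database search engines such as those listed above match observed peptide masses against theoretical peptides derived from a protein sequence database. The sketch below illustrates only that in-silico step, assuming a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and unmodified residues with standard monoisotopic masses; real search engines add missed cleavages, modifications, and fragment-ion scoring.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein: str) -> list:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide lacking K/R
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in tryptic_digest("MKWVTFISLLR"):
    print(pep, round(peptide_mass(pep), 4))
```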

Proteomic technologies have been used to quantify different proteins in high-producing and low-producing CHO cell lines, enabling the identification of protein factors that significantly affect productivity. Genome-editing technologies have also become an essential part of the maturation of proteomics: CRISPR-Cas is used in various sophisticated proteomics approaches, and a growing number of biological models produced by genome editing are being effectively explored by biological MS.

For instance, comparative proteomics of three CHO sublines identified regulators of cell growth and protein expression, and the results suggested robust engineering strategies to optimize CHO host cells. To improve cell growth, the overexpression of anti-apoptotic genes such as bcl-2 [68] or bcl-xL [69] and the silencing of Bak1 and Bax [70] have been used to enhance CHO cell viability as well as protein production. Knockout of caspase genes such as caspase 7 [71], together with optimization of culture conditions, has also been used to induce G1- and G2-phase arrest during the cell cycle and thereby improve cellular productivity [72]. The cytoskeletal proteins filamin and vimentin could be new targets for CHO cell engineering because their expression differs among CHO cell lines. They play a central role in maintaining cell shape and intracellular transport, participate in forming the mitotic spindles involved in cell division, and help regulate protein translation [73].

Several strategies have been used to improve protein expression, including optimization of production process parameters, cell engineering, and vector optimization. Cell engineering is a robust tool for increasing protein production, but gene manipulation requires an in-depth understanding of transcriptional regulation, translation, and post-translational modification in the host cell. Inducible expression of B-lymphocyte-induced maturation protein-1 (Blimp1), a transcriptional regulator, has been shown to increase protein secretion. Using a CRISPR/Cas9-based recombinase-mediated cassette exchange landing-pad platform to express Blimp1B in CHO cells increased recombinant protein yield three- to fourfold [74]. Furthermore, introducing an artificial zinc finger transcription factor into CHO cells increased cell productivity by up to tenfold [75], and overexpression of the E2F-1 transcription factor increased cell viability by 20% [76]. Emerging evidence suggests that upregulating transcription regulators such as snrnp200 and prpf8 could improve transcription efficiency.

Recent proteomics results indicate that overexpression of translation initiation factors and of 40S and 60S ribosomal proteins can improve translation and productivity. Elongation factors, by contrast, appear to be poor targets for host cell engineering because they are already highly expressed across CHO sublineages. Engineering post-translational modifications, for example by overexpressing glycosyltransferases, can improve protein quality in CHO cells, as can upregulating the expression of post-glycosylation enzymes [77].

Using safe harbor loci such as Hipp11 (H11) and C12orf35 for CRISPR/Cas9-mediated gene knock-in is a powerful strategy to test proteomic findings through gene overexpression or for biopharmaceutical production in CHO cells [78, 79]. CRISPRa has also been shown to awaken dormant genes in CHO cells to achieve desired phenotypes [80].

Taken together, the focus on transcriptional regulation, translation, and post-translational modifications led to the introduction of numerous proteomics techniques and databases. The overexpression of anti-apoptotic proteins and translation initiation factors as well as the expression of transcription factors led to an increase in productivity. Furthermore, silencing pro-apoptotic proteins using the CRISPR system increased the yield of recombinant proteins.

Proteomic datasets can provide a wealth of information about bioproduction platforms, but their value can be enhanced by integrating them with multiple omics technologies. Changes in protein expression alone may not always correspond to increased enzyme activity or flux in specific metabolic pathways, as mutations in the clone being studied can affect enzyme activity. Therefore, proteomics datasets are more meaningful when combined with other omics data, such as metabolomics, to provide a more comprehensive metabolic context and estimate flux between metabolites.

7 Epigenomics and the CRISPR system in CHO cell improvement

Epigenomics is the genome-wide study of epigenetic modifications, such as DNA methylation and chromatin modification, and how they are distributed across the genome. Epigenetics studies how gene expression in cells is regulated while the underlying genetic code remains unaltered, preserving its integrity. This regulation occurs through covalent modifications of DNA, RNA, and histones, and through small non-coding RNAs [81]. Epigenetics research has identified a number of enzymes and mechanisms that control expression via methylation and chromatin remodeling in response to physiological or environmental cues [82].

DNA methylation and histone modification are two of the most important epigenetic processes; they can either repress or enhance transcription by altering chromatin structure and the activity of the transcription machinery [83]. Histone tails can carry many modifications, of which methylation and acetylation are two examples. These modifications allow or prevent the access of regulatory proteins to chromatin, resulting in changes in gene expression [84]. Recent advances in epigenetics have shown that inserting DNA sequences that act as insulators between promoter and enhancer regions can block or limit the expression of associated genes involved in acetylation [41].

Provided that cytosine methylation sites are palindromic (as CpG dinucleotides are) and that a maintenance methyltransferase can bind hemimethylated DNA, cytosine methylation can act as a gene-regulatory epigenetic mark that is passed on from one cell generation to the next [85]. DNMT1, DNMT3A, DNMT3B, and DNMT3L are the enzymes involved in DNA methylation. DNMTs are believed to act by flipping the target cytosine out of the DNA helix into the active site, where covalent attack at carbon 6 increases the negative charge on carbon 5, which then accepts a methyl group from the donor S-adenosylmethionine [86].
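The palindrome condition can be illustrated with a short Python sketch (an illustrative aid, not a method from [85]): because the reverse complement of CG is CG, every CpG on one strand faces a CpG on the other strand, which is what allows a maintenance methyltransferase to copy a mark onto the unmethylated half of a hemimethylated site. The function names and the example sequence are hypothetical.

```python
# Illustrative sketch: CpG sites are palindromic, so each CpG on the top
# strand pairs with a CpG on the bottom strand. Example data are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide on this strand."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same as its reverse complement."""
    return site == reverse_complement(site)


def partner_positions(seq: str) -> list[int]:
    """For each top-strand CpG starting at i, return the start of the partner
    CpG on the bottom strand (bottom-strand coordinates, read 5'->3').
    Top position p maps to bottom position n - 1 - p, so the G at i + 1
    becomes the C at n - 2 - i."""
    n = len(seq)
    return sorted(n - 2 - i for i in cpg_positions(seq))


if __name__ == "__main__":
    top = "ATCGGACGTT"  # hypothetical sequence
    bottom = reverse_complement(top)
    print(is_palindromic("CG"), is_palindromic("CA"))
    print(cpg_positions(top), partner_positions(top), cpg_positions(bottom))
```

The check that `partner_positions(top)` equals `cpg_positions(bottom)` is the sequence-level version of the statement that a methyl mark on one strand always has a symmetric counterpart to be restored after replication.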

ChIP-seq is a popular technique for mapping histone modifications across the epigenome, although it has limitations of its own, such as high cost. It combines chromatin immunoprecipitation with next-generation sequencing (NGS) to measure the genome-wide distribution of chromatin proteins and their post-translational modifications [87]. A method known as single-cell combinatorial indexed Hi-C (sciHi-C) has also been developed to study chromatin contacts across the genome in tens of thousands of individual cells [88].
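A common first step in summarizing ChIP-seq data is to compare ChIP read counts with an input control after correcting for sequencing depth. The sketch below assumes reads have already been counted in identical genomic bins for both libraries; the bin counts and function names are hypothetical, and dedicated peak callers such as MACS2 add statistical models on top of this simple ratio.

```python
from typing import Optional

# Illustrative sketch: per-bin fold enrichment of a ChIP library over its
# input control after library-size normalization. Counts are hypothetical.


def library_normalized(counts: list[int]) -> list[float]:
    """Fraction of the library's reads that fall in each bin."""
    total = sum(counts)
    return [c / total for c in counts]


def fold_enrichment(chip: list[int], control: list[int]) -> list[Optional[float]]:
    """ChIP signal over input per bin after library-size normalization.
    Returns None for bins with no input reads (enrichment undefined)."""
    chip_frac = library_normalized(chip)
    ctrl_frac = library_normalized(control)
    return [c / i if i > 0 else None for c, i in zip(chip_frac, ctrl_frac)]


if __name__ == "__main__":
    chip_counts = [50, 400, 30]  # hypothetical reads per bin, histone-mark ChIP
    input_counts = [40, 40, 40]  # hypothetical reads per bin, input control
    print(fold_enrichment(chip_counts, input_counts))
```

Normalizing by library size first matters because the ChIP and input libraries are rarely sequenced to the same depth; without it, a deeper ChIP library would look "enriched" everywhere.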

A major development in CHO epigenetics research occurred in 2015 with the publication of the first genome-wide DNA methylation studies in CHO cells [86, 89]. Epigenetic studies in CHO cells allow researchers to follow induced changes by monitoring DNA methylation. Genome editing in CHO cells, including gene silencing, will help to clarify CHO cell epigenetics and transcription, although comprehensive research on the global dynamics of epigenetics in CHO cells has not yet been conducted.

Epigenetic processes have been shown to be associated with transgene expression patterns in CHO producer cell lines. Genome sequencing and epigenetic analysis are therefore likely to have a major impact on the long-term stability of CHO producer clones. First, by comparing different CHO cell lines, researchers can determine which genomic regions tolerate transgene integration better and can thus place integration sites in domains expected to remain stable. Second, methylation and histone patterns can be measured as a screening tool to select clones early in the process [90]. The study of CHO genomic alterations, including CHO epigenetics, is still in its infancy and has progressed slowly [91].

In their initial attempt to map the CHO epigenome at single-nucleotide resolution, Wippermann et al. determined the methylation landscape of IgG-producing CHO DP-12 cells using bisulfite sequencing and custom-made CHO-specific microarrays [92]. The overall hypomethylation of DP-12 compared with most mammalian mitochondrial genomes and the hypermethylation of CpG islands in promoter regions are consistent with the study’s conclusion that the epigenome and phenotype are substantially associated. In addition, functional clusters identified by methylation profiling may be associated with gene expression patterns [93].
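In bisulfite sequencing, unmethylated cytosines are converted to uracil and read as T, whereas methylated cytosines remain C, so the methylation level of a CpG can be estimated as C / (C + T) across the reads covering it. A minimal Python sketch, assuming per-CpG read counts are already available; the coverage threshold, positions, and counts are hypothetical, and the microarray part of the study is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpGCall:
    position: int      # genomic coordinate of the CpG cytosine (hypothetical)
    methylated: int    # reads reporting C: protected from bisulfite conversion
    unmethylated: int  # reads reporting T: converted, hence unmethylated


def methylation_level(call: CpGCall, min_coverage: int = 10) -> Optional[float]:
    """Fraction methylated, C / (C + T); None if coverage is below the
    (assumed) minimum depth."""
    depth = call.methylated + call.unmethylated
    if depth < min_coverage:
        return None
    return call.methylated / depth


def region_mean(calls: list[CpGCall], min_coverage: int = 10) -> Optional[float]:
    """Mean methylation level over the CpGs that pass the coverage filter."""
    levels = [methylation_level(c, min_coverage) for c in calls]
    kept = [lvl for lvl in levels if lvl is not None]
    return sum(kept) / len(kept) if kept else None


if __name__ == "__main__":
    promoter = [CpGCall(100, 18, 2), CpGCall(120, 9, 11), CpGCall(135, 1, 3)]
    print([methylation_level(c) for c in promoter], region_mean(promoter))
```

The coverage filter reflects a design choice: a CpG seen in only a few reads gives a very noisy estimate, so it is excluded rather than allowed to skew the regional average.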

Feichtinger et al. studied the genetic and epigenetic evolution of six CHO-K1-derived cell lines subjected to adaptation to different medium formulations, adaptation to suspension culture, long-term culture, and cell sorting and subcloning. The results showed that the DNA methylation pattern of CHO cells changes during adaptation to new environments but remains relatively stable when the cells are maintained under constant conditions [86].

Studies of the correlation between histone modifications and gene expression following sudden environmental changes have led scientists to conclude that, unlike histone modifications, which are readily reversible, DNA methylation acts as a long-term adaptation mechanism for the cell [86]. In addition, epigenetic strategies can be employed to control gene expression patterns in CHO cells, occasionally producing changes not previously observed in the general cell population [50]. With the rapid growth of CRISPR-based epigenome-editing technologies in recent years, these tools are poised to become indispensable for studying the behavior of epigenetic marks and effectors in cancer and other human disorders.

CRISPRa has been shown to activate even silenced genes in CHO cells. This gene upregulation tool is therefore a robust, fast, and inexpensive way to overexpress difficult-to-express proteins or to obtain phenotypes of interest. For example, because synthesizing large numbers of genes for overexpression is not cost-effective, using a gRNA library with CRISPRa enables the simultaneous screening of thousands of endogenous genes [93, 94]. Such screens identify the genes involved in a phenotype of interest. In this context, Karottki and colleagues awakened silent glycosyltransferase genes such as Mgat3 and beta-galactoside alpha-2,6-sialyltransferase 1 (St6gal1) in CHO cells [80]. Furthermore, disruption of Dnmt3a, which encodes a DNA methyltransferase, resulted in stable, high transgene expression from the cytomegalovirus (CMV) promoter, with low methylation in both the promoter region and the genome overall. In contrast, expression of the transgene from the EF1 promoter in Dnmt3a-deficient CHO cells resulted in low expression and high levels of DNA methylation [95].
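Pooled gRNA screens of the kind described above are usually scored by comparing each gRNA's abundance before and after selection. The sketch below is a simplified, hypothetical scoring scheme (counts-per-million normalization, log2 fold change with a pseudocount, median per gene), not the procedure used in [93, 94]; guide names, gene names, and counts are invented, and dedicated tools such as MAGeCK use more rigorous statistics.

```python
import math
from collections import defaultdict
from statistics import median


def log2_fold_changes(control: dict[str, int], selected: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gRNA log2 fold change of counts-per-million, selected vs control.
    The pseudocount (an assumed default) avoids log of zero for dropouts."""
    ctrl_total = sum(control.values())
    sel_total = sum(selected.values())
    lfc = {}
    for guide, ctrl_count in control.items():
        ctrl_cpm = ctrl_count / ctrl_total * 1e6
        sel_cpm = selected.get(guide, 0) / sel_total * 1e6
        lfc[guide] = math.log2((sel_cpm + pseudocount) / (ctrl_cpm + pseudocount))
    return lfc


def gene_scores(lfc: dict[str, float], guide_to_gene: dict[str, str]) -> dict[str, float]:
    """Collapse gRNAs to genes using the median log2 fold change."""
    per_gene = defaultdict(list)
    for guide, value in lfc.items():
        per_gene[guide_to_gene[guide]].append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


if __name__ == "__main__":
    # Hypothetical counts: GeneA guides enriched after selection, GeneB depleted.
    control = {"g1": 500, "g2": 500, "g3": 500, "g4": 500}
    selected = {"g1": 1000, "g2": 900, "g3": 50, "g4": 50}
    guide_to_gene = {"g1": "GeneA", "g2": "GeneA", "g3": "GeneB", "g4": "GeneB"}
    scores = gene_scores(log2_fold_changes(control, selected), guide_to_gene)
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Taking the median across several gRNAs per gene is a simple way to dampen the effect of a single guide with unusual activity or off-target effects.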

Genome sequencing combined with a complete understanding of epigenetics could substantially improve the long-term stability of CHO producer clones in two ways. First, genomic studies can identify specific regions that are less prone to silencing or major changes, enabling more stable transgene integration. Second, the clone selection process could include screening techniques based on histone and methylation patterns, since transgene expression patterns have been shown to correlate with epigenetic processes [96].

In summary, epigenetic patterns have crucial implications for CHO cell productivity, which has led scientists to study epigenomics in depth. As a fast, simple, and cost-effective tool, CRISPRa could be used to awaken silenced genes and generate CHO cell phenotypes of interest with high bioproduct yields.

8 Metabolomics and CRISPR system in CHO cell progression

Metabolomics focuses on the study of sugars, amino acids, nucleosides, amines, fatty acids, and their metabolites in living organisms [97]. It is based on the systematic measurement and analysis of low-molecular-weight biological compounds, especially those with a molecular weight below 1,500 Da. Metabolomics can therefore complement proteomics in studying cellular function. In addition, metabolomics data provide new insights into the metabolic, regulatory, and signaling activities of cells or tissues by relating genetic and proteomic variation to functional properties [98].
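Because LC–MS reports mass-to-charge ratios rather than masses, applying a cutoff such as 1,500 Da first requires converting each ion's m/z to a neutral mass. A minimal sketch, assuming only protonated [M+zH]z+ or deprotonated [M-zH]z- ions (real pipelines also consider sodium, ammonium, and other adducts); the feature values are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int, polarity: str) -> float:
    """Neutral monoisotopic mass from an [M+zH]z+ or [M-zH]z- ion.
    Assumes protonation/deprotonation only; other adducts are ignored."""
    if polarity == "+":
        return charge * mz - charge * PROTON_MASS
    if polarity == "-":
        return charge * mz + charge * PROTON_MASS
    raise ValueError("polarity must be '+' or '-'")


def small_molecule_features(features, max_mass: float = 1500.0):
    """Keep (mz, charge, polarity) features whose neutral mass is below max_mass (Da)."""
    return [f for f in features if neutral_mass(*f) < max_mass]


if __name__ == "__main__":
    # Hypothetical LC-MS features: (m/z, charge, polarity)
    features = [(181.0707, 1, "+"), (179.0561, 1, "-"), (1001.0, 2, "+")]
    for f in features:
        print(f, round(neutral_mass(*f), 4))
    print(small_molecule_features(features))
```

The doubly charged feature illustrates why the conversion matters: its m/z of about 1,001 is below the cutoff, but its neutral mass of roughly 2,000 Da is not.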

Today, CHO research is simplified by the availability of constraint-based metabolic models for the Chinese hamster and CHO cell lines, such as the consensus genome-scale model iCHOv1, which make it easier to study CHO cells and their many variants, such as CHO-S, CHO-K1, and CHO-DG44. CHOmine (https://chomine.boku.ac.at) was recently launched as a helpful website for retrieving and downloading these models; it also provides genome-wide, gene, and protein data for the Chinese hamster and CHO cells [99]. Genome-scale models developed specifically for CHO cells, accessible at http://CHO.sf.net, allow growth-limiting factors to be identified, making this site a valuable resource for an overview of CHO metabolism [100].

As a branch of omics, metabolomics focuses on how biochemical reactions affect metabolites. It is an emerging field and one of the fastest growing today. Metabolome databases such as METLIN and MetaboLights store metadata associated with biomarkers identified by chromatography [101]. Several software packages have also been created to analyze transcriptome, metabolome, and liquid chromatography–mass spectrometry (LC–MS) proteome data, including MultiAlign, INMEX, and Paintomics. Paintomics is a web-based application that displays integrated transcriptome and metabolome data [101]. INMEX, also web-based, evaluates metabolomics and gene expression datasets from different studies together [102]. MultiAlign processes multiple LC–MS feature maps from proteomic and metabolomic studies [103].

A variety of techniques have been used to measure biological metabolites. Recent advances in MS and nuclear magnetic resonance (NMR) spectroscopy have improved the sensitivity, resolution, and accuracy of metabolite measurement [104]. Metabolomics uses a range of analytical tools, including LC–MS, GC–MS (gas chromatography coupled to mass spectrometry), and vibrational spectroscopy such as Fourier transform infrared (FTIR) spectroscopy. These tools enable the identification of a wide range of metabolites and their chemical structures and provide comprehensive knowledge of the metabolic processes occurring in a cell [105].

A deeper functional analysis of CHO cell metabolic profiles may provide a new perspective on CHO cell biology and, as a result, improve product yield and quality. Global metabolite surveys conducted in 2012–2013 showed that CHO-K1 cultures could be distinguished by their medium type and growth stage. To characterize the metabolic state of mAb-producing CHO cells, Chong et al. conducted the first metabolomics studies in 2012 [106]. They used MS and LC–MS to identify metabolites associated with mAb productivity, linking metabolic profiles to high specific mAb productivity [107].

Notably, overexpression of the anti-apoptotic genes E1B-19K and Aven, together with silencing of XIAP, reduced lactate accumulation during the early exponential phase [108]. According to Cost et al. [109], using zinc-finger nucleases (ZFNs) to knock out the Bak and Bax genes generated apoptosis-resistant CHO cells. Metabolic engineering efforts to improve protein production efficiency have identified several potential targets in central metabolism, including PFK, F26BP, PKM2, PGAM, 3PG, 3PGDH, and 6PGD [109].

A thorough understanding of CHO cell metabolism and metabolic engineering enables the design of ideal medium compositions that allow sustained growth without accumulation of harmful metabolites. Lactate and ammonia are examples of metabolic waste products that accumulate in CHO cell cultures and are associated with high osmolality. Several strategies can inhibit lactate production, including expression of the fructose transporter GLUT5 or siRNA-mediated inhibition of lactate dehydrogenase-A. Several metabolic enzymes, e.g., glutamine synthetase and dihydrofolate reductase (DHFR), have been targeted to enhance transgene expression in CHO cells under optimized medium formulations. Furthermore, CHO cells lacking glutamine synthetase (encoded by Glul) or DHFR have been used, with the corresponding genes serving as selection markers, to select highly productive CHO clones [110,111,112].

Fed-batch CHO cell culture was investigated as early as 2012 by Selvarasu et al. [113]. In that metabolomics study, in silico modeling, which combined genomic information, a mouse-based metabolic model, and experimental CHO data, enabled accurate prediction of CHO fluxes [113]. Growth-limiting factors such as oxidative stress and the depletion of lipid metabolites could be identified through in silico modeling. LC–MS experiments showed that accumulation of acetylphenylalanine and dimethylarginine in stationary-phase CHO fed-batch cultures impaired cell growth [114].
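Genome-scale models such as those used in [113] are constraint-based: fluxes must satisfy a steady-state mass balance (S · v = 0) and reaction bounds, and flux balance analysis (FBA) then picks the feasible flux vector that maximizes an objective such as biomass. The sketch below uses a hypothetical four-reaction toy network, not a CHO model, and replaces the linear program that FBA tools (for example COBRApy) solve with a brute-force search over integer fluxes, which is only feasible for tiny examples.

```python
from itertools import product

# Hypothetical toy network (not a CHO model): internal metabolites A and B.
REACTIONS = ["uptake", "A_to_B", "A_to_lactate", "B_to_biomass"]
S = [
    [1, -1, -1, 0],  # A: made by uptake, used by A_to_B and A_to_lactate
    [0, 1, 0, -1],   # B: made by A_to_B, used by B_to_biomass
]
BOUNDS = [(0, 10), (0, 6), (0, 10), (0, 10)]  # assumed flux limits per reaction


def is_steady_state(S, v, tol=1e-9):
    """True if S . v = 0, i.e. no internal metabolite accumulates."""
    return all(abs(sum(c * f for c, f in zip(row, v))) <= tol for row in S)


def within_bounds(v, bounds):
    """True if every flux lies within its (lower, upper) bound."""
    return all(lo <= f <= hi for f, (lo, hi) in zip(v, bounds))


def best_biomass_flux(S, bounds, objective="B_to_biomass"):
    """Brute-force search over integer flux vectors inside the bounds;
    a stand-in for the linear program that FBA actually solves."""
    k = REACTIONS.index(objective)
    best = None
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    for v in product(*ranges):
        if is_steady_state(S, v) and (best is None or v[k] > best[k]):
            best = v
    return best


if __name__ == "__main__":
    candidate = (10, 6, 4, 6)
    print(is_steady_state(S, candidate), within_bounds(candidate, BOUNDS))
    print(best_biomass_flux(S, BOUNDS))
    # Forcing maximal uptake shows overflow: A_to_B is capped at 6,
    # so the surplus is diverted to lactate.
    print(best_biomass_flux(S, [(10, 10), (0, 6), (0, 10), (0, 10)]))
```

The forced-uptake case mirrors, in miniature, how a capacity limit downstream of glycolysis can push carbon toward lactate; a real genome-scale model would express this with thousands of reactions and an LP solver.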

MS combined with isotopic tracers has been used to analyze fluxes and metabolite accumulation in CHO cell metabolism. These studies show that, in addition to glycolysis and the tricarboxylic acid (TCA) cycle, CHO cell cultures produce by-products or intermediates from several other metabolic pathways, such as amino acid, nucleotide, lipid, and redox pathways. In addition, the anti-apoptotic gene bcl2 has been shown to influence lactate synthesis and may affect how much pyruvate enters the TCA cycle.
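Isotopic tracer experiments are typically summarized from the mass isotopomer distribution (MID) of each metabolite, that is, the fractions of molecules carrying 0, 1, up to n labeled atoms. A minimal sketch of the mean fractional enrichment, assuming intensities for M+0 to M+n have already been extracted; it deliberately skips the natural-isotope-abundance correction that real analyses apply, and the intensities are hypothetical.

```python
def normalize_mid(intensities: list[float]) -> list[float]:
    """Convert raw M+0..M+n ion intensities into fractions summing to 1."""
    total = sum(intensities)
    return [x / total for x in intensities]


def fractional_enrichment(mid: list[float]) -> float:
    """Mean fraction of labeled atoms: sum(i * M_i) / n, where n is the
    number of labelable atoms (len(mid) - 1) and mid sums to 1.
    No natural-abundance correction is applied (simplifying assumption)."""
    n = len(mid) - 1
    return sum(i * frac for i, frac in enumerate(mid)) / n


if __name__ == "__main__":
    # Hypothetical M+0..M+3 intensities for a three-carbon metabolite
    mid = normalize_mid([600, 100, 50, 250])
    print(mid, fractional_enrichment(mid))
```

Comparing this enrichment between, for example, lactate and TCA-cycle intermediates after feeding a labeled substrate is one way such data indicate how much pyruvate is routed into the TCA cycle versus secreted as lactate.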

Recently, a comprehensive pooled CRISPR screen was performed in CHO cells using 16,000 gRNAs against ~2,500 metabolic enzymes and regulators. This screen revealed the glutamine response network in CHO cells. Glutamine is an important fuel for the TCA cycle, but its catabolism produces ammonia, a toxic metabolite that can accumulate. The screen identified Abhd11 (abhydrolase domain containing 11) as an orphan glutamine-responsive gene. Disruption of this poorly characterized lipase can increase growth in glutamine-depleted media owing to altered regulation of the TCA cycle [115]. Figure 2 gives an overview of the metabolic pathway involving Abhd11.

Fig. 2

Mechanism of action for wild-type (WT) and Abhd11 knockout (KO) cells grown with or without glutamine. A Alpha-KG is unable to undergo conversion into glutamine and glutamate and instead must first be converted to glutamine and glutamate through the Abhd11 gene in order to enter the TCA cycle. In Abhd11 knockouts, alpha-KG cannot enter the TCA cycle without glutamine, as it can only be converted to glutamine and glutamate through this gene. B The Abhd11 gene is responsible for converting glutamine to glutamate, which can then be further converted to alpha-KG and enter the TCA cycle. However, in Abhd11 knockouts, excess glutamine accumulates but cannot be converted into alpha-KG because glutamate cannot be produced, so alpha-KG cannot enter the TCA cycle

In another study, the CRISPR system was used to reduce the secretion of growth-inhibiting metabolic by-products of amino acid degradation, such as lactate and ammonium. Guided by a metabolic network reconstruction of amino acid catabolism, the researchers disrupted Hpd and Gad2. The results showed that disrupting individual catabolic genes reduced lactate and ammonium accumulation, which increased the growth rate (by up to 19%) and the integral of viable cell density (by more than 50%) [116].
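The two growth metrics reported above can be computed from viable cell density (VCD) time courses: the specific growth rate mu = ln(X2/X1)/(t2 - t1) during exponential growth, and the integral of viable cell density (IVCD), here approximated with the trapezoidal rule. A minimal Python sketch; the sampling times and VCD values are hypothetical and are not data from [116].

```python
import math


def specific_growth_rate(vcd_start: float, vcd_end: float, days: float) -> float:
    """Specific growth rate mu (1/day), assuming exponential growth between
    the two sampling points."""
    return math.log(vcd_end / vcd_start) / days


def ivcd(times: list[float], vcd: list[float]) -> float:
    """Integral of viable cell density by the trapezoidal rule
    (units: VCD unit x time unit, e.g. 1e6 cells*day/mL)."""
    return sum((t1 - t0) * (x0 + x1) / 2
               for t0, t1, x0, x1 in zip(times, times[1:], vcd, vcd[1:]))


def percent_change(reference: float, value: float) -> float:
    """Relative change of value versus reference, in percent."""
    return (value - reference) / reference * 100


if __name__ == "__main__":
    days = [0, 1, 2, 3]            # hypothetical sampling days
    vcd = [0.3, 0.6, 1.2, 2.0]     # hypothetical VCD, 1e6 cells/mL
    print(specific_growth_rate(vcd[0], vcd[2], days[2] - days[0]))
    print(ivcd(days, vcd))
```

A percentage such as "up to 19%" is then `percent_change(mu_control, mu_knockout)` for two cultures; the trapezoidal rule is a common choice for IVCD because sampling is usually daily and irregular spacing is handled naturally.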

Altogether, metabolomics is a powerful tool for unraveling the reactions that occur in the cell under given conditions and for identifying toxic metabolites and growth-inhibiting agents. CRISPR-mediated metabolic screening provides a targeted platform to thoroughly study the genes involved in any metabolic trait and to identify novel regulators of metabolism. Although metabolomics offers a thorough description of high-producing clones, obtaining metabolomics data for each cell line is costly and time-consuming. High-throughput screening of a select group of important targets could alleviate this burden, simplifying the process and allowing the creation of optimized media and feeds tailored to the specific needs of each clone.

9 Glycomics and CRISPR system in CHO cell progression

Glycomics, a subset of glycobiology, is the comprehensive study of all glycan structures in a given cell type or organism. Many glycans outside cells bind growth factors, thereby supporting, stimulating, and activating signaling pathways, and they also support cell-to-cell communication. Glycans are attached to proteins or lipids to form glycoproteins and glycolipids. N-glycosylation of glycoproteins is one of the most significant post-translational modifications.

Glycosylation affects many protein attributes, including folding, stability, solubility, and protein-protein interactions. Factors known to influence protein glycosylation include the amino acid sequence of the protein, the host cell, and cell growth conditions [117].

The number of attached carbohydrates (neutral sugars, amino sugars, and sialic acids), the structure of the carbohydrate chains, the branching pattern of the oligosaccharides (antennary profile), and the glycosylation sites on the polypeptide are all important factors affecting the biological functions of proteins. Glycosylation requires key enzymes known as glycosyltransferases (GTs), which are responsible for glycan assembly on a variety of macromolecules in mammalian cells, such as glycoproteins, glycolipids, and proteoglycans [118].

Various techniques, such as MS, have been developed to determine glycan structures. MS can be used directly or in combination with LC, and LC–MS/MS methods can characterize intact glycoproteins as well as the macro- and micro-heterogeneity of glycoconjugates [119, 120]. With the rapid development of glycan analytical methods, large amounts of glycomics data have been generated and analyzed using glycobioinformatics databases. GlycoWorkbench, introduced by EUROCarbDB, is free software for interpreting MS-derived glycan data [121]. In addition, GlycoBase, UniCarb-DB, and an O-linked glycan database can be used to predict monosaccharide composition.
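Composition assignment from MS data rests on simple mass arithmetic: a glycan's mass is the sum of its monosaccharide residue masses plus one water for the free reducing end. The sketch below computes the neutral monoisotopic mass of a free, underivatized glycan and its [M+Na]+ ion; it is an illustration under these assumptions, not the implementation of GlycoWorkbench or any database, and it ignores permethylation, reducing-end labels, and other derivatizations.

```python
# Monoisotopic residue masses (Da) of monosaccharides within a glycan chain
RESIDUE_MASS = {
    "Hex": 162.052824,     # hexose, e.g. mannose, galactose
    "HexNAc": 203.079373,  # N-acetylhexosamine, e.g. N-acetylglucosamine
    "dHex": 146.057909,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.095417,   # N-acetylneuraminic (sialic) acid
}
WATER = 18.010565       # added once for the free reducing end
SODIUM_ION = 22.989221  # Na+ (sodium atom minus one electron)


def glycan_mass(composition: dict[str, int]) -> float:
    """Neutral monoisotopic mass of a free, underivatized reducing glycan."""
    return sum(RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


def sodiated_mz(composition: dict[str, int]) -> float:
    """m/z of the singly charged [M+Na]+ ion."""
    return glycan_mass(composition) + SODIUM_ION


if __name__ == "__main__":
    g0f = {"HexNAc": 4, "Hex": 3, "dHex": 1}  # core-fucosylated biantennary G0F
    g0 = {"HexNAc": 4, "Hex": 3}              # the same glycan without fucose
    print(round(glycan_mass(g0f), 4), round(sodiated_mz(g0f), 4))
    print(round(glycan_mass(g0f) - glycan_mass(g0), 4))
```

The last line shows why fucosylation is easy to track by MS: G0F and G0 differ by exactly one dHex residue (about 146.058 Da).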

Editing genes involved in glycosylation in mammalian cells could contribute greatly to revealing the functions of glycans relevant to biopharmaceutical development. For example, deletion of the FUT8 gene, which encodes the enzyme responsible for core fucosylation, in CHO cells led to a 90% improvement in the antibody-dependent cellular cytotoxicity (ADCC) of the resulting afucosylated IgG [122].
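The effect of a FUT8 knockout is typically verified by quantifying how much of the released glycan signal is fucosylated. A minimal sketch, assuming glycan peak areas have already been integrated and each glycoform's fucose count is known; the glycoform profile and function name are hypothetical, and a FUT8-deficient line would be expected to show a fucosylated fraction near zero.

```python
def fucosylated_fraction(profile: dict[str, tuple[float, int]]) -> float:
    """Share of total glycan signal carrying at least one fucose.
    profile maps glycoform name -> (peak area, number of fucose residues)."""
    total = sum(area for area, _ in profile.values())
    fucosylated = sum(area for area, fuc in profile.values() if fuc > 0)
    return fucosylated / total


if __name__ == "__main__":
    # Hypothetical relative peak areas for a parental CHO-derived IgG
    parental = {
        "G0F": (40.0, 1),
        "G1F": (30.0, 1),
        "G2F": (10.0, 1),
        "G0": (15.0, 0),
        "Man5": (5.0, 0),
    }
    frac = fucosylated_fraction(parental)
    print(f"fucosylated: {frac:.0%}, afucosylated: {1 - frac:.0%}")
```

Reporting the fraction of total signal, rather than raw areas, makes profiles from different runs or clones directly comparable.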

Moreover, inactivation of the gene encoding mannosyl (alpha-1,3-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase (MGAT1), also known as GnTI, improves the binding of an HIV-1 vaccine antigen to prototypic glycan-dependent broadly neutralizing mAbs (bN-mAbs). MGAT1-deficient CHO cells restrict glycosylation to early intermediates of the N-linked glycosylation pathway without adverse effects on the cell cycle or on high-density cell culture [123].

COSMC, an essential ER chaperone in the protein O-glycosylation pathway, has been disrupted by CRISPR/Cas9 gene editing in CHO cells [124]. COSMC is required for the activity of C1GALT1, the enzyme that catalyzes the second step of O-glycan elongation by adding β3Gal to the first GalNAc residue attached to the protein backbone. Targeting the Cosmc gene abolished O-glycan elongation entirely and enabled the synthesis of O-glycoproteins carrying truncated, homogeneous Tn (GalNAcα) O-glycans. Cosmc-deficient CHO cell lines can therefore be used to produce O-glycoproteins with enhanced immunogenicity. In addition, because the Tn glycoform is widely expressed in cancer, Cosmc knockout may enable CHO cells to produce O-glycoproteins for use as cancer vaccines. The SimpleCell technique has also been used to map the GalNAc-type O-glycoproteome of CHO cells.

Glycosylation has been reported to differ in several ways between human and CHO cells. For example, bisecting N-acetylglucosamine (added by Mgat3) and alpha-2,6-linked sialic acid (added by St6gal1) occur on human glycoproteins but are absent from CHO glycoproteins [125, 126]. These glycan features can increase drug activity without adverse immunogenic effects. Hence, to boost transcription of Mgat3 [55] and St6gal1, researchers targeted these glycosyltransferase genes in CHO cells using CRISPRa and observed that their activation produced the required glycan structures, improving the functional activity of glycoproteins [80].

Glycosylation profiles of monoclonal antibodies and glycoproteins produced in CHO cells have become increasingly crucial not only for developing new biotherapeutics but also for manufacturers of biosimilars and biobetters. An integrated omics-based approach can provide valuable insight into how upstream cell culture parameters, such as cell lineage, culture process conditions, and culture medium and supplements, impact the glycosylation of biotherapeutic glycoproteins. By gaining a specific understanding and developing targeted strategies, it is possible to create customized media and supplements, novel cell lines, and predictive in silico models that result in consistent glycoforms, ensuring the production of effective and high-quality biotherapeutics.

10 Future directions

High-throughput experimental verification platforms are required to reveal gene functions in CHO cells. Innovative efforts include CHO genomics (https://CHOgenome.org/) [127, 128] and metabolomics (https://chomine.boku.ac.at) [99] platforms. Network-based prediction of gene function is an active research area, but its application to CHO cells remains limited. Achieving network-based gene function identification will require additional data, easier access to tools and data, improved data analysis, and high-throughput experimental verification. Omics technologies, databases, and bioinformatics tools primarily provide information on candidate genes, biosynthetic pathways, proteins, master regulators, biological networks, and cross talk, especially regarding CHO cell productivity.

The CRISPR-Cas9-mediated genome-editing system has fundamentally influenced gene function research and, ultimately, CHO cell improvement [30]. Engineering the CHO cell genome raises no ethical issues. CRISPR-Cas9 generates mutants with greater efficiency and specificity than TALENs and ZFNs. Hence, the CRISPR-Cas9 genome-editing system has great potential for practical research. Various CRISPR-Cas9 platforms have been developed for CHO cell genome engineering, but their targeting specificity and efficiency still need to be improved.

Moreover, gene replacement and knock-in of DNA parts remain challenging [17]. A CRISPR-Cas9-based toolbox has been established for gene repression and activation in CHO cell lines [8]. CRISPR-Cas9 could also be adapted for new applications, e.g., epigenomic regulation, chromatin imaging, and RNA cleavage [95, 129, 130]. Given its versatility, simplicity, efficiency, and flexibility, the future of functional genomics is likely to depend on the CRISPR-Cas9 system. By profiling and measuring biomolecules, omics and CRISPR together provide a means to understand and improve an organism’s functions and interactions at the cell and tissue levels.