Introduction

For three decades, mammalian cells have been the vehicle of choice for producing therapeutic proteins used to treat cancers, rheumatoid arthritis, and many congenital diseases. They are capable of performing post-translational modifications, including complex tertiary structure formation through multiple disulfide bonds, glycosylation, and phosphorylation, which are important attributes of many therapeutic proteins. Much effort has been devoted to developing alternative host systems, including plants, insect cells, yeasts, and transgenic animals. Nonetheless, mammalian cells remain pre-eminent. Over the same period, the productivity of mammalian cell culture processes has increased by nearly two orders of magnitude. Advances in various aspects of process development, such as medium and feed design, vector design, and screening strategies for identifying high-producing clones, have all contributed to this increase. This review focuses on advances related to cell line development. A high-producing cell line can now secrete IgG-based protein products at a rate rivalling that of the professional secretory cells in the body (Seth et al. 2007). The method of generating producing cell lines has remained largely unchanged over the past three decades. However, the host cells used to derive producing cell lines are often pre-adapted and selected for traits better suited to industrial manufacturing (Sinacore et al. 1996). The methotrexate (MTX)/dihydrofolate reductase (DHFR) and methionine sulfoximine (MSX)/glutamine synthetase (GS) systems have been used to increase the copy number of the product genes, consistently yielding very high-producing cells. Methods for identifying and isolating the highest-producing candidate cells have also been refined, improving the odds of finding them.

In recent years, a torrent of transformative science and technology has been gathering force that may change the way future cell lines are derived. Genome science and systems analysis have given us new ways of understanding cell physiology, while advances in genome engineering have begun to give us the ability to engineer cells with greater precision and predictability. This article summarizes the new knowledge on cell line development acquired in the past few years and discusses how these insights can be combined with new tools to forge a new era of cell line development for biopharmaceutical production.

Vectors and modulation of transgene expression

For high productivity, a high transcript level of the transgene is desirable. The expression level of the transgene is affected by the vector and its components as well as by the locus at which the vector integrates. Plasmid vectors have been the primary vehicle for delivering the product transgene into host cells. Most industrial producer cell lines carry multiple copies of the transgene, acquired either through transfection with a high dose of plasmid or through amplification of the transgene copy number after transfection. Viral vectors have also gained interest because of their propensity to integrate into actively transcribed regions of the host cell genome, resulting in higher transgene expression than plasmid transfection (Oberbek et al. 2011).

The transcript level of the transgene is affected by the integration site, although a systematic study of the impact of integration sites on transgene expression level is still lacking. To minimize this position effect, cis-acting barrier elements called insulators, which block the spread of heterochromatin into neighboring euchromatin, have been incorporated into expression vectors. They shield the integrated transgene from negative influences when the vector integrates into a condensed chromosomal region (Ghirlando et al. 2012). The use of such insulating elements significantly increases the number of producing clones and improves recombinant protein titer (Girod et al. 2005; Boscolo et al. 2012; Hou et al. 2014).

Promoters commonly used to drive transgene expression are derived from viruses, mice and humans, largely because of their availability. Promoters from the Chinese hamster for use in CHO cells have also gained attention. These include a beta-actin promoter (Estes and Zhang 2014), cold-responsive promoters (Thaisuchat et al. 2011; Al-Fageeh and Smales 2013), and time-dynamic promoters (Le et al. 2013). These endogenous promoters offer the possibility of more predictable expression levels and of dynamic control of transgene expression across growth stages. In addition, synthetic promoters have been explored to modulate gene expression levels (Hartenbach and Fussenegger 2006; Grabherr et al. 2011; Brown et al. 2014). Because these customized combinations of regulatory motifs do not occur naturally, they tend to be less tissue-specific and less species-restricted, and may offer more stable and consistent gene expression than natural promoters.

Host cells and engineering host cells

Three decades after the emergence of therapeutic proteins, the list of commonly used host cells remains rather limited, comprising mainly Chinese hamster ovary (CHO), mouse myeloma NS0 or SP2/0, baby hamster kidney (BHK), and human embryonic kidney (HEK 293) cell lines. This narrow range of choices reflects both mastery of the currently available lines and a reluctance to experiment with new ones, out of concern that they would need to go through regulatory approval. This is so even though proteins expressed in cell lines derived from different species do differ, for example in their glycosylation. Glycans on proteins produced in CHO and NS0 cell lines differ somewhat from each other as well as from those on naturally occurring human proteins (Sheeley et al. 1997; Beck et al. 2008). These small differences have not been a major concern, as evidenced by the therapeutic use in humans of proteins produced in both cell types.

Over the years, host cells have been adapted to the desired manufacturing culture conditions, including chemically defined medium, suspension growth, and higher mechanical stress. While the genetic mechanism of such adaptation is not well understood, the fact that many traits acquired during adaptation are heritable suggests that they are probably rooted in genetic or epigenetic alterations. There have also been efforts to genetically engineer host cells so that the same desirable traits are carried over to new cell lines for different products. With the emergence of biosimilars, for which manufacturers have several years to tailor the candidate producing cell lines to a particular product, it will be tempting to genetically engineer host cell lines for better growth characteristics and product quality, such as glycosylation profiles.

For two decades, efforts have been devoted to genetically engineering host cells or producing cells to enhance growth characteristics or to modulate the glycosylation pattern of the product. Table 1 summarizes reports of engineering cells that produce heterologous proteins. Modulating cellular metabolism, especially reducing lactate production from glucose metabolism, has been the goal of many efforts. Another area that has drawn significant effort is the apoptosis pathway, tackled both by overexpressing anti-apoptotic genes and by suppressing pro-apoptotic genes (reviewed by Krampe and Al-Rubeai 2010). In addition, autophagy is important in the bioprocessing context (reviewed by Kim et al. 2013) and may be an attractive target for cell engineering.

Table 1 Strategies used for cell engineering

Since most recombinant proteins expressed in animal cells are secreted, there has been keen interest in engineering cells to enhance their secretory pathway. The protein secretion pathway is complex: it spans multiple cellular compartments, including the ER and Golgi, and also involves vesicle trafficking and post-translational modifications. A cell's protein secretion is influenced not only by the capacity of its secretory machinery but also by its energy metabolism, gene expression, and redox balance (Seth et al. 2007). This complexity was further illustrated by two host cell lines that differed drastically in their ability to express two antibody molecules at high levels, even though both expressed a third antibody equally well (Hu et al. 2013). It is thus not surprising that genetic manipulation of genes involved in the secretory pathway, such as BiP, PDI, or the transcription factor Xbp-1, has not given consistent results (Borth et al. 2005; reviewed by Khan and Schröder 2008).

In comparison, cell engineering targeted at a specific reaction step in the glycosylation pathway has had considerable success. Expressing a heterologous 2,6-sialyl transferase, which is silenced in CHO cells, has been used to make glycans produced in CHO cells resemble the human glycoform (Lee et al. 1989). An unfucosylated glycan at Asn297 of the IgG heavy chain enhances antibody-dependent cellular cytotoxicity (Shields et al. 2002). Knock-out of the endogenous fucosyl transferase resulted in the desired alteration of the glycan profile (Lee et al. 1989; Yamane-Ohnuki et al. 2004). Using a prokaryotic enzyme to divert metabolic flux from the synthesis of GDP-L-fucose to GDP-D-rhamnose reduced the GDP-L-fucose substrate pool for the fucosylation reaction and led to reduced fucosylation (von Horsten et al. 2010). However, the glycosylation pathway is very complex. Even though the number of enzymes in the pathway is small, their broad glycan substrate specificity gives rise to hundreds of possible glycans (Hossler et al. 2006; Spahn and Lewis 2014). Steering the reactions toward a particular glycan may therefore not be easily accomplished by targeting a single enzyme alone.
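
The combinatorial point can be made concrete with a deliberately small Python sketch. It is a toy model assumed purely for illustration, not the reaction network of Hossler et al. (2006) or Spahn and Lewis (2014): a biantennary N-glycan whose two arms (treated as distinguishable) each end in one of three states, plus a core fucose and a bisecting GlcNAc that are each either present or absent.

```python
from itertools import product

# Toy model (assumption): terminal state of each antenna of a biantennary N-glycan.
ARM_STATES = ("GlcNAc", "Gal", "Gal-NeuAc")


def enumerate_glycoforms():
    """Return every combination of the toy modifications as a tuple
    (arm 1 state, arm 2 state, core fucose present, bisecting GlcNAc present)."""
    return [
        (arm_1, arm_2, core_fucose, bisecting)
        for arm_1, arm_2, core_fucose, bisecting in product(
            ARM_STATES, ARM_STATES, (False, True), (False, True)
        )
    ]


glycoforms = enumerate_glycoforms()
print(len(glycoforms))  # 3 * 3 * 2 * 2 = 36

# Removing the fucosyltransferase step in this toy model keeps only
# the non-fucosylated half; the remaining set is still heterogeneous.
afucosylated = [g for g in glycoforms if not g[2]]
print(len(afucosylated))  # 18
```

Each additional independent two-state step doubles the count, so eliminating one enzyme halves the set of possible structures in this toy model but still leaves a mixture.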

Epigenetics and genome editing in cell line development

Epigenetic regulation

Epigenetic regulation, i.e., chemical modification of DNA and histone proteins without alteration of the DNA sequence, is increasingly implicated in the development of producing cell lines. Gene silencing has been shown to affect both the nutritional requirements of cells and transgene expression. A key gene in cholesterol synthesis, Hsd17b7, is silenced in NS0 cells by methylation of the CpG island upstream of its promoter, leading to cholesterol auxotrophy (Seth et al. 2006). Treatment with the demethylating drug 5-azacytidine or expression of a heterologous Hsd17b7 reversed this phenotype. Methylation of CpGs in the CMV promoter has been linked to decreased transgene expression and productivity (Yang et al. 2010; Kim et al. 2011; Osterlehner et al. 2011). Both 2,6-sialyl transferase and GS are present in the Chinese hamster genome and are expressed in the liver, but are silenced in CHO cells. Epigenetic intervention may thus present opportunities for endowing cells with characteristics favorable for industrial production.
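
Candidate CpG islands, such as the one upstream of Hsd17b7, are often flagged from sequence composition alone. The Python sketch below computes GC content and the observed/expected CpG ratio (CpG count × length / (C count × G count)) and applies thresholds that are commonly used in practice (length of at least 200 bp, GC above 50%, ratio above 0.6); these thresholds are assumptions borrowed from general usage, not from the studies cited here, and the sequences are artificial.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    # Count CG dinucleotides read 5'->3' on the given strand.
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    expected = c * g / n  # CpG count expected from base composition alone
    ratio = cpg / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Composition test using commonly used (assumed) default thresholds."""
    if len(seq) < min_len:
        return False
    gc, ratio = cpg_stats(seq)
    return gc > min_gc and ratio > min_ratio


# Artificial sequences, for illustration only.
print(looks_like_cpg_island("CG" * 120))  # True: CpG-dense, 240 bp
print(looks_like_cpg_island("AT" * 120))  # False: no C or G at all
```

Composition only identifies candidate islands; whether an island is actually methylated, as reported for NS0 cells, has to be measured experimentally, for example by bisulfite sequencing.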

Targeted genome editing

Advances in targeted genome editing in the past few years, coupled with the availability of genome sequences, allow desired genetic changes to be executed with increasing precision. Efforts will likely be directed towards identifying transgene integration sites that lead to high expression levels and long-term expression stability. Replacing the transgene in the genome of an existing high-producing cell line with a new product gene may be increasingly used for quick access to a high producer of the new product (Moehle et al. 2007; Nehlsen et al. 2009; Chen et al. 2013). Genome editing need not be limited to inserting product genes; it can also be used to engineer host cells. Genes modulating the characteristics of producing cells can be targeted to specific sites, possibly placed under the control of a cellular regulatory circuit so that they are expressed at the desired level in sync with cellular rhythm (Le et al. 2013). Recombinase-mediated cassette exchange (RMCE), one of the earliest targeted gene integration techniques (reviewed by Turan et al. 2013), replaces a gene cassette flanked by two recombination target sites. More recently, artificial nucleases, namely zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), which combine a sequence-specific DNA-binding domain with a non-specific nuclease domain, together with the RNA-guided CRISPR/Cas system (Fig. 1), have removed the need to pre-tag the genome with target sites for site-specific gene integration (reviewed by Gaj et al. 2013). ZFNs and TALENs employ different DNA recognition elements, which differ in design complexity. ZFNs use a combination of zinc finger domains for DNA specificity; each domain of ~30 amino acids binds a specific 3-bp DNA sequence, and specificity for the full target arises from the particular combination of these domains (Maeder et al. 2008).
In TALENs, DNA-binding domains consist of repeat units of 33-35 amino acids, each specific to a single base pair. Several such units are combined to achieve specificity for a target DNA sequence (Zhang et al. 2011).
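
The arithmetic behind "specificity arises from the combination of domains" can be sketched in Python using only the module sizes given above (~30 aa per 3 bp for zinc fingers, 33-35 aa per 1 bp for TALE repeats). The sketch assumes a random genome with equal base frequencies, counts exact matches on both strands, and uses 2.5 × 10^9 bp as an illustrative round genome size rather than a measured value.

```python
def zinc_finger_array(n_fingers, aa_per_finger=30, bp_per_finger=3):
    """Approximate (domain length in aa, recognized site in bp) for a zinc finger array."""
    return n_fingers * aa_per_finger, n_fingers * bp_per_finger


def tale_array(n_repeats, aa_per_repeat=34, bp_per_repeat=1):
    """Same for a TALE array; 34 aa is taken as a midpoint of the 33-35 aa repeat."""
    return n_repeats * aa_per_repeat, n_repeats * bp_per_repeat


def expected_chance_sites(site_bp, genome_bp=2.5e9):
    """Expected exact matches on both strands of a random genome with equal base
    frequencies. genome_bp is an illustrative round number (assumption)."""
    return 2 * genome_bp / 4 ** site_bp


for label, (aa, bp) in [("3 zinc fingers", zinc_finger_array(3)),
                        ("6 zinc fingers", zinc_finger_array(6)),
                        ("18 TALE repeats", tale_array(18))]:
    print(f"{label}: ~{aa} aa, {bp} bp, "
          f"~{expected_chance_sites(bp):.3g} chance sites")
```

The recognized site grows linearly with the number of modules while the expected number of chance matches falls fourfold per base pair. Real genomes are not random sequences and engineered nucleases tolerate some mismatches, so these figures are only a first-order guide.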

Fig. 1

Targeted genome editing tools for cell engineering. a Gene activation, b gene repression, c gene deletion, d targeted integration of heterologous gene

The more recently developed CRISPR/Cas system, originally identified in bacteria, is an RNA-guided DNA endonuclease system (Sander and Joung 2014). Sequence specificity is provided by a guide sequence in the crRNA that base-pairs with the target sequence in the genome and directs the Cas endonuclease to it. All three systems introduce a double-stranded break, or a single-stranded break when a nickase variant is used instead of the full endonuclease, in a sequence-specific manner. This greatly increases the frequency of homologous recombination at that location and facilitates integration of the gene of interest. ZFNs have been used to rapidly generate a triple knock-out CHO line, GS−/−DHFR−/−Fut8−/− (Liu et al. 2010). Recently, CRISPR/Cas9 technology was used to generate CHO lines with disruptions in COSMC and FUT8. However, off-target cleavage by the endonucleases raises concerns for gene therapy applications (Kuscu et al. 2014). Such off-target effects may be reduced by choosing target sites that are unique in the genome (Cho et al. 2014).
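
The idea of choosing unique target sites can be sketched in Python. The sketch assumes the commonly used SpCas9 NGG PAM and a 20-nt guide, counts only exact matches among PAM-adjacent sites on both strands, and runs on a made-up toy sequence; practical guide-design tools also score near-matches with mismatches, which this sketch ignores.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_protospacers(seq, guide_len=20):
    """Yield (strand, start, protospacer) for each guide_len-mer followed by NGG.

    Assumes an SpCas9-style NGG PAM; start is in the coordinates of the strand scanned."""
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # The zero-width lookahead lets finditer report overlapping sites.
        for m in pattern.finditer(s):
            yield strand, m.start(), m.group(1)


def unique_guides(genome, region, guide_len=20):
    """Protospacers in region that occur exactly once among PAM-adjacent sites in genome."""
    counts = Counter(p for _, _, p in candidate_protospacers(genome, guide_len))
    return sorted({p for _, _, p in candidate_protospacers(region, guide_len)
                   if counts[p] == 1})


# Toy "genome": one sequence occurs once, another occurs twice.
g1, g2 = "AC" * 10, "TTGA" * 5
genome = "TTTT" + g1 + "TGG" + "TTTT" + g2 + "AGG" + "TTTT" + g2 + "CGG" + "TTTT"
print(unique_guides(genome, genome))  # ['ACACACACACACACACACAC']
```

The duplicated protospacer is rejected because a guide against it would cut at two places, which is the situation the unique-site strategy is meant to avoid.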

Evaluating and isolating high-producing cells

The increased versatility in performing genetic manipulations has played a major role in generating better cell lines over the past two decades. But the gains in productivity are also attributable to an expanded capability to identify and isolate clones that perform well at manufacturing scale. Importantly, automation of many otherwise labor-intensive steps in cell line isolation allows more clones to be evaluated and greatly improves the chance of obtaining a high producer.

Assessing clonal quality and clonal stability

After introduction and amplification of the transgene, single cell cloning is performed, and clones with high productivity and robust growth characteristics are isolated. The selected clones must maintain their productivity and growth characteristics over the large number of generations spanned by the product life cycle. Traditionally, single cell cloning, subsequent assessment of cell growth and productivity, and cell stability studies are performed in well plates or flasks. Commercial instruments for simultaneously screening a large number of clones are now standard in industrial cell line development (Hou et al. 2014).

Microfluidic devices are increasingly used to isolate single cells at the micro- or nanoliter scale (reviewed by Mehling and Tay 2014). At the very low cell concentrations used for single cell cloning in 96-well or 384-well plates, cell growth is often limited by the low cell density. The small culture volume of a microfluidic device allows autocrine factors to accumulate faster, facilitating cell growth during single cell cloning (Hansen et al. 2010). Product assays for assessing productivity, and isolation of cells for further cultivation, are also possible at the nanoliter scale in microfluidic devices (reviewed by Love et al. 2013). These systems still pose some challenges for general users, including water evaporation and adsorption of medium components to the materials commonly used to fabricate the devices (reviewed by Mehling and Tay 2014).
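
Why seeding densities for single cell cloning are kept so low can be shown with a short calculation. Assuming cells are distributed among wells at random, so that the number per well follows a Poisson distribution with mean λ, a well receives k cells with probability P(k) = λ^k e^(-λ)/k!, and the fraction of occupied wells that started from exactly one cell is P(1)/(1 - P(0)) = λe^(-λ)/(1 - e^(-λ)). The Python sketch below evaluates this for a 96-well plate; it also assumes every seeded cell grows, which is optimistic at low density.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k cells in a well when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def monoclonal_fraction(lam):
    """Fraction of occupied wells that received exactly one cell."""
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))


def expected_occupied_wells(lam, n_wells=96):
    """Expected number of wells receiving at least one cell."""
    return n_wells * (1.0 - math.exp(-lam))


for lam in (0.5, 1.0, 2.0):
    print(f"mean {lam} cells/well: "
          f"{expected_occupied_wells(lam):.0f} of 96 wells occupied, "
          f"{monoclonal_fraction(lam):.0%} of those monoclonal")
```

Lowering λ raises the chance that a growing well is monoclonal at the cost of more empty wells, which is why screening many plates and formats that support growth from very few cells are both attractive.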

Assessment of process performance and scale-down models

Ideally, the final production clones should be selected based on their performance under culture conditions similar to the manufacturing setting. For cell lines used in a fed-batch manufacturing process, performance evaluation in fed-batch mode was reported to yield better outcomes than evaluation in batch mode (Porter et al. 2010a, b). To simulate manufacturing processes in stirred tanks, commercial miniscale multi-tube/flask systems and miniaturized cell culture systems with various mixing mechanisms are available to increase the capacity for process performance evaluation (Girard et al. 2001; De Jesus et al. 2004). Some systems allow pH control and periodic nutrient feeding (Frison et al. 2002; Chen et al. 2009; Legmann et al. 2009; Hsu et al. 2012; Moses et al. 2012). To eliminate the need for a sensor and a pH control mechanism, a hydrogel-based platform that slowly releases a neutralizing agent in situ has been used for pH maintenance (Pradhan et al. 2012).
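
Clone performance in such evaluations is commonly summarized by the standard bioprocess relation titer ≈ qP × IVCD, where qP is the cell-specific productivity and IVCD is the time integral of viable cell density. The Python sketch below computes IVCD by the trapezoidal rule and back-calculates an average qP; the two clones and all their numbers are hypothetical. With viable cell density in 10^6 cells/mL, time in days and titer in mg/L, qP comes out directly in pg/cell/day.

```python
def ivcd(days, vcd):
    """Integral of viable cell density (1e6 cells*day/mL) by the trapezoidal rule."""
    return sum(0.5 * (vcd[i] + vcd[i - 1]) * (days[i] - days[i - 1])
               for i in range(1, len(days)))


def specific_productivity(days, vcd, titer_mg_per_l):
    """Average qP in pg/cell/day, since (mg/L) / (1e6 cells*day/mL) = pg/cell/day."""
    return titer_mg_per_l / ivcd(days, vcd)


# Hypothetical fed-batch data: sampling days, VCD (1e6 cells/mL), final titer (mg/L).
days = [0, 3, 7, 10, 14]
clones = {
    "clone A": ([0.5, 3, 10, 12, 8], 2000),
    "clone B": ([0.5, 4, 15, 18, 14], 2400),
}
for name, (vcd, titer) in clones.items():
    print(f"{name}: IVCD {ivcd(days, vcd):.2f}, "
          f"qP {specific_productivity(days, vcd, titer):.2f} pg/cell/day")
```

Here clone B reaches the higher titer (2400 vs 2000 mg/L) through more biomass (IVCD 158.25 vs 104.25) despite a lower qP (about 15.2 vs 19.2 pg/cell/day), which is the kind of trade-off that only shows up when clones are run in a representative process mode.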

To truly simulate manufacturing conditions, the culture system should reproduce the time profiles of the chemical milieu and of the physical environment (i.e., fluid dynamics and gas–liquid interfacial interactions) found at large scale. Most commercial miniscale systems provide mechanical mixing but do not address fluid dynamics. One scale-down model used 2-L reactors to simulate the hydrodynamic stresses of the large scale by oscillating between high and low agitation rates chosen from agitation power calculations (Sieck et al. 2013). Such studies, combined with transcriptome analysis across scales (Jayapal and Goudar 2014) and possibly extended to metabolomics, can provide much insight into the impact of scale-up on cellular physiology.
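
Agitation power calculations of this kind usually start from the turbulent-regime relation P = Np ρ N^3 D^5 (power number Np, liquid density ρ, impeller speed N, impeller diameter D). The Python sketch below finds the speed at which a small vessel matches a target power input per unit volume. Vessel sizes, power number and speeds are hypothetical, a constant Np (turbulent, ungassed flow) is assumed, and this is not the specific protocol of Sieck et al. (2013).

```python
def power_per_volume(n_rps, d_m, v_m3, power_number=5.0, rho=1000.0):
    """Ungassed power input per volume (W/m^3): Np * rho * N^3 * D^5 / V."""
    return power_number * rho * n_rps ** 3 * d_m ** 5 / v_m3


def speed_for_power_per_volume(target_pv, d_m, v_m3, power_number=5.0, rho=1000.0):
    """Impeller speed (rev/s) that gives target_pv in a vessel; inverts power_per_volume."""
    return (target_pv * v_m3 / (power_number * rho * d_m ** 5)) ** (1.0 / 3.0)


# Hypothetical large tank: 2 m^3 working volume, 0.5 m impeller at 60 rpm (1 rev/s).
large_pv = power_per_volume(n_rps=60 / 60, d_m=0.5, v_m3=2.0)
# Hypothetical 2-L bench reactor with a 0.06 m impeller.
n_small = speed_for_power_per_volume(large_pv, d_m=0.06, v_m3=0.002)
print(f"large-scale P/V = {large_pv:.1f} W/m^3; "
      f"matching bench speed = {n_small * 60:.0f} rpm")
```

Because N enters as a cube, modest changes in agitation rate produce large changes in power input; an oscillating scale-down protocol could use the same inversion to set its high and low agitation rates from two chosen P/V targets.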

Cell line development for biosimilars

Nearly three decades after the dawn of protein therapeutics, patents on an increasing number of blockbuster biologics, such as Remicade (infliximab), Humira (adalimumab) and Epogen (erythropoietin), are expiring in various regions. This promises a large market and much wider availability of biosimilar drugs to patients.

The cell line development strategy for biosimilars differs somewhat from that for innovative biologics. The manufacturers of a biosimilar do not have access to the innovator's original cell line and bioreactor/purification process. They must employ a different cell line and develop a different process while still "matching" the original protein product in its chemical and physical features, including purity and post-translational modification profile. For the innovator, rapid development and a short timeline to deliver the drug for clinical testing and regulatory approval are critical. For biosimilar manufacturers, however, a span of years is affordable for optimizing the cell line and the process. This provides an opportunity to engineer the cell or tweak the process to acquire the desired productivity and product characteristics, especially specific glycosylation profiles.

Analytical tools to characterize the product become important in selecting potential cell line candidates and process conditions (Berkowitz et al. 2012). Advances in analytical technology are a double-edged sword: on one hand, the reference product can be characterized in minute detail; on the other, such characterization can reveal many differences between the biosimilar molecule and the innovator product. Understanding the safety and efficacy impacts of the various attributes, and leveraging this knowledge to focus cell line development efforts on the attributes that are critical, continues to challenge process development scientists.

Another challenge during biosimilar development is the choice of host cell line. Some innovator biologics may have been developed using older technologies, expression systems, and especially different cell lines. A range of new expression technologies have emerged since the approval of these early blockbusters. Current production platforms with optimized host cell lines different from innovator cell lines may be able to generate higher titers and improved purity. However, alternate host cell lines may also result in significant differences in the process impurity profiles or in the abundance of product variants.

Impact of genomic technology on cell line development

As industry moves into a post-genomic era and embraces systems biotechnology, we will see more use of genomic and transcriptomic tools in cell line development. The genome sequences of the host cell and the producing cell lines are becoming readily available (Xu et al. 2011; Brinkrolf et al. 2013; Lewis et al. 2013). Targeting the product gene to specific loci that give high transcript level and are less prone to silencing will likely become the norm in cell line construction. Directing genes conferring important trait(s) to specific sites on the genome wherein their expression level and dynamics are desirable may become a reality (Chen et al. 2013). As our understanding of the role of epigenetics on conferring the trait of hyperproductivity improves, targeted epigenetic intervention may also be feasible.

Genomic technology may also facilitate the screening of high-producing cells. Through transcriptome and proteome analyses in the past decade, there have been attempts to develop a hyperproductivity gene set, i.e., a collection of genes whose expression is "collectively" altered in high-producing cells. Recently, single cell genomic methods for RT-PCR and RNA-seq-based transcriptomic characterization have become available (reviewed by Astley and Al-Rubeai 2008; Eberwine et al. 2013; Nawy 2014). These will enable high-throughput transcriptome profiling of producing cells using a very small number of cells. The transcriptome data can allow faster and more reliable identification of desired clones using the hyperproductivity gene set (reviewed by Vishwanathan et al. 2014). The application of such technologies can help in obtaining next-generation host cells with desired traits. Such Next-Gen host cells can then increase the probability of obtaining hyper-producers, resulting in faster development of production cell lines (Fig. 2).
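
One simple way a hyperproductivity gene set could be used to rank clones is to standardize each gene's expression across clones and average the z-scores, flipping the sign for genes expected to be lower in high producers. The Python sketch below does exactly that; the gene names, directions and expression values are hypothetical, and a signed mean z-score is only one of several possible scoring schemes.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and population SD 1; constant rows map to 0."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def rank_clones(expr, gene_set):
    """Rank clones by a signed mean z-score over the gene set.

    expr: {gene: [expression in clone 0, clone 1, ...]}
    gene_set: {gene: +1 if expected up in high producers, -1 if expected down}
    Returns (clone indices ordered best first, per-clone scores)."""
    genes = [g for g in gene_set if g in expr]
    n_clones = len(next(iter(expr.values())))
    scores = [0.0] * n_clones
    for g in genes:
        for i, z in enumerate(zscores(expr[g])):
            scores[i] += gene_set[g] * z / len(genes)
    order = sorted(range(n_clones), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical normalized expression for four clones.
expr = {
    "GENE_UP1": [1, 5, 3, 2],
    "GENE_UP2": [2, 6, 4, 1],
    "GENE_DOWN1": [5, 1, 3, 4],
    "GENE_OTHER": [3, 3, 2, 4],  # not in the gene set, so ignored
}
gene_set = {"GENE_UP1": +1, "GENE_UP2": +1, "GENE_DOWN1": -1}
order, scores = rank_clones(expr, gene_set)
print(order)  # [1, 2, 3, 0]
```

Clone 1 ranks first because it is highest for both "up" genes and lowest for the "down" gene; with single-cell profiles the same scoring could, in principle, be applied to very small samples.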

Fig. 2

Cell line development using Next-Gen host cells. a Development of the Next-Generation host cell. Robust host cells with desired characteristics can be obtained by engineering the original host cell line. This robust host cell can then be transfected with a transgene, with or without targeted integration into a desired genomic location, followed by extensive screening for desired traits. This can be expected to identify a recombinant clone with optimal growth, quality, secretion and hence productivity. The transgene can then be deleted from this recombinant clone using targeted genome editing to yield a Next-Gen host cell. b Development of the Next-Generation producing cell line. The Next-Gen host cell is transfected with the desired new transgene, followed by cloning and screening to obtain the final producing cell line. Use of the Next-Gen host cell can potentially increase the probability of obtaining hyper-producers and shorten the cell line development cycle.

Single cell PCR, RNA-seq and genome sequencing may be used to examine heterogeneity in gene expression or in gene/genome sequence within a cell population, and thus to assess the homogeneity of producing cells. Determining clonality (i.e., demonstrating that a producing line indeed originated from a single "clone") is important to manufacturers and regulatory agencies. By surveying "marker" sites in the genome that are unique to each clone, e.g., the transgene integration sites, through single cell PCR, one may be able to determine the clonality of a population of producing cells.
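
The marker-site idea can be sketched as follows: for each single cell, record the set of integration-site markers detected by PCR, then ask whether all cells share one pattern. The Python below is a hypothetical illustration with invented site labels; it does not model PCR dropout or genomic rearrangement, either of which could make cells of a true clone look different, so a real assessment would have to account for assay sensitivity.

```python
from collections import Counter


def clonality_summary(cell_markers):
    """Summarize integration-site patterns across single cells.

    cell_markers: list of sets of marker labels detected in each cell (hypothetical labels).
    Returns (dominant pattern, fraction of cells with it, number of distinct patterns)."""
    patterns = Counter(frozenset(markers) for markers in cell_markers)
    dominant, count = patterns.most_common(1)[0]
    return dominant, count / len(cell_markers), len(patterns)


# Hypothetical single-cell PCR results for ten cells.
cells = [{"siteA", "siteB"} for _ in range(9)] + [{"siteC"}]
pattern, fraction, n_patterns = clonality_summary(cells)
print(sorted(pattern), fraction, n_patterns)  # ['siteA', 'siteB'] 0.9 2
```

A population in which every cell shows the same pattern (fraction 1.0, one pattern) is consistent with, though not proof of, a single-cell origin; a second pattern, as in this example, points to a mixed population.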

As genomic technology makes inroads into cell line development, the importance of systems biotechnology is also becoming apparent. The availability of transcriptomic and proteomic data has given a more comprehensive and quantitative understanding of the make-up of metabolic, signalling, secretory, and other pathways, which in turn has allowed better mathematical models of their dynamics to be developed. Integrating -omic data with these models may one day enable us to choose a production cell line based on its predicted metabolic behavior and product glycosylation pattern.

Concluding remarks

The productivity of protein therapeutic manufacturing processes has increased by nearly two orders of magnitude in the past two decades. Efficient development of better cell lines has played a major role in this success. However, the basic method of cell line development has remained largely the same over these years. With recent advances in genomics and the development of genome editing tools, cell line development is likely to be transformed in the near future. We expect increasing efforts to engineer host cells for better growth and metabolic characteristics and other hyperproductivity traits. The deployment of genomic technology and systems approaches to "design" producing cells will likely become the hallmark of the next generation of cell line development.