Introduction

Chinese hamster ovary (CHO) cells play an important role as hosts for the commercial-scale production of protein-based pharmaceuticals (Greber and Fussenegger 2007; Hacker et al. 2009; Omasa et al. 2010). Two subclones of CHO cells, CHO K1 (Kao and Puck 1968) and CHO DG44 (Urlaub and Chasin 1980), are the most widely used for both scientific research and industrial applications (Griffin et al. 2007; Wurm 2004). For instance, chemically defined media for the serum-free cultivation of CHO cells (Kuwae et al. 2018; Pan et al. 2017), targeted integration into the CHO genome for cell line development (Kawabe et al. 2012), and the identification of an insulator element for stable expression from the CHO genome (Takagi et al. 2017) have been reported.

Recently, following the development of next-generation sequencing (NGS) techniques and their impact on genome research, whole-genome sequencing has become more economical and faster. Several groups have reported NGS analyses of the genomic sequences of Chinese hamster and/or CHO cells (Xu et al. 2011; Lewis et al. 2013; Kaas et al. 2015; Feichtinger et al. 2016; Vishwanathan et al. 2016). According to the results of Xu et al. (2011), the estimated size of the CHO K1 genome is about 2.45 Gb and it was predicted to contain 24,383 genes. However, the genome of Chinese hamster includes various repeat sequences that are distributed on the same or different chromosomes (Ono and Sonta 2001). In human genome analysis, a clone-based physical map was an indispensable tool for the Human Genome Project because of the presence of numerous repeat sequences in the human genome (McPherson et al. 2001). The DNA sequencing information of the CHO genome should be coupled with physical chromosomal locations because of the instability of chromosomal structure. These locations should be obtained from BAC physical maps that can be derived from bacterial artificial chromosome (BAC) libraries and compared with the genomic sequences of Chinese hamster, various CHO cell lines, and other related species.

Previously, we constructed a genomic BAC library from the CHO DR1000L-4N cell genome, which provided fivefold coverage of this genome (Omasa et al. 2009). Based on this BAC library, we constructed a chromosomal physical map of the CHO DG44 cell line using 303 BAC clones and investigated the chromosome rearrangements between the two most widely used CHO cell lines, CHO K1 and CHO DG44 (Cao et al. 2012a). It was revealed that the two longest chromosomes did not feature significant rearrangements among CHO cells.

In this study, we determined the BAC end sequences (BESs) of the 303 clones that were used as BAC probes for the chromosomal physical map of the CHO DG44 cell line. Moreover, we compared these BESs with mouse genomic sequences. It has been reported that Chinese hamster cDNAs have high homology to mouse cDNAs (Melville et al. 2011; Wlaschin et al. 2005). However, cDNA analyses of the Chinese hamster have not revealed whether its genome has high homology to the mouse genome. Using BAC-FISH and BESs, we confirmed that the genomic sequences of CHO DG44 cells have regions that are highly homologous to the mouse genome. This confirmation could contribute to basic genomic research in CHO cells and to cell engineering applications, such as the selection of knock-in and/or knock-out target sites for genome editing in CHO cells.

Materials and methods

Cell lines and culture conditions

The CHO DG44 (dhfr−) (Urlaub and Chasin 1980) cell line, provided by Dr. L. Chasin of Columbia University, was used in this study. It was maintained in Iscove’s modified DMEM (IMDM) (Sigma-Aldrich, St. Louis, MO, USA) with 10% dialyzed fetal bovine serum (FBS) (SAFC Biosciences, Lenexa, KS, USA), 13.6 mg/L hypoxanthine (Yamasa, Choshi, Japan), and 2.42 mg/L thymidine (Yamasa). This cell line was cultivated at 37 °C in a humidified atmosphere containing 5% CO2. Details of other procedures were as previously described (Omasa et al. 2009; Cao et al. 2012a).

Construction of physical map using bacterial artificial chromosome fluorescence in situ hybridization (BAC-FISH)

Chromosome preparation and fluorescence in situ hybridization (FISH) were performed as described previously (Cao et al. 2012b; Omasa et al. 2009; Yoshikawa et al. 2000a, b). Briefly, 12 mL of hybridization solution consisting of 50% formamide, 2 × standard saline citrate, 10% dextran sulfate, and 8 mL of sonicated salmon sperm DNA (10 mg/mL) (BioDynamics Laboratory Inc., Tokyo, Japan) was used per slide. One microgram of BAC DNA was labeled with the Nick Translation Mix kit (Roche Diagnostics, Basel, Switzerland) at 15 °C for 8 h. Biotin-labeled and/or digoxigenin (DIG)-labeled probes were detected using avidin–fluorescein (Vector Laboratories, Inc., Burlingame, CA, USA) and/or anti-DIG-rhodamine Fab fragments (Roche Diagnostics). Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) (VECTASHIELD; Vector Laboratories) and observed under an Axioskop 2 fluorescence microscope (Carl Zeiss, Jena, Germany). Photographs were taken with an AxioCam MRm CCD camera (Carl Zeiss). Image processing was performed using Adobe Photoshop CS3. ImageJ software (http://rsbweb.nih.gov/ij/) was used to analyze the chromosomal loci of the BAC clone probes and the positions of the centromeres on the chromosomes, which were expressed as FLpter values (fractional length from the p-terminus, that is, the distance from the short-arm telomere to the signal relative to the total chromosome length; Lichter et al. 1990).
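The FLpter calculation can be illustrated with a minimal sketch. It assumes that the distance from the short-arm telomere to the signal and the total chromosome length are measured in the same unit (for example, pixels in ImageJ); the function names and the sample measurements are hypothetical and are not taken from this study.

```python
def flpter(dist_pter_to_signal: float, chromosome_length: float) -> float:
    """Fractional length from the p-terminus (FLpter).

    Both arguments must share one unit (e.g. pixels).
    0.0 corresponds to the short-arm telomere, 1.0 to the long-arm telomere.
    """
    if chromosome_length <= 0:
        raise ValueError("chromosome_length must be positive")
    if not 0 <= dist_pter_to_signal <= chromosome_length:
        raise ValueError("signal must lie on the chromosome")
    return dist_pter_to_signal / chromosome_length


def mean_flpter(measurements):
    """Average FLpter over several metaphase spreads.

    `measurements` is a sequence of (distance, chromosome_length) pairs.
    """
    values = [flpter(dist, length) for dist, length in measurements]
    if not values:
        raise ValueError("at least one measurement is required")
    return sum(values) / len(values)


# Hypothetical pixel measurements for one BAC probe on three spreads.
example = [(30.0, 200.0), (33.0, 220.0), (27.0, 180.0)]
print(round(mean_flpter(example), 3))
```

Normalizing by chromosome length makes positions comparable between spreads with different degrees of chromosome condensation.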

BAC DNA preparation and end sequencing

A previously constructed genomic BAC library from the gene-amplified CHO DR1000L-4N cells was used in this study (Omasa et al. 2009). BAC clones were cultured in 10 mL of Luria–Bertani broth containing chloramphenicol (12.5 μg/mL) at 37 °C for 16 h and harvested by centrifugation (6500×g, 5 min). BAC DNA was purified using a JETSTAR 2.0 Plasmid Purification Kit (Genomed GmbH, Löhne, Germany). The end sequencing of BAC DNA was performed with the forward sequencing primer 5′-CGCCAGGGTTTTCCCAGTCACGAC-3′ and the reverse sequencing primer 5′-CAGGAAACAGCTATGACC-3′ using the ABI PRISM BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems Inc., Foster City, CA, USA). Most of the determined BESs of the clones reported in our previous study (Cao et al. 2012a) were registered in DDBJ as genome survey sequences, and most of these BAC clones were deposited in a public cell bank (as RDB13444) at the RIKEN BioResource Research Center (Tsukuba, Japan).

Homology search with BESs and mouse genomic sequence

A sequence homology search against the mouse genome (Mus musculus strain C57BL/6J, GRCm38.p2, March 2013) was performed using the BLAST algorithm (http://blast.ncbi.nlm.nih.gov/), with the threshold for a significant expectation value (e-value) set to < 10⁻². In general, the threshold is set to < 10⁻¹⁰ for homology searches with genomic sequences. However, we set the threshold to < 10⁻² in this study because the determined BESs are short (1100 bp or less) and this more permissive threshold was expected to identify more regions homologous to the mouse genomic sequence.
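The effect of the e-value threshold can be sketched as a simple filter over BLAST hits. The sketch assumes that the hits are available in the 12-column tabular format of BLAST+ (`-outfmt 6`, with the e-value in the eleventh column); the search in this study was run through the NCBI web interface, so this input format, the function name, and the sample records are illustrative assumptions only.

```python
import csv
import io

E_VALUE_THRESHOLD = 1e-2  # threshold stated in the text


def best_hits(tabular_text, threshold=E_VALUE_THRESHOLD):
    """Return {query_id: hit} keeping, per query, the lowest-e-value hit
    whose e-value is strictly below `threshold`.

    Expects the default 12 tab-separated columns of BLAST+ outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue = float(row[10])
        if evalue >= threshold:
            continue  # hit is not significant at this threshold
        if query not in best or evalue < best[query]["evalue"]:
            best[query] = {
                "subject": subject,
                "sstart": int(row[8]),
                "evalue": evalue,
            }
    return best


# Hypothetical records: two hits for one BES, one weak hit for another.
sample = "\n".join([
    "BES_001_F\tchr3\t88.5\t410\t40\t3\t12\t420\t1000500\t1000909\t3e-120\t440",
    "BES_001_F\tchr8\t80.1\t90\t15\t1\t30\t119\t5000\t5089\t5e-3\t52.8",
    "BES_002_R\tchr14\t79.0\t60\t12\t0\t5\t64\t700\t759\t0.5\t30.1",
])
hits = best_hits(sample)
print(sorted(hits))
```

In this sketch, the second hit for the first query passes the 10⁻² threshold but would be rejected at 10⁻¹⁰, which illustrates how the more permissive setting retains weaker matches from short sequences.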

Results and discussion

CHO cells can serve as hosts for the high-level production of protein-based pharmaceuticals because they are able to strongly express many exogenous genes through gene amplification systems, such as the dihydrofolate reductase (DHFR) and glutamine synthetase (GS) systems. To investigate the mechanism of the DHFR system, we constructed a gene-amplified CHO cell line using the DHFR system derived from the CHO DG44 cell line (Yoshikawa et al. 2000b). The constructed CHO DR1000L-4N cells have high copy numbers of DHFR and the exogenous gene, and these genes are located in a specific region of the chromosome. To determine the genomic sequence of the gene-amplified region, we constructed a BAC library (122,281 clones) from the genome of CHO DR1000L-4N cells (Omasa et al. 2009). Then, we determined the sequences of DHFR and the exogenous gene; the results showed that the sequences consisted of a large inverted repeat (Omasa et al. 2009). We also selected BAC-FISH marker clones in the CHO DG44 cell line to identify individual chromosomes by BAC-FISH (Omasa et al. 2009).

For Chinese hamster and the Chinese hamster-derived CHO cell lines, less genome sequence information is available than for human and mouse. Thus, we constructed a chromosomal physical map of the CHO DG44 cell line by BAC-FISH (Cao et al. 2012a). This physical map is based on the chromosomal locations of 303 BAC clones containing genomic fragments of CHO cells. In this study, we determined the BESs of the BAC clones that were used for physical map construction in the CHO DG44 cell line. We attempted to determine both BESs of all 303 BAC clones (606 BESs) by end sequencing and succeeded for 558 BESs. Both BESs were determined for 266 BAC clones (Table 1). The average length of the determined sequences was 532 bp, and the minimum and maximum sequence lengths were 175 and 1100 bp, respectively. However, neither BES could be determined for 11 BAC clones. Among these 11 BAC clones, 6 are located on chromosomes A and B, which are the paired longest chromosomes of the CHO DG44 cell line (Omasa et al. 2009; Cao et al. 2012a). We also analyzed whether the BESs have high homology with the mouse genomic sequence. The results showed that 465 BESs exhibit high homology with the mouse genome (Fig. 1). The lengths of the determined sequences did not appear to affect whether high homology was detected. A previous study on comparative genomic hybridization (CGH) between Chinese hamster and mouse showed that the entirety of each chromosome was not conserved between the two species (Yang et al. 2000). Therefore, we compared the locations of BESs between the CHO DG44 cell line and mouse chromosomes using BLAST (Fig. 2). Figure 2 shows the proportion of regions highly homologous (e-value of less than 10⁻²) to the mouse genome in each CHO DG44 chromosome (A–T). The results show that each chromosome in the CHO DG44 cell line exhibits high homology to various mouse chromosomes. The chromosome regions with high homology between the CHO DG44 cell line and mouse chromosomes were not the same in each chromosome. Yang et al. (2000) also reported similar results from a comparative chromosome map between Chinese hamster and mouse.
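The BES counts reported above can be cross-checked with a short calculation. The number of clones with exactly one determined end (26) and the two percentages are not stated in the text; they are derived here from the reported figures, on the assumption that the 465 homologous BESs are counted among the 558 determined BESs.

```python
TOTAL_CLONES = 303   # BAC clones on the physical map
BOTH_ENDS = 266      # clones with both BESs determined
NO_ENDS = 11         # clones with neither BES determined
HOMOLOGOUS = 465     # BESs with high homology to the mouse genome

# Derived values (not stated explicitly in the text).
one_end = TOTAL_CLONES - BOTH_ENDS - NO_ENDS   # clones with exactly one BES
determined = 2 * BOTH_ENDS + one_end           # total determined BESs
attempted = 2 * TOTAL_CLONES                   # total attempted BESs

print(one_end, determined, attempted)
print(f"sequencing success: {determined / attempted:.1%}")
print(f"homologous fraction: {HOMOLOGOUS / determined:.1%}")
```

The derived total of 558 determined BESs agrees with the reported value, so the clone-level and sequence-level counts are mutually consistent.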

Table 1 Results of determined BAC end sequences
Fig. 1 Size distribution of determined BAC end sequences. Black: number of sequences homologous to mouse chromosome sequences. Gray: number of sequences not homologous to mouse chromosome sequences

Fig. 2 Distribution of homologous chromosome sequences between CHO DG44 cell line and mouse chromosome. Legends indicate highly homologous loci in mouse chromosome. CHO DG44 chromosomes are arranged in order from A to T based on chromosome length, as reported previously (Omasa et al. 2009; Cao et al. 2012a, b)

We have determined the loci of BESs in CHO DG44 cell line chromosomes by BAC-FISH (Cao et al. 2012a). Although the chromosomal structures of Chinese hamster and mouse are not the same, we investigated the relationship between the positions of BESs in the CHO DG44 chromosomes and the positions of the homologous sequences in mouse chromosomes (Fig. 3). The results showed that 23 specific regions in 13 chromosomes of the CHO DG44 cell line had similarities to specific mouse chromosomes (r² ≥ 0.850; Fig. 3 and Supplementary Fig. 1). In particular, two regions in chromosomes A and B were correlated with mouse chromosomes 3 and 8 (Fig. 3a, b). Several BESs are not located in these specific regions; however, the results showed a tendency for the sequences of these two regions to be conserved in the mouse genome. Sixteen BESs showed high homology to mouse chromosome 14. Among these, 15 BESs are located in a specific narrow region of chromosomes A and B (FLpter value: 0.04–0.22) (Fig. 3c). However, the correlation between BES positions and mouse genome positions was quite low in this region (r² = 0.208). Thus, no significant correlation was identified in this narrow region. Consequently, this region of chromosomes A and B does not appear to be conserved relative to the mouse genomic sequence. These results suggest that some genomic modifications such as translocation have occurred in the homologous regions of mouse chromosome 14 and/or CHO chromosomes A and B.
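The kind of fit summarized by the slope and r² values can be sketched as follows. The sketch assumes an ordinary least-squares line relating the mean FLpter value of each BAC clone to the position of its BES hit on a mouse chromosome, with r² computed as the squared Pearson correlation; the fitting procedure actually used is not described in the text, and the function name and data points are hypothetical.

```python
R2_CUTOFF = 0.850  # cutoff used in the text for a conserved region


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept and its r^2."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with >= 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical collinear region: mouse position increases with FLpter.
flpter_a = [0.10, 0.20, 0.30, 0.40]
mouse_bp_a = [10e6, 20e6, 30e6, 40e6]
slope_a, _, r2_a = linear_fit(flpter_a, mouse_bp_a)

# Hypothetical scrambled region: same chromosome, no consistent order.
flpter_b = [0.04, 0.10, 0.16, 0.22]
mouse_bp_b = [50e6, 20e6, 60e6, 30e6]
_, _, r2_b = linear_fit(flpter_b, mouse_bp_b)

print(r2_a >= R2_CUTOFF, r2_b >= R2_CUTOFF)
```

A high r² indicates that the order of loci along the CHO chromosome is preserved in the mouse chromosome, whereas a low r² despite shared homology, as for the region homologous to mouse chromosome 14, is consistent with rearrangement within the region.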

Fig. 3 The relationship between the chromosomes of the CHO DG44 cell line and mouse chromosome sequences based on BESs. Horizontal axis reflects the chromosomal position in each CHO DG44 cell chromosome (the mean of the FLpter value; Lichter et al. 1990), while vertical axis represents the position on the mouse chromosome [bp] in all graphs. In these graphs, S is the slope of the fitted line and r² is the squared correlation coefficient. Gray circles indicate positions excluded from the fit in all graphs. a–c These graphs indicate the regions with high correlation between chromosomes A and B in the CHO DG44 cell line and the mouse chromosome. a Relationship between chromosomes A and B and mouse chromosome 3 (MMU3). b Relationship between chromosomes A and B and mouse chromosome 8 (MMU8). c Relationship between chromosomes A and B and mouse chromosome 14 (MMU14). d–g These graphs indicate the regions with high correlation between chromosomes C and D in the CHO DG44 cell line and the mouse chromosome. d Relationship between chromosomes C and D and mouse chromosome 1 (MMU1). e Relationship between chromosomes C and D and mouse chromosome 4 (MMU4). f Relationship between chromosomes C and D and mouse chromosome 10 (MMU10). g Relationship between chromosomes C and D and mouse chromosome 15 (MMU15). Black circles indicate positions on chromosome C; white circles indicate positions on chromosome D. h and i These graphs indicate pervasive regional similarities between the CHO DG44 cell line and mouse chromosome 2. h Relationship between chromosome H and mouse chromosome 2 (MMU2). i Relationship between chromosome J and mouse chromosome 2 (MMU2). There are two correlated regions between chromosomes H and J and MMU2: a narrow region (FLpter value: 0.43–0.48) and a wide region (FLpter value: 0.58–0.97). White circles indicate the narrow region and black circles indicate the wide region

As shown in Fig. 3d–g, homologous BESs were located symmetrically in chromosomes C and D. Even considering the error in determining BAC-FISH positions in chromosomes of the CHO DG44 cell line, specific regions in chromosomes C and D have high similarities to mouse chromosomes 1, 4, 10, and 15. We previously estimated, based on BAC-FISH results, that chromosome D is a partly deleted form of chromosome C (Cao et al. 2012a). The present BES analysis also supports the view that chromosome D is a mutated form of chromosome C, from which part of the long arm has been deleted.

Chromosomes H and J showed homology to mouse chromosome 2 in large parts of each of their long arms (Fig. 3h, i). This feature is unique to chromosomes H and J of the CHO DG44 cell line because the other regions homologous to mouse are short. It was reported that large parts of Chinese hamster chromosome 6 and rat (Rattus norvegicus) chromosome 3 show high homology to mouse chromosome 2 (Yang et al. 2000). Consequently, these regions in chromosomes H and J might be conserved among rodent species. It is known that multiple chromosomal modifications are prone to occur in chromosomes of CHO cell lines. Nevertheless, large parts of chromosomes H and J in the CHO DG44 cell line show high homology to mouse chromosome 2 in both BAC-FISH positions and BESs. Thus, these regions might be conserved among several CHO cell lines, such as the CHO K1 cell line.

It was reported that chromosome X, a sex chromosome, is conserved to a large extent among Chinese hamster, mouse, and rat, as determined by CGH analysis (Yang et al. 2000). Most BESs that have homology to chromosome X in mouse are largely located on chromosomes E and P (Fig. 4), but no structural similarity among them was identified by BES analysis. This indicates that genetic composition revealed by CGH analysis is similar among rodent chromosome X, but its locations in the chromosome revealed by BESs are quite different.

Fig. 4 The relationship between mouse chromosome X (MMU X) and CHO DG44 cell line chromosome: a chromosome E, b chromosome P. Horizontal and vertical axes and r² are the same as in Fig. 3

These findings were not revealed by hybridization analyses such as CGH and SKY because hybridization results do not reveal the exact relationship between sequences and chromosomal locations. To overcome this problem, BAC-FISH can be combined with BES analysis, which links BAC-based hybridization data to sequence data; it is a useful technique for relating sequences to chromosomal locations. Moreover, using BAC-FISH with BES analysis, it is possible to compare chromosomal locations among several cell lines. Recently, the genomic sequences of Chinese hamster and its cell lines were revealed by NGS (Xu et al. 2011; Vishwanathan et al. 2016), and the genomic scaffolds constructed from the NGS results were compared with mouse genomic sequences and chromosome locations. However, these scaffolds could not be linked to chromosomal locations in Chinese hamster or Chinese hamster-derived cell lines. In addition, the read lengths of the genomic sequences from NGS analysis were too short to allow exact comparison with either our BESs or the Chinese hamster genome. Just as human and mouse genomic sequences have been linked to positions on their chromosomes using physical maps, the short-read genomic sequences determined by NGS should be compared with the chromosomal locations revealed by BAC-FISH and BES data. Lewis et al. (2013) analyzed the CHO genome sequence and compared the sequence scaffolds based on our previous BAC-FISH data. Moreover, very recently, a third-generation sequencing analysis using PacBio technology was reported (Rupp et al. 2018). Using these PacBio data, it may be possible to compare the Chinese hamster genome exactly with the CHO DG44 genome or with the genome sequences of other cell lines. Genomic sequences are essential information for understanding cells and are useful for genome engineering technologies, such as genome editing. It is expected that BAC-FISH and BES data could contribute to the development of scientific research in CHO cells and the construction of cell lines for producing protein-based pharmaceuticals.