Introduction

Spermatogenesis is a highly complex process that includes critical phases such as meiosis, chromatin condensation, and transcriptional and translational silencing. It takes place in close association with Sertoli cells, and hormones are a driving factor in its successful completion. Despite extensive research into the molecular events of this tightly controlled process, its intricacies are far from clear. Several factors have impeded the study of spermatogenesis. It is difficult to study sperm formation in an isolated system, that is, in the absence of Sertoli cells. Moreover, sperm are specialized, differentiated cells that cannot be cultured or passaged. Another roadblock is the limited availability of spermatogonial stem cells (SSCs) and of differentiated cell types such as spermatids, together with the difficulty of culturing them: unlike cancer cells, these cells do not proliferate in vitro. In the adult mouse testis, SSCs account for only 0.4% of the total testicular cell population. Given the heterogeneity of that population, isolating SSCs and maintaining them in an undifferentiated state remains a major challenge. Because spermatogenesis takes place in close association with Sertoli cells, the purity of a germ cell culture is always open to question. Even when they can be maintained, these primary cells are not very amenable to transfection, inhibitor studies and certain functional assays, and assays using primary cells are difficult to reproduce. All these challenges necessitated the development of germ cell lines. Hofmann et al. developed the first spermatogonial cell line, GC-1spg (GC-1), in 1992 [1]. Several cell lines representing different stages were subsequently established. These include GC-3 and GC-4, representing preleptotene and early pachytene spermatocytes [2], and A-303, A-304 and C-184, representing type A spermatogonia [3, 4].
Of all the germ cell lines that have been established, only GC-1 and GC-2 are commercially available. The GC-1 cell line was established by isolating SSCs from 10-day-old mouse pups and immortalizing them with the large T antigen. These cells express lactate dehydrogenase C4 (LDHC4), the testis-specific isoform. They were named GC-1spg because they represent the stage between type B spermatogonia and preleptotene spermatocytes. GC-1spg cells are arrested at an early stage of meiosis [1].

The same group reported a second cell line, GC-2spd(ts). It was derived from 6-week-old mouse testes and represents a stage between preleptotene spermatocytes and round spermatids [2]. Mouse primary germ cells enriched in preleptotene spermatocytes were isolated by unit-gravity sedimentation. They were then immortalized by co-transfection with two plasmids: pSV3neo, which carries the large T antigen gene, and LTRp53cG9, which carries a temperature-sensitive p53 mutant.

Because both cell lines are used extensively, we profiled gene expression in GC-1 and GC-2 cells by microarray to understand their precise nature. The data yielded a myriad of genes. Some have not been reported previously in the testis, and others are enriched in the testis but their function has not yet been explored. We also analysed the ploidy of GC-2 cells, because reports conflict on the presence of a haploid peak.

Materials and Methods

Cell Culture

The GC-1spg [GC-1] and GC-2spd(ts) [GC-2] cell lines were obtained from ATCC (CRL-2053 and CRL-1296, respectively; Manassas, VA, USA). Cells were cultured in Dulbecco’s Modified Eagle’s Medium (DMEM; Himedia, Mumbai, India) supplemented with 10% foetal bovine serum (FBS; Invitrogen, Carlsbad, CA, USA) and 100 U/ml penicillin/100 µg/ml streptomycin (Himedia, Mumbai, India). They were maintained at 37 °C in a humidified atmosphere containing 5% CO2.

cDNA Synthesis and PCR

The testicular origin of GC-1 and GC-2 cells was confirmed by reverse transcriptase polymerase chain reaction (RT-PCR) amplification of Ldhc4, Crem, Brdt, Prss50 and Gfra1α. The forward and reverse primers used are listed in Table 1. RNA was extracted from the cells using TRIzol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol and resuspended in autoclaved diethylpyrocarbonate (DEPC)-treated water. cDNA was synthesized with the Verso cDNA synthesis kit (Invitrogen, Carlsbad, CA, USA). For each reaction, 1 μg of RNA was reverse transcribed with oligo dT primers at 42 °C for 1 h, denatured at 94 °C for 5 min and chilled on ice. To rule out genomic DNA contamination, a ‘no reverse transcriptase’ (No RT) control was included during cDNA synthesis. PCR amplification for all genes was performed at 94 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 60 °C for 25 s and 72 °C for 45 s. PCR products were resolved by electrophoresis on a 2% agarose gel.

Table 1 Primers used for confirmation of testicular lineage of GC-1 and GC-2 cells

Flow Cytometry

The ploidy of GC-2 cells was determined by flow cytometry. GC-2 cells were cultured as described above, detached by trypsinization and collected by centrifugation. The pellet was washed in 0.1 M phosphate-buffered saline (PBS). Cells were stained in a FACS buffer containing 0.1% sodium citrate, 0.1% Nonidet P-40 (Himedia, Mumbai, India), 100 mg/ml RNase A (Thermo Scientific, MA, USA) and 50 µg/ml propidium iodide. Ploidy was measured on a FACSAria SORP with an argon laser (Becton Dickinson, San Jose, CA, USA), and data were analysed with FACSDiva 6.1.3 software (BD, San Jose, CA, USA).
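To make the ploidy call concrete, here is a minimal Python sketch that bins propidium iodide intensities by their ratio to the G1 (2 N) peak. It is not the FACSDiva gating used in this study. On a linear scale, DNA content is proportional to fluorescence, so haploid (N) events should sit near half the G1 peak and G2 (4 N) events near twice it. The function name, the ±15% window and the example intensities are hypothetical.

```python
from collections import Counter

def classify_ploidy(intensities, g1_peak, tolerance=0.15):
    """Count events as N, 2N or 4N from PI intensity relative to the G1 (2N) peak.

    Hypothetical helper: on a linear scale, N events are expected near 0.5x and
    4N (G2) events near 2x the G1 peak. Events outside every window (S phase,
    debris, doublets) are counted as 'other'.
    """
    expected_ratio = {"N": 0.5, "2N": 1.0, "4N": 2.0}
    counts = Counter()
    for value in intensities:
        ratio = value / g1_peak
        label = "other"
        for name, target in expected_ratio.items():
            # Accept events within +/- tolerance (a fraction) of the expected ratio.
            if abs(ratio - target) <= tolerance * target:
                label = name
                break
        counts[label] += 1
    return counts

# Hypothetical intensities (arbitrary units) with the G1 peak at 50.
events = [49, 51, 50, 52, 99, 101, 75, 60]
counts = classify_ploidy(events, g1_peak=50)
print(dict(counts))
```

With these toy values, no event falls in the N window. This mirrors a profile that shows only 2 N and 4 N peaks.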

RNA Quality Control

RNA was extracted from GC-1 and GC-2 cells as described above. RNA concentration and purity were evaluated on a NanoDrop 2000 spectrophotometer (Thermo Scientific). RNA integrity was determined on an Agilent 2100 Bioanalyzer (2100 Expert software). RNA was considered to be of good quality if its A260/A280 ratio was between 1.8 and 2.0 (NanoDrop) and the Bioanalyzer showed an optimal RNA integrity profile. Microarray hybridization and scanning were performed at the Agilent-certified microarray facility at Genotypic Technology, Bengaluru, India.
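The acceptance rule above can be written as a small Python check. The A260/A280 window comes from the text. The text asks only for an "optimal" Bioanalyzer profile, so the RNA integrity number (RIN) threshold of 7.0 and the sample values used here are hypothetical placeholders.

```python
def passes_rna_qc(a260_a280, rin, min_rin=7.0):
    """Return True if an RNA sample meets the purity and integrity criteria.

    The purity window (A260/A280 between 1.8 and 2.0) is taken from the text.
    min_rin is a hypothetical stand-in for the 'optimal integrity profile'
    judged on the Bioanalyzer; the text gives no numeric cut-off.
    """
    return 1.8 <= a260_a280 <= 2.0 and rin >= min_rin

# Hypothetical measurements: (A260/A280, RIN)
samples = {"sample_A": (1.95, 9.1), "sample_B": (2.05, 8.8), "sample_C": (1.90, 6.2)}
for name, (ratio, rin) in samples.items():
    print(name, "pass" if passes_rna_qc(ratio, rin) else "fail")
```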

Microarray and Data Analysis

Samples for gene expression profiling were labelled using the Agilent Quick-Amp labelling kit (p/n 5190-0442). Total RNA was reverse transcribed at 40 °C using an oligo dT primer tagged with a T7 polymerase promoter and converted to double-stranded cDNA. This double-stranded cDNA served as the template for cRNA generation by in vitro transcription, during which the dye Cy3-CTP (Agilent) was incorporated. The cDNA synthesis and in vitro transcription steps were carried out at 40 °C. Labelled cRNA was cleaned up using Qiagen RNeasy columns (Cat No: 74106, Qiagen), and its yield and specific activity were assessed on a NanoDrop ND-1000. Hybridization was carried out in Agilent SureHyb chambers at 65 °C for 16 h. The hybridized slides were washed using Agilent Gene Expression wash buffers (Agilent Technologies, Part Number 5188-5327). They were scanned on the Agilent Microarray Scanner (Agilent Technologies, Part Number G2600D).

Raw data were extracted from the scanned images using Agilent Feature Extraction software. The extracted data were normalized and analysed in Agilent GeneSpring GX (v14.5).
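The normalization method used in GeneSpring GX is not specified here. For single-colour Agilent arrays, one common choice is a log2 transformation followed by a 75th-percentile shift. The Python sketch below illustrates that approach under this assumption. The function name and the toy intensities are hypothetical, and the sketch should not be read as the exact procedure applied in this study.

```python
import math
from statistics import quantiles

def percentile_shift_normalize(raw_signals, percentile=75):
    """Log2-transform each array's signals and subtract that array's percentile value.

    raw_signals: dict mapping array name -> list of positive raw intensities,
    listed in the same probe order for every array (order is preserved).
    """
    normalized = {}
    for array, values in raw_signals.items():
        logged = [math.log2(v) for v in values]
        # quantiles(n=100) returns 99 cut points; index percentile - 1 is that percentile.
        cut = quantiles(logged, n=100, method="inclusive")[percentile - 1]
        normalized[array] = [x - cut for x in logged]
    return normalized

# Toy intensities for five probes on two arrays (hypothetical values).
raw = {"GC-1": [16, 64, 256, 1024, 4096], "GC-2": [8, 32, 128, 512, 2048]}
norm = percentile_shift_normalize(raw)
print(norm)
```

In this toy example, the two arrays differ only by a constant factor of two. After the shift, their normalized profiles become identical, which is the purpose of a percentile shift.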

To validate the microarray data, gene expression in GC-1 and GC-2 cells was confirmed by PCR, with RNA from adult mouse testis as a positive control. A ‘no reverse transcriptase’ reaction was included to rule out amplification from genomic DNA. A ‘no template’ control was included to rule out contamination of the reagents.

Bioinformatics Tools and Databases

Functional enrichment, gene ontology (GO) and pathway analyses, chromosomal distribution and transcript classification were performed using PANTHER [5, 6], KEGG [7], REACTOME [8] and ShinyGO [9]. Gene lists for GO terms were extracted from Mouse Genome Informatics (MGI) [10]. Venn diagrams were made using Venny 2.1.
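Over-representation tests such as those in PANTHER ask whether more query genes carry a term than would be expected from a reference list. PANTHER offers Fisher's exact and binomial tests. The Python sketch below illustrates the idea only: it computes a one-sided hypergeometric p-value, which is equivalent to a one-sided Fisher's exact test, from hypothetical counts. It does not reproduce PANTHER's implementation or its multiple-testing correction.

```python
from math import comb

def hypergeom_upper_p(k, n_query, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n_query).

    N: genes in the reference (background) list
    K: reference genes annotated to the GO term or pathway
    n_query: genes in the query list
    k: query genes annotated to the term (k >= 0)
    """
    if not (0 <= K <= N and 0 <= n_query <= N and k >= 0):
        raise ValueError("counts must satisfy 0 <= K, n_query <= N and k >= 0")
    upper = min(K, n_query)
    # Sum the probabilities of observing k or more annotated genes in the query.
    tail = sum(comb(K, i) * comb(N - K, n_query - i) for i in range(k, upper + 1))
    return tail / comb(N, n_query)

# Hypothetical counts: 20 reference genes, 5 annotated; 5 query genes, 4 annotated.
p = hypergeom_upper_p(k=4, n_query=5, K=5, N=20)
print(f"p = {p:.4g}")
```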

Results

Confirmation of Testicular Lineage of GC-1 and GC-2

The testicular lineage of GC-1 and GC-2 cells was investigated by RT-PCR for testis-specific genes such as Ldhc4, Crem, Gfra1α, Brdt and Prss50. In both cell lines, bands were seen for all genes except Ldhc4. Total testicular RNA, used as a positive control, showed bands for all the transcripts studied (Fig. 1a). Ploidy analysis of GC-2 cells showed only diploid (2 N) and tetraploid (4 N) peaks, with no haploid peak (Fig. 1b).

Fig. 1

RT-PCR of Ldhc4, Prss50, Brdt, Crem and Gfra1α in GC-1 and GC-2 cells, and flow cytometry analysis of the ploidy of GC-2 cells. PCR amplification products of the respective genes on a 2% agarose gel (a). L: 100 bp ladder. Lanes 1, 2 and 3 show amplified PCR products of the respective genes from GC-1 cells, GC-2 cells and mouse testis (positive control), respectively. NTC: ‘no template’ control; NRT: ‘no reverse transcriptase’ control. Flow cytometry analysis performed on a linear scale (b). The X-axis shows the relative DNA content of the cells as fluorescence intensity, and the Y-axis shows the relative number of cells with a given DNA content. The arrow marks the expected position of a haploid peak; no haploid peak is observed. Peaks are labelled N (haploid DNA content), 2 N (diploid DNA content; G1 phase) and 4 N (tetraploid DNA content; G2 phase).

Gene Ontology and Functional Enrichment Analysis of Transcripts Exclusive to GC-1 and GC-2 Cells

We used a 60K array to analyse gene expression in GC-1 and GC-2 cells and determine the transcript repertoire of the two cell lines. A total of 23,351 transcripts were detected in GC-1 and 20,992 in GC-2 cells. Of these, 3152 transcripts were detected exclusively in GC-1 and 793 exclusively in GC-2 cells (Fig. 2a). In both cell lines, these exclusive transcripts were distributed across nearly all chromosomes. In GC-1, chromosome 7 encoded the largest number, followed by chromosome 11, and only 3 came from the Y chromosome. In GC-2 cells, chromosome 3 encoded the largest number, and none came from the Y chromosome (Fig. 2b). Protein-coding genes were the largest category in both cell lines, with 1435 transcripts in GC-1 and 341 in GC-2 cells. In GC-1, 200 were lincRNAs, and no information was available for 1134 transcripts. In GC-2, 42 were RIKEN cDNAs and 19 were lincRNAs, and no information was available for 391 transcripts (Fig. 2c). The protein-coding genes were involved in many pathways. The Wnt signalling pathway had the most genes, with 25 in GC-1 and 13 in GC-2 (Fig. 2d).
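The three regions of the Venn diagram in Fig. 2a follow from simple set operations on the detected transcript identifiers. The Python sketch below shows this with hypothetical probe IDs. It then checks that the reported totals and exclusive counts are consistent: subtracting each line's exclusive transcripts from its total should give the same number of shared transcripts for both lines.

```python
def venn_partition(a, b):
    """Split two sets of transcript IDs into the regions of a two-way Venn diagram."""
    return {"only_a": a - b, "only_b": b - a, "common": a & b}

# Hypothetical probe IDs, to show the set logic only.
gc1_ids = {"p1", "p2", "p3", "p4"}
gc2_ids = {"p3", "p4", "p5"}
regions = venn_partition(gc1_ids, gc2_ids)
print({name: sorted(ids) for name, ids in regions.items()})

# Consistency check on the counts reported for Fig. 2a.
gc1_total, gc1_only = 23_351, 3_152
gc2_total, gc2_only = 20_992, 793
shared_from_gc1 = gc1_total - gc1_only
shared_from_gc2 = gc2_total - gc2_only
print(shared_from_gc1, shared_from_gc2, shared_from_gc1 == shared_from_gc2)
```

Both differences equal 20,199, which is the number of transcripts common to the two lines.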

Fig. 2

Chromosomal distribution, gene categories and enriched pathways of the transcripts exclusive to GC-1 and GC-2 cells. Venn diagram of genes expressed exclusively in GC-1 or GC-2 cells and those common to both (a). Chromosomal distribution of the GC-1- and GC-2-exclusive transcripts, based on information generated with ShinyGO; sex chromosomes are shown as red bars (b). Categories of transcripts expressed in GC-1 and GC-2 cells; protein-coding genes account for the largest number of transcripts in both cell lines (c). Genes involved in major pathways, based on PANTHER (d).

Protein classification by PANTHER and KEGG showed that, in GC-1 cells, 159 (25%) transcripts represented enzymes, 77 (12%) transmembrane signal receptors and 69 (11%) gene-specific transcriptional regulators (Fig. 3a). Of the 69 gene-specific transcriptional regulators, 66 were DNA-binding transcription factors. These were largely zinc finger transcription factors (23), followed by basic helix-loop-helix factors (11). The remaining 3 were cofactors, namely CBFA2T3, ATF7IP2 and VGLL2. CBFA2T3 has been identified as one of the candidate genes in patients with congenital bilateral absence of the vas deferens (CBAVD) [11]. Of the 35 cytoskeletal proteins, most were actin-binding proteins, followed by microtubule motor proteins and intermediate filaments (Fig. 3a). Eight were non-motor actin-binding proteins.

Fig. 3

Protein classification of genes expressed exclusively in GC-1 and GC-2 cells. PANTHER protein classes are shown for GC-1 (a) and GC-2 (b).

There were 34 genes in the transporter group of proteins (Fig. 3a). Among these, PANTHER identified 7 voltage-gated potassium channels and 7 ATP-binding cassette (ABC) transporters. Ion channel transcripts detected exclusively in GC-1 cells included Cacna1g (calcium voltage-gated channel subunit alpha1 G), Trpc3, Trpc4, Catsper2 and Catsper3. Cacna1g, encoded on chromosome 17, has been reported in pachytene spermatocytes [12]. Canonical transient receptor potential channels (TRPCs), which include TRPC3 and TRPC4, are cation channels that mediate influx of Ca2+, Na+ and K+ in a voltage-dependent manner. TRPC3 is present in several regions of the sperm. In mouse sperm it is localized to the mid-piece of the flagellum [13], and in human sperm it has been reported in the mid-piece and acrosomal region. TRPC4 has been observed weakly in the head and strongly in the mid-piece and tail of the flagellum [14]. The differing localization of TRPC isoforms suggests that they mediate complex signalling in spermatozoa. Other important transcripts detected in GC-1 cells were those of the cation channel sperm-associated proteins 2 and 3 (CATSPER2 and CATSPER3), which are specifically expressed in testis [15]. CATSPER2 and CATSPER3, along with other members of the family, have a well-documented role in fertility in both humans and mice. Catsper2−/−, Catsper3−/− and Catsper4−/− mice showed phenotypes similar to that of Catsper1−/− mice, and their sperm lacked the hyperactivated motility required for fertilization [16]. Whole-cell patch-clamp experiments on caudal sperm of Catsper1−/− mice showed that Ca2+ current was absent from the flagellum [17].

Transcripts exclusive to GC-2 cells were also mainly enzymes (24%; 45 transcripts), followed by gene-specific transcriptional regulators (12%; 22 transcripts). Of these regulators, 20 were DNA-binding transcription factors (11 zinc finger, 6 basic helix-turn-helix and 3 others) and 2 were transcription cofactors (Amy1 and Maml3; Fig. 3b). One of the transcription factors in the list was PRDM16, which has been shown to have a role in embryo development and cryptorchidism [18, 19].

There were 14 transcripts representing cytoskeletal proteins, including Dnah9, Spire1 and Dnali1 (Fig. 3b). DNAH9 mutations have recently been reported in patients with primary ciliary dyskinesia (PCD), one of whom had very poor sperm motility [20, 21]. SPIRE1 is an actin nucleation protein reported to be highly expressed in the testis near ectoplasmic specializations. In vitro and in vivo RNAi-mediated silencing of Spire1 perturbs actin filaments at the Sertoli cell–cell interface. It also disrupts the organization of microtubules in the testis, thereby disrupting spermatogenesis [22]. Another cytoskeletal protein, DNALI1, is expressed extensively in murine sperm and is also detected in spermatocytes [23].

GO and functional enrichment analyses for biological process, cellular component and molecular function were performed using PANTHER (Supplementary Fig. 1a–c). In GC-1 cells, the biological process categories with the most genes were cellular process (GO: 0009987; 579 genes), biological regulation (GO: 0065007; 386 genes) and developmental process (GO: 0032502; 112 genes). The reproductive process category (GO: 0022414) contained 17 genes. Three of these are known to be expressed in testis: testis expressed gene 14 (Tex14), Rimbp3 and meiosis-specific protein (Mei4). Mutations in TEX14 are associated with male infertility [24]. As documented in the Human Protein Atlas (https://www.proteinatlas.org/ENSG00000275793-RIMBP3/tissue), RIM-BP3 is highly expressed and enriched in testis. It is associated with the manchette in elongating and elongated spermatids and has an indispensable role in sperm morphology [25]. MEI4, which is specifically expressed in meiotically dividing cells, is highly conserved from yeast to mice. Homozygous Mei4−/− mice are sterile, with no post-meiotic cells in the seminiferous tubules [26].

In GC-2 cells, GO analysis for biological process assigned the most transcripts to cellular process (GO: 0009987; 186 genes) and biological regulation (GO: 0065007; 122 genes). Another 44 genes were involved in developmental process (GO: 0032502) and 7 in reproductive process (GO: 0022414). The cellular components and molecular functions of the genes identified in GC-1 and GC-2 cells are also shown (Supplementary Fig. 1a–c).

We retrieved gene lists from the MGI database for GO terms such as “spermatogenesis” (GO: 0007283), which returned 578 genes with a role in spermatogenesis. Of these, 32 were among the transcripts exclusive to GC-1 and 11 among those exclusive to GC-2. Within these, 6 (GC-1) and 2 (GC-2) were involved in the cell cycle (GO: 0022402) (Fig. 4c). For “sexual reproduction” (GO: 0019953), 55 and 21 of 920 genes were seen in GC-1 and GC-2, respectively. For “reproductive process” (GO: 0022414), 108 and 31 of 1580 genes were present in GC-1 and GC-2, respectively. The list of genes in each category is provided in Supplementary File 1.

Fig. 4

Heatmap of gene expression in GC-1 and GC-2 cells and representation of their involvement in various processes. Heatmap displaying expression levels of genes in GC-2 cells with respect to GC-1 cells (a). Representation of genes retrieved from MGI database seen in both GC-1 and GC-2 cells using GO terms specific for genes involved in reproductive process and sexual reproduction. Blue circles represent the percentage of genes in the respective category from MGI database. Yellow circles represent the percentage of genes common between GC-1 and GC-2 cells. Overlapping areas represent genes common in MGI database, GC-1, and GC-2 cells (b). Representation of genes retrieved from MGI database seen in GC-1 and GC-2 transcriptome using GO terms specific for genes involved in spermatogenesis and cell cycle (c)

Gene Overlap between GC-1 and GC-2 Cells

Microarray-based gene expression analysis revealed 20,199 transcripts common to both cell lines. Of these, 4900 were differentially expressed between GC-1 and GC-2 cells (Fig. 4a): 2714 were upregulated and 2186 downregulated in GC-2 compared with GC-1 cells (|log2 fold change| ≥ 1; p value ≤ 0.05).
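The differential-expression filter can be sketched in Python as follows, reading the threshold as |log2 fold change| ≥ 1 with p ≤ 0.05. The function name and toy fold changes are hypothetical, and GeneSpring's own statistical test is not reproduced. The final lines express the reported up- and downregulated counts as fractions of the 20,199 shared transcripts.

```python
import math

def call_differential(fold_changes, p_values, min_abs_log2fc=1.0, max_p=0.05):
    """Split transcripts into up- and downregulated sets (GC-2 relative to GC-1).

    fold_changes: transcript ID -> linear fold change (GC-2 / GC-1), > 0
    p_values:     transcript ID -> p value from a test run elsewhere
    """
    up, down = set(), set()
    for tid, fc in fold_changes.items():
        lfc = math.log2(fc)
        # Keep only transcripts passing both the significance and magnitude cut-offs.
        if p_values[tid] <= max_p and abs(lfc) >= min_abs_log2fc:
            (up if lfc > 0 else down).add(tid)
    return up, down

# Hypothetical transcripts: t3 fails the fold-change cut-off, t4 the p-value cut-off.
fc = {"t1": 4.0, "t2": 0.25, "t3": 1.5, "t4": 3.0}
p = {"t1": 0.01, "t2": 0.03, "t3": 0.001, "t4": 0.2}
up, down = call_differential(fc, p)
print(sorted(up), sorted(down))

# Reported counts expressed as shares of the 20,199 shared transcripts.
shared, n_up, n_down = 20_199, 2_714, 2_186
print(f"up {100 * n_up / shared:.1f}%, down {100 * n_down / shared:.1f}%")
```

With these counts, about 13.4% of the shared transcripts are higher and 10.8% lower in GC-2 than in GC-1.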

Of the 578 genes returned by MGI for ‘spermatogenesis’ (GO: 0007283), 330 were identified among the transcripts common to GC-1 and GC-2 cells. Of these 330, 62 were involved in the “cell cycle” (GO: 0022402). For ‘sexual reproduction’ (GO: 0019953), 554 of 920 genes were detected in both cell types. For ‘reproductive process’ (GO: 0022414), 953 of 1580 genes were observed in both GC-1 and GC-2 (Fig. 4b, c). The list of genes in each category is provided in Supplementary File 1.
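Checking expressed genes against an MGI GO-term gene list is a set intersection, and the "cell cycle" share is a ratio of two counts. The Python sketch below demonstrates the set logic with hypothetical gene placeholders. It then recomputes, from the counts reported for the exclusive and the common transcripts, the fraction of spermatogenesis-associated genes that are also annotated to the cell cycle.

```python
def term_overlap(expressed, term_genes, other_term_genes):
    """Expressed genes annotated to a GO term, and the share also annotated to a second term."""
    hits = expressed & term_genes
    both = hits & other_term_genes
    share = len(both) / len(hits) if hits else 0.0
    return hits, both, share

# Hypothetical gene placeholders, to show the set logic only.
expressed = {"geneA", "geneB", "geneC", "geneD"}
spermatogenesis = {"geneA", "geneB", "geneE"}
cell_cycle = {"geneB", "geneF"}
hits, both, share = term_overlap(expressed, spermatogenesis, cell_cycle)
print(sorted(hits), sorted(both), share)

# Cell-cycle share of spermatogenesis genes, from the reported counts.
reported = {"GC-1 exclusive": (6, 32), "GC-2 exclusive": (2, 11), "common": (62, 330)}
for label, (cell_cycle_n, sperm_n) in reported.items():
    print(f"{label}: {100 * cell_cycle_n / sperm_n:.1f}%")
```

All three ratios fall between 18% and 19%.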

Validation of Genes by PCR

To validate the microarray data, RT-PCR was performed for three groups of selected genes. The first group was detected in both cell lines (Klf4, Fgf3, Cabyr, Inhbb, Pouf2, Coro1a, Coro1b, Sgsh, Map3k4, Phf7, Wt1, Rnf168, Pdlim1, Meiob, Gdnf, Syngr4, Xlr). The second was exclusive to GC-1 (Pak7, Trpc3, Zp3, Crisp2, Mei4, Tex14, Fgf2, Dmrt1, Rimbp3), and the third was exclusive to GC-2 (Mesp2, Prdm16, Adam23, Trim5, Mef2c, Dnah9, Klf15, Trim12, Lhx9). PCR was performed using cDNA from both cell lines. The primers and PCR conditions are given in Supplementary File 2.

In GC-1 cells, faint bands were seen for Trpc3 and Fgf2, and the remaining GC-1-exclusive genes amplified strongly. None of these genes was detected in GC-2 cells (Fig. 5a). The exclusivity of all the genes tested from the GC-2-exclusive list was also confirmed, with Trim12 and Mef2c amplifying more strongly than the other genes (Fig. 5b). The 18 genes identified in both cell lines were also validated by RT-PCR, and Coro1a and Coro1b were present in both. Two Sertoli cell-specific genes, Wt1 and Inhbb, were abundant in both cell lines (Fig. 5c, d). The ‘no reverse transcriptase’ controls ruled out genomic DNA contamination.

Fig. 5

RT-PCR of genes expressed exclusively in GC-1 or GC-2 cells and of genes expressed in both cell lines. RT-PCR of genes exclusive to GC-1 (a) and GC-2 (b). Lanes labelled ‘1’ and ‘2’ show amplification products of the respective genes from GC-1 and GC-2 cells, respectively. PCR amplification of genes identified in both cell lines, in GC-1 (c) and GC-2 (d). L: 100 bp ladder; NRT: no reverse transcriptase; NTC: no template control.

Discussion

The molecular mechanisms of spermatogenesis involve features unique to the cell types present at different stages of the process. In vitro germ cell lines are therefore a good avenue for studying some of these mechanisms. Of all the germ cell lines reported so far, GC-1 has been the most widely used and cited, followed by GC-2. We revisited the distinctive features of these two cell lines. The presence of Crem that we observed in GC-2 cells has not been reported previously. However, no signal was detected for Ldhc4 in either cell line. Hofmann et al. [1] reported LDHC4 protein by immunocytochemistry in GC-1 cells but could not detect the transcript by RT-PCR. Using qRT-PCR, Godmann et al. reported Ct values of around 34 for Ldhc4, indicating negligible expression [27]. Brdt and Prss50, which we observed in both cell lines, have been reported to be expressed exclusively in testis [28].

GC-1 cells represent a stage between type B spermatogonia and primary spermatocytes. Type B spermatogonia become primary spermatocytes, which undergo meiosis and express the proteins required for meiotic division. GC-1 cells correspond to mitotically dividing type B spermatogonia, yet they contain transcripts for MEI4, which is known to be specifically expressed in meiotically dividing cells. Reports also show that Mei1-deficient spermatocytes arrest at the zygotene stage of meiosis I [29]. Together, these findings point to the importance of this gene in spermatid formation. They also suggest that the type B spermatogonia are preparing to enter meiosis as primary spermatocytes. The GC-2 cell line reportedly represents a stage between primary spermatocytes and round spermatids. It was originally claimed to represent round spermatids on the basis of expression of the marker SP-10. Later, Wolkowicz et al. [30], from the same group, reported that the cell line is not haploid, because transcripts for spermiogenesis markers such as Sp-10, protamine, acrosin and LDHC4 were absent. We did not detect a haploid peak, which corroborates the observations of Wolkowicz et al. Transcripts for the premeiotic genes Osr2 and Pcdh18 were present in both cell lines, suggesting that both are premeiotic. The expression of the testis-specific genes Dmrtb1, Prss50 and Crisp2 in these cells supports their germ cell origin. None of these genes has been reported earlier in these cell lines.

Combining a literature search with our observations, we also identified many genes in both GC-1 and GC-2 cells that have not previously been reported in the testis or implicated in any biological function. Examples are Coro1a, Coro1b, Zfhx4 and the multi-copy X-linked gene Xlr (X-linked lymphocyte regulated). This gap may reflect incomplete functional annotation. The presence of Coro1a, Coro1b and Xlr was confirmed by RT-PCR (Fig. 5c). These cell lines are a good model system for researchers wishing to explore the functions of such genes. The broad gene expression coverage and data mining presented here can serve as a background reference for studies of these novel genes.

Drawing on the literature, we also identified genes such as Rnf168 and Dmrtb1, which we observed in both cell lines. These genes have been reported to be conserved across species and important for spermatogenesis, where they are crucial for the transition from mitosis to meiosis [31]. However, no phenotypic information for them is available in major databases such as neXtProt and MGI. Because such genes are absent from the databases that search tools commonly use, they do not appear among the top-ranked biological processes and functions.

Our microarray data also detected Tcf21, Arx, Vim, Col1a1 and Sca1, which have recently been reported in newly identified testicular cells of mesenchymal origin [32]. Both GC-1 and GC-2 cells also expressed Id2, Gata3, Cd90, Il7r, Il13 and Rora, which are highly expressed in type II innate lymphoid cells (ILCII). Droplet-based RNA-seq studies have revealed that the transcriptomes of mitotic spermatogonia and early preleptotene spermatocytes are very similar [33]. Likewise, we observed approximately 80% similarity between the transcriptomes of GC-1 and GC-2 cells (Fig. 2a). Around 18–20% of the spermatogenesis-associated transcripts in both cell lines were involved in the “cell cycle” (Fig. 4c). We compared our data with the scRNA-seq data on primary spermatocytes described by Ernst et al. [34], who identified 10,694 transcripts in these cells. Transcripts matching the primary spermatocyte phenotype accounted for 37.6% (8776) of GC-1 transcripts, 40.4% (8481) of GC-2 transcripts and 41.5% (8392) of the transcripts common to both (Fig. 6). Three observations together probably reflect gradual cell differentiation. First, GC-1 and GC-2 cells share approximately 80% of their transcripts. Second, both resemble primary spermatocytes by approximately 40%. Third, 13.4% of the common transcripts are higher and 10.8% lower in GC-2 than in GC-1 cells. These observations suggest that the transcriptomes of these cell lines represent the differentiation from type B spermatogonia to preleptotene spermatocytes. Both cell lines also express the chromatin remodelling and epigenetic modifiers Setx, Dnmt1, Asxl2, Phf1/2, Mllt10 (Dot1l) and Brd8/9. This suggests that expression of these modifiers, and the associated chromatin modifications, begin before cells enter meiotic prophase, as Ernst et al. also suggested [34].
Both GC-1 and GC-2 cell lines also contained transcripts for Dmrtc2, which is known to function during meiosis. They also expressed Odf2, Cabyr, Hook1, Tcp11 and Phf7, which are known to function post-meiotically [35]. Together with these studies, our observations challenge the pre-existing notion of stage-specific expression and cell-type-specific markers in the testis, and they may call for a rethinking of this concept in testis biology.
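The overlaps with the primary spermatocyte transcriptome of Ernst et al. are simple ratios of the reported counts. The Python sketch below recomputes them. It also computes one possible measure of GC-1/GC-2 similarity: shared transcripts divided by all transcripts detected in either line, which is the Jaccard index. The text does not state which measure underlies the approximately 80% figure, so using the Jaccard index is an assumption.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole

# Counts reported in the text.
gc1_total, gc2_total, shared = 23_351, 20_992, 20_199
ernst_overlap = {"GC-1": (8_776, gc1_total), "GC-2": (8_481, gc2_total), "common": (8_392, shared)}
for label, (overlap, total) in ernst_overlap.items():
    print(f"{label}: {pct(overlap, total):.1f}% of transcripts also found by Ernst et al.")

# Assumed similarity measure: Jaccard index = shared / (union of GC-1 and GC-2).
union = gc1_total + gc2_total - shared
print(f"Jaccard similarity: {pct(shared, union):.1f}%")
```

The Jaccard index comes out a little above 80%. Measuring the overlap relative to either line alone would give higher values.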

Fig. 6

Similarity between the transcripts identified in GC-1 and GC-2 cells and the transcriptome of primary spermatocytes. Venn diagram of the number of transcripts shared by GC-1 cells, GC-2 cells and the spermatocyte transcriptome reported by Ernst et al., 2019 [34].

Both cell lines contained transcripts for many genes that are specifically expressed in Sertoli cells [36,37,38]. These included Gdnf, Gata4, Wt1, Inha, Inhbb, Cadm1, Cdh2, Fam189a2, Fam60a, Fbxo21, Fgf23, Foxq1, Frmpd1, Fv1, Gas7, Gatm, Gm867, Gstm6, Kk2, Hmgb3, Hopx and Hspb1, and they made up 1–2% of the respective transcriptomes (Supplementary Fig. 2). Their presence may reflect Sertoli cell contamination of the GC-1 and GC-2 cell lines, which were first reported and deposited in a repository in the early 1990s [1, 2]. Our data also contain many RIKEN cDNA entries. The RIKEN Mouse Gene Encyclopedia Project identified these as novel full-length cDNAs from adult mouse testis [39, 40]. Researchers are gradually exploring the roles of these RIKEN cDNAs. For example, 1700121C10Rik is expressed exclusively in testis and encodes 2 long coding RNAs [41], and it is present in both GC-1 and GC-2 cell lines.

In vivo, spermatogenesis takes place in close proximity to Sertoli cells and the extracellular matrix, and the cells exchange hormones and signalling factors with each other and with their immediate microenvironment. Testicular cell lines cannot recreate this in vitro. The lack of cell–cell communication in vitro would affect the proteome of GC-1 and GC-2 cells and, to some extent, their oestrogen receptor-mediated transcription. Hence, these cell lines cannot be used to study spermatogenesis in vitro. Nevertheless, our microarray data provide a rigorous analysis of GC-1 and GC-2, the only commercially available germ cell lines. Several recent RNA-seq studies have provided valuable resources on the single-cell transcriptomes of spermatogenic cell types [33, 42, 43]. However, the functions of many genes remain to be explored. Our annotation of the GC-1 and GC-2 cell lines can be very useful for studying gene functions, molecular mechanisms and the pathways involved. Gene expression is easier to alter by silencing, overexpression or inhibitors in vitro than in vivo. This can help researchers decipher gene mutations related to male infertility and develop strategies to correct them. The GC-1 and GC-2 cell lines are closer in testicular origin than other such cell types. Our data show that their transcriptomes overlap by approximately 80%. Both cell lines display about 40% primary spermatocyte characteristics and represent the premeiotic stages of differentiation from type B spermatogonia to preleptotene spermatocytes. The datasets generated here should serve as a useful resource and reference for future research in this area.