Introduction

During the past two decades, much effort has been made to enhance anti-tumor immune responses in patients through tumor antigen-specific immunization that targets cancer cells [1, 2]. Among the different types of tumor antigens, cancer/testis (CT) antigens are promising therapeutic targets for the development of cancer vaccines because of their unique characteristics. Under normal physiological conditions, CT antigens are expressed predominantly in germ cells of the testis, including spermatocytes and even elongating/elongated spermatids. The testis is immune privileged in two senses: foreign tissue grafted into the testis can survive for extended periods, or indefinitely, without being rejected, and testicular tissue also resists rejection when grafted into a non-privileged site [3, 4]. CT antigens residing in germ cells therefore do not provoke an immune response and escape eradication by the immune system. Interestingly, CT antigens can be aberrantly expressed in multiple cancers, suggesting that the silenced gametogenic program is reactivated in somatic cells during tumorigenesis [5, 6]. CT antigens thus hold great promise for highly specific and safe immunotherapy. The first CT antigen, MAGE1 (melanoma-associated antigen 1), was discovered more than two decades ago through autologous typing [7]. Since then, more than 200 CT antigens have been identified [8], and some of them have been tested as tumor vaccines in clinical trials. Unfortunately, most clinical studies involving CT antigens have so far been unsuccessful [9,10,11], partly because we lack a systematic understanding of the expression patterns of CT antigens across cancer types and of the molecular mechanisms underlying their aberrant expression and oncogenic functions in cancer cells.

Previous studies have reported shared characteristics between germ cells and cancer cells (for a review, see [6]). For instance, CT antigens that are involved in germ cell implantation and migration during gametogenesis can regulate cancer cell invasion and metastasis [6, 12]. It is therefore likely that similar mechanisms regulate CT antigens during gametogenesis and drive their ectopic expression during tumorigenesis, so understanding the regulation of CT antigens during gametogenesis may provide valuable insights into their regulation in cancers. A recent study used publicly available databases to identify CT antigens systematically, significantly expanding the known repertoire [13]. Furthermore, the past few years have witnessed a surge in publicly available next-generation sequencing datasets from both reproduction and cancer research. In this review, we analyzed these datasets with a focus on the regulation of CT antigens during gametogenesis and tumorigenesis. We found that many CT antigens aberrantly expressed in cancer cells were associated with super-enhancers, and that some of these associations were conserved between normal testes and cancers. We also report that the DNA methyltransferase DNMT1 is a potential regulator of CT antigen expression, and we identify TAF7L, a CT antigen and transcription factor, as an important regulator of the expression of a subset of CT antigens. This information may help identify new targets among CT antigens for the treatment of cancer. This review also illustrates how investigators in the field can use large datasets deposited in the public domain for meaningful analysis.

Approaches Used for Analysis

  1.

    Data sources. Raw datasets from published studies were retrieved from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). A complete list of the GEO accession numbers used in this report is provided in Table 1. Gene expression and DNA methylation data (Illumina 450k platform) from the TCGA Database were retrieved via the UCSC Xena Browser (http://xena.ucsc.edu/). Cancer samples lacking expression of target CT antigens were removed from the analysis. Gene expression data from normal human tissues were obtained from the GTEx Database via the UCSC Genome Browser (https://genome.ucsc.edu/). Promoter information was obtained from the Eukaryotic Promoter Database (http://epd.vital-it.ch/index.php). CpG island information was acquired from the UCSC Genome Browser. The MiPanda Michigan Portal for the Analysis of NGS Data (http://www.mipanda.org/) and the Firehose Browser (https://gdac.broadinstitute.org/) were also used to search and validate the data during the preparation of this report.

  2.

    RNA-Seq analysis. Both bulk and single-cell (sc) RNA-Seq data were analyzed with the ‘new tuxedo’ pipeline [14]. In brief, raw reads were aligned to the mouse reference genome mm10 or the human reference genome hg38 using HISAT2 (version 2.1.0) [15] with default parameters. The resulting SAM files were then converted into sorted BAM files using SAMtools (version 1.4) (http://www.htslib.org/). Expressed genes and transcripts were assembled and quantified using StringTie (version 1.3.3) [16]. Differential gene expression analysis was performed using the R/Bioconductor package Ballgown (version 2.10.0) [17]. Genes with FPKM = 0 (FPKM, fragments per kilobase of transcript per million mapped reads) were removed from the analysis. P-values were corrected for multiple testing using the false discovery rate (FDR). Genes with a ≥2-fold difference in expression and an FDR <5% were considered differentially expressed. For studies in which processed data were available, the results were cross-referenced with those from our in-house analysis, and the analyzed data from the original studies were used only if the results matched.

  3.

    ChIP-Seq analysis. ChIP-Seq reads were aligned to the mouse reference genome mm10 or the human reference genome hg38 using Bowtie 2 (version 2.3.2) [18] with default parameters. The resulting SAM files were converted into sorted BAM files using SAMtools. Peak calling was performed with MACS2 (version 2.1.1) [19] to identify regions of ChIP-Seq enrichment over the background, using a p-value cutoff of 0.05. ChIP peaks were annotated using the R/Bioconductor package ChIPseeker (version 1.14.1) [20], and peak coverage was visualized in the Integrative Genomics Viewer desktop application (version 2.4) [21, 22]. For studies in which processed data were available, the results were cross-referenced with those from our in-house analysis, and the analyzed data from the original studies were used only if the results matched. Super-enhancers (SEs) were identified using the ROSE algorithm (version 0.1) with default parameters [23, 24]. Annotations of SE-associated genes in normal mouse testes, based on proximity to SEs, were kindly provided by Dr. Alena Shkumatava from Institut Curie, PSL Research University.

  4.

    Functional annotations and statistical analysis. Gene ontology analysis was performed using DAVID Online Tools (http://david.abcc.ncifcrf.gov/). The top-enriched clusters were identified by automated functional annotation clustering with default parameters.

    Statistical comparisons between two groups were performed using a non-parametric two-sample test. In all cases, p < 0.05 was considered significant. Spearman’s rank correlation coefficient (ρ) was calculated in R (version 3.4.3) (https://www.r-project.org/).
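Super-enhancer calling with ROSE (item 3 above) rests on a simple idea that can be sketched in Python. This is a simplified illustration under stated assumptions, not the ROSE code: peaks are assumed to come with a precomputed signal (for example, background-subtracted H3K27ac or Med1 reads); peaks on the same chromosome within 12.5 kb of each other (the stitching distance used in the original super-enhancer studies) are stitched and their signals summed; and stitched regions whose signal exceeds the point where a line of slope (max − min)/n is tangent to the ranked-signal curve are called SEs. ROSE itself recomputes read density over each stitched region and can exclude promoter-proximal peaks; both steps are omitted here. The `Region` class and all function names are hypothetical.

```python
from dataclasses import dataclass

STITCH_DISTANCE = 12_500  # 12.5 kb stitching distance, as in the original SE studies


@dataclass
class Region:
    chrom: str
    start: int
    end: int
    signal: float  # hypothetical precomputed ChIP signal (e.g. H3K27ac or Med1)


def stitch(peaks, distance=STITCH_DISTANCE):
    """Merge peaks on the same chromosome separated by at most `distance` bp."""
    stitched = []
    for p in sorted(peaks, key=lambda r: (r.chrom, r.start)):
        last = stitched[-1] if stitched else None
        if last is not None and last.chrom == p.chrom and p.start - last.end <= distance:
            last.end = max(last.end, p.end)
            last.signal += p.signal  # simplification: sum constituent signals
        else:
            stitched.append(Region(p.chrom, p.start, p.end, p.signal))  # copy, do not mutate input
    return stitched


def super_enhancer_cutoff(signals):
    """Signal at which a line of slope (max - min) / n touches the ranked curve.

    The tangent point is the rank i minimising v[i] - slope * i, i.e. the
    point with no other point lying below the line drawn through it.
    """
    v = sorted(signals)
    slope = (v[-1] - v[0]) / len(v)
    i_star = min(range(len(v)), key=lambda i: v[i] - slope * i)
    return v[i_star]


def call_super_enhancers(peaks):
    """Return stitched regions whose signal exceeds the tangent cutoff."""
    regions = stitch(peaks)
    if not regions:
        return []
    cutoff = super_enhancer_cutoff([r.signal for r in regions])
    return [r for r in regions if r.signal > cutoff]


if __name__ == "__main__":
    # Toy data: ten isolated weak peaks and one tight cluster of three strong peaks.
    peaks = [Region("chr1", k * 100_000, k * 100_000 + 1_000, float(k + 1)) for k in range(10)]
    peaks += [Region("chr2", s, s + 1_000, 40.0) for s in (1_000, 6_000, 11_000)]
    for se in call_super_enhancers(peaks):
        print(se)  # Region(chrom='chr2', start=1000, end=12000, signal=120.0)
```

In the toy example, the three closely spaced peaks are stitched into one region with a signal of 120, the tangent cutoff falls at a signal of 10, and only the stitched region is called an SE, mirroring how SEs stand out from typical enhancers on a ranked-signal plot.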

Table 1 List of GEO accession numbers used in this report

A Subset of CT Antigens Belongs to the Core Fitness Gene Family

We obtained the complete list of CT antigens from a recent study that systematically identified 1019 testis-specific transcripts expressed in at least 1% of the samples in any of 19 cancer types [13]. We first asked whether any of these CT antigens are human core fitness genes. The 1580 core fitness genes were defined by CRISPR screens as genes whose disruption decreased cell growth and proliferation in at least three of the five human cell lines tested (one normal immortalized cell line and four cancer cell lines) [25]. Comparing the two lists identified 28 CT antigens that belong to the core fitness gene family (2.7% of the CT antigens and 1.8% of the core fitness genes) (Fig. 1a; Table 2). Gene ontology (GO) term enrichment analysis showed that these 28 CT antigens are enriched in processes critical for cell survival, such as cell division, cell cycle control and DNA replication (Fig. 1b). These CT antigens may therefore serve as potential targets for the treatment of cancer.
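The overlap between the two gene lists, and the percentages quoted above, can be reproduced with basic set operations. The sketch below assumes each list is available as a collection of gene symbols; the function names and the placeholder genes (`CT_A`, `CT_B`, `FIT_A`) are hypothetical.

```python
def overlap_stats(ct_antigens, fitness_genes):
    """Return shared genes and the percentage of each list they represent."""
    ct, fit = set(ct_antigens), set(fitness_genes)
    shared = ct & fit
    return shared, 100 * len(shared) / len(ct), 100 * len(shared) / len(fit)


def percent(part, whole, ndigits=1):
    """Percentage of `whole` represented by `part`, rounded as in the text."""
    return round(100 * part / whole, ndigits)


# Toy lists for illustration only; the real lists come from [13] and [25].
shared, pct_ct, pct_fit = overlap_stats(
    ["TPX2", "KIF2C", "CT_A", "CT_B"], ["TPX2", "KIF2C", "FIT_A"]
)
print(sorted(shared), pct_ct, round(pct_fit, 1))  # ['KIF2C', 'TPX2'] 50.0 66.7

# Counts reported in the text: 28 shared genes, 1019 CT antigens, 1580 core fitness genes.
print(percent(28, 1019), percent(28, 1580))  # 2.7 1.8
```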

Fig. 1

A subset of CT antigens belongs to the human core fitness gene family. (a) Venn diagram shows that 28 CT antigens belong to the core fitness gene family, whose disruption decreases cell growth and proliferation. (b) These 28 CT antigens were enriched in cellular processes such as cell division, cell cycle control and DNA replication

Table 2 CT antigens that belong to the family of core fitness genes

Multiple CT Antigens Are Under the Control of Super-Enhancers (SEs)

To gain insight into the regulation of CT antigens, we examined their association with SEs. SEs are large clusters of enhancers that are occupied by master transcription factors; they are associated with key genes and are critical for cell identity [23]. SEs are also hyperactive chromatin regions characterized by high levels of histone H3 lysine 27 acetylation (H3K27ac), which distinguishes them from shorter, less active typical enhancers [23, 26]. SEs have been found at key oncogenic drivers in multiple cancer cells [24], and a comparison of SEs in cancer cells with those in corresponding normal cells revealed that cancer cells can acquire SEs at oncogenic drivers during tumorigenesis [27]. SEs may therefore help identify key oncogenes in specific cancers. We obtained the lists of SE-associated genes in a multiple myeloma (MM) cell line, a glioblastoma multiforme (GBM) cell line and a small cell lung cancer (SCLC) cell line [24, 27] and searched each list for CT antigens (Fig. 2). We identified 17, 15 and 19 CT antigens in MM, GBM and SCLC, respectively (Fig. 2a). Each cancer cell type contained a largely distinct set of CT antigens, with only a few overlaps (four between MM and SCLC, two between MM and GBM and two between GBM and SCLC) (Fig. 2a). These results are consistent with earlier observations that SEs are usually cell-type-specific and play key roles in the control of cell identity [23]. Nonetheless, two SE-associated CT antigens, TPX2 and KIF2C, were aberrantly expressed in all three cancer cell lines. The SEs associated with these two CT antigens were found at similar chromatin positions across the three cancer cell types, although their lengths varied (Fig. 2b, d). TPX2 is a spindle-associated microtubule (MT)-binding protein [28].
It is required for MT assembly to facilitate spindle formation during mitosis in HeLa cells [29] and meiosis in Xenopus eggs [30, 31]. To understand the role of TPX2 during gametogenesis, we analyzed RNA sequencing (RNA-Seq) datasets from human primordial germ cells (PGCs) [32], mouse PGCs [33, 34] and adult mouse germ cells at different developmental stages [35]. TPX2 was highly expressed throughout PGC development in both human and mouse, a period characterized by active cell division (Fig. 3a–c). High TPX2 expression was also observed during the transition from spermatocytes to round spermatids in adult mice, when meiosis I and II take place (Fig. 3a, d). This expression pattern during gametogenesis coincides with the reported functions of TPX2. It is therefore plausible that the acquisition of SEs drives the aberrant expression of TPX2 in cancer cells, thereby facilitating mitotic cell division. KIF2C belongs to the kinesin-13 family, which induces MT depolymerization to regulate chromosome movement and segregation during mitosis [36, 37]. KIF2C has been reported to be overexpressed in colorectal carcinoma as well as in several types of pancreatic, gastric, breast and head and neck cancers [38]. It is also overexpressed in gliomas and has been proposed as a potential prognostic marker for glioma [39]. The acquisition of SEs may thus underlie the ectopic expression of KIF2C in different cancer types. Notably, both TPX2 and KIF2C belong to the human core fitness gene family (Table 2). Taken together, our observations suggest that TPX2 and KIF2C are potential targets for the treatment of cancer.
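Counting overlaps among three gene lists is easy to get subtly wrong: a pairwise count such as "four between MM and SCLC" may or may not include the genes shared by all three lists. One way to make the convention explicit, sketched here with hypothetical placeholder genes (`g1` to `g4`) alongside TPX2 and KIF2C, is to report both inclusive pairwise overlaps and the exclusive regions of the Venn diagram. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_overlaps(sets):
    """Inclusive overlaps: genes shared by each pair, including those in all sets."""
    return {(a, b): len(set(sets[a]) & set(sets[b])) for a, b in combinations(sets, 2)}


def venn_regions(sets):
    """Exclusive Venn regions: each gene is counted once, under the exact
    combination of labels (in input order) whose sets contain it."""
    labels = list(sets)
    counts = {}
    for gene in set().union(*map(set, sets.values())):
        key = tuple(label for label in labels if gene in sets[label])
        counts[key] = counts.get(key, 0) + 1
    return counts


se_ct = {  # toy example; g1 to g4 are placeholders, not real SE-associated CT antigens
    "MM": {"TPX2", "KIF2C", "g1", "g2"},
    "GBM": {"TPX2", "KIF2C", "g3"},
    "SCLC": {"TPX2", "KIF2C", "g1", "g4"},
}
print(pairwise_overlaps(se_ct))  # {('MM', 'GBM'): 2, ('MM', 'SCLC'): 3, ('GBM', 'SCLC'): 2}
print(venn_regions(se_ct)[("MM", "GBM", "SCLC")])  # 2
```

In this toy case, MM and SCLC share three genes inclusively but only one (`g1`) exclusively, so stating which convention a Venn count follows removes the ambiguity.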

Fig. 2

A subset of CT antigens is under the control of super-enhancers (SEs). (a) Venn diagram shows SE-associated CT antigens in multiple myeloma (MM), glioblastoma (GBM) and small cell lung cancer (SCLC) cell lines. TPX2 and KIF2C are the two CT antigens common to all three cancers. (b) CT antigen KIF2C associates with a SE (red rectangle) in MM, GBM and SCLC, as indicated by the chromatin immunoprecipitation (ChIP) peak coverage of Med1. (c) Pie chart shows that SE-associated CT antigens account for 3.1% of all SE-associated genes in normal mouse testes. (d) CT antigen TPX2 associates with a SE (red rectangle) in MM, GBM, SCLC and normal mouse testes, as indicated by ChIP peak coverage of Med1 and H3K27Ac. (e) CT antigen SYNGR4 associates with a SE (red rectangle) in both SCLC and normal mouse testes, as indicated by ChIP peak coverage of Med1 and H3K27Ac. (f) Bar graphs show that the hypomethylated region (HMR) coverage of the SEs associated with CT antigens HTR7, DNAJC5B and ACTL7B increases in breast cancer compared with normal tissues

Fig. 3

CT antigen TPX2 is expressed throughout male germ cell development. (a) Schematic drawing shows the entire process of male germ cell development from early primordial germ cells (PGCs) to mature spermatozoa both in the mouse and human. Key biological events are highlighted in red. (b) Bar graph shows the expression of CT antigen TPX2 in human PGCs. FPKM, fragments per kilobase of transcript per million mapped reads. (c) Bar graph shows the expression of CT antigen TPX2 in mouse PGCs. (d) Bar graph shows the expression of CT antigen TPX2 in mouse developing male germ cells in testes

To further understand the association between SEs and CT antigens, we examined SE-associated CT antigens in normal testes. Because ChIP-Seq (chromatin immunoprecipitation followed by high-throughput sequencing) data on H3K27ac in human testes were lacking, we analyzed an H3K27ac ChIP-Seq dataset from adult mouse testes [26]. Among all SE-associated genes, we identified 164 CT antigens, which accounted for 16.1% of all CT antigens and 3.1% of all SE-associated genes in adult mouse testes (Fig. 2c). We then compared this list with the SE-associated CT antigens identified in the MM, GBM and SCLC cell lines (Fig. 2a) and found ten CT antigens in both lists. One example is SYNGR4, a member of the synaptogyrin family that functions in pulmonary carcinogenesis and tumor progression [40, 41]; SYNGR4 was associated with a SE at similar locations in both normal mouse testes and the SCLC cell line (Fig. 2e). We also found that TPX2 (but not KIF2C) was associated with a SE in normal mouse testes, and this TPX2-associated SE occupied a chromatin location similar to that in the MM, GBM and SCLC cell lines (Fig. 2d). Taken together, our findings show that the associations of a subset of CT antigens with SEs are conserved from normal testes to various cancers, suggesting that a conserved mechanism regulates these CT antigens during both gametogenesis and tumorigenesis.
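The SE-to-gene annotations for mouse testes were provided by other investigators, and their exact assignment rule is not described here. A common proximity-based convention is to assign each SE to the gene whose transcription start site (TSS) is closest to the SE, within some maximum distance. The sketch below implements that convention using the SE midpoint; the 50 kb window, the input layout, the gene names and the function name are all hypothetical choices for illustration.

```python
from bisect import bisect_left


def assign_nearest_gene(se_regions, tss_by_chrom, max_distance=50_000):
    """Map each SE (chrom, start, end) to the gene with the closest TSS.

    tss_by_chrom maps a chromosome to a list of (tss_position, gene) sorted by
    position. SEs with no TSS within `max_distance` bp of their midpoint map to None.
    """
    assignments = {}
    for chrom, start, end in se_regions:
        tss = tss_by_chrom.get(chrom, [])
        positions = [pos for pos, _ in tss]
        mid = (start + end) // 2
        i = bisect_left(positions, mid)
        # Only the TSSs immediately left and right of the midpoint can be closest.
        candidates = [tss[j] for j in (i - 1, i) if 0 <= j < len(tss)]
        best = min(candidates, key=lambda t: abs(t[0] - mid), default=None)
        if best is not None and abs(best[0] - mid) <= max_distance:
            assignments[(chrom, start, end)] = best[1]
        else:
            assignments[(chrom, start, end)] = None
    return assignments


# Toy annotation with hypothetical genes on one chromosome.
tss = {"chr2": [(100_000, "GeneA"), (300_000, "GeneB")]}
print(assign_nearest_gene([("chr2", 120_000, 140_000), ("chr2", 190_000, 210_000)], tss))
# {('chr2', 120000, 140000): 'GeneA', ('chr2', 190000, 210000): None}
```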

Notably, only a small fraction of the associations between CT antigens and SEs (22.2% in our analysis) were conserved from normal testes to cancers. This may be because mouse rather than human testes were used for the analysis. Alternatively, other mechanisms may be responsible for the acquisition of SEs by CT antigens during tumorigenesis. DNA methylation at enhancer elements has been proposed to influence enhancer activity [42, 43]. We therefore speculated that altered DNA methylation in SEs might contribute to the ectopic expression of certain CT antigens during tumorigenesis. To test this hypothesis, we obtained data from a recent genome-scale DNA methylation analysis of normal tissues and multiple tumors that reported the DNA methylation landscape across SEs in tumors and their matched normal tissues [44]. Focusing on breast cancers, we found that the SEs associated with three CT antigens (HTR7, DNAJC5B and ACTL7B) were markedly hypomethylated compared with normal tissues: the coverage of hypomethylated regions (HMRs) within these SE regions increased dramatically in the breast cancer samples (Fig. 2f). These results suggest that aberrant DNA methylation in SEs may be another factor contributing to the ectopic expression of CT antigens in cancers.
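The 22.2% figure can be reconstructed from the counts in Fig. 2a by inclusion-exclusion, assuming that each pairwise overlap reported above (four, two and two) already includes TPX2 and KIF2C, the two CT antigens shared by all three cell lines, and that the ten conserved associations are counted against the distinct SE-associated CT antigens in the MM (M), GBM (G) and SCLC (S) cell lines:

```latex
\begin{aligned}
|M \cup G \cup S| &= |M| + |G| + |S| - |M \cap G| - |M \cap S| - |G \cap S| + |M \cap G \cap S| \\
&= 17 + 15 + 19 - 2 - 4 - 2 + 2 = 45, \\
\frac{10}{45} &\approx 0.222 = 22.2\%.
\end{aligned}
```

Under this reading, 10 of the 45 distinct SE-associated CT antigens found in the cancer cell lines were also SE-associated in mouse testes.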

DNA Methyltransferase DNMT1 Is Involved in the Regulation of CT Antigen Expression

Our attempt to link DNA methylation in SEs to CT antigen expression in different cancer types prompted us to investigate the relationship between DNA methylation and CT antigen expression further. The DNA methylation status of selected genomic regions can be instructive for gene expression [43, 45]. In normal testes, aging-associated alterations in DNA methylation during spermatogenesis have been reported to have broad effects on gene expression and cell signaling in offspring [46]. During embryonic development, early PGCs undergo epigenetic reprogramming, during which their genomes display very low levels of global DNA methylation [47]. However, certain genomic regions, including CpG islands (regions in which CpG dinucleotides occur at a significantly higher frequency than is typical for the genome as a whole) on the X chromosome and germline-specific genes, are protected from this global loss of methylation and become demethylated only after PGCs colonize the gonads [48]. This protection has been attributed to DNMT1: DNMT1 is the only DNA methyltransferase consistently expressed throughout PGC development [49] (Fig. 4), and DNMT1 knockout (KO) resulted in the up-regulation of post-migratory germ cell-specific genes [50]. Furthermore, conditional loss of DNMT1 in PGCs leads to precocious differentiation of male germ cells by invoking lineage-specific gene expression [51]. We therefore speculated that DNMT1 may also regulate the expression of specific CT antigens during PGC development. To test this hypothesis, we analyzed RNA-Seq data from wild-type and DNMT1 KO male PGCs at E13.5 [51] and found 141 differentially expressed genes with a fold change ≥2. Among them, 12 CT antigens were differentially expressed in DNMT1 KO PGCs (Fig. 5a, b). Interestingly, all 12 were up-regulated (Fig. 5a), accounting for 11% of all up-regulated genes (Fig. 5c). These results suggest that DNMT1 suppresses the expression of these CT antigens by maintaining DNA methylation during early PGC development. To test this further, we examined the methylation coverage of the promoter regions of these CT antigens in PGCs at E9.5 [48, 51]. Eight of the 12 CT antigens had high methylation occupancy (>20%) at their promoter regions in PGCs at E9.5 (Fig. 5d), suggesting that their expression may be suppressed by DNMT1 at early stages of PGC development.
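The differential-expression criteria used here and in the Methods (at least a 2-fold change and an FDR below 5%) can be sketched as follows. Ballgown reports q-values itself; this sketch instead writes out the Benjamini-Hochberg adjustment explicitly, drops genes with FPKM = 0 in both groups, and adds a small pseudocount to avoid division by zero. These choices, the input layout, the placeholder genes and the function names are assumptions for illustration, not the authors' code.

```python
import math

PSEUDOCOUNT = 1e-3  # hypothetical; avoids division by zero for unexpressed genes


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, min_fold=2.0, max_fdr=0.05):
    """genes: list of (name, mean_fpkm_control, mean_fpkm_mutant, p_value).

    Returns (name, direction, log2 fold change) for genes passing both cut-offs.
    """
    kept = [g for g in genes if g[1] > 0 or g[2] > 0]  # drop genes with FPKM = 0
    qvalues = benjamini_hochberg([g[3] for g in kept])
    hits = []
    for (name, ctrl, mut, _), q in zip(kept, qvalues):
        fold = (mut + PSEUDOCOUNT) / (ctrl + PSEUDOCOUNT)
        if q < max_fdr and (fold >= min_fold or fold <= 1 / min_fold):
            hits.append((name, "up" if fold > 1 else "down", round(math.log2(fold), 2)))
    return hits


toy = [("g1", 10, 40, 0.001), ("g2", 10, 11, 0.001), ("g3", 20, 5, 0.002),
       ("g4", 0, 0, 0.5), ("g5", 10, 30, 0.5)]
print(differentially_expressed(toy))  # [('g1', 'up', 2.0), ('g3', 'down', -2.0)]
```

In the toy input, `g2` fails the fold-change cut-off, `g5` fails the FDR cut-off and `g4` is removed as unexpressed before testing.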

Fig. 4

DNA methyltransferase DNMT1 is expressed throughout primordial germ cell (PGC) development. Bar graphs show the expression of DNMT1 in both mouse and human PGCs

Fig. 5

The expression of a subset of CT antigens is under the control of DNA methyltransferase DNMT1. (a) Bar graph shows that 12 CT antigens were significantly up-regulated upon conditional deletion of DNMT1 in mouse primordial germ cells (PGCs) at E13.5. No CT antigens were significantly down-regulated. (b) Bar graph shows that the 12 up-regulated CT antigens showed >2-fold change in their expression levels. (c) Pie chart shows that the 12 up-regulated CT antigens account for 11% of all up-regulated genes in PGCs at E13.5. (d) Bar graph shows that eight of the 12 up-regulated CT antigens contained promoter regions with high DNA methylation (more than 20%) in PGCs at E9.5. (e) Heat map shows the expression of CT antigen TEX101, together with that of DNMT1, in different breast cancer cell lines. MLCa, metastatic lobular carcinoma; MC, metaplastic carcinoma; Inf DuctCa, infiltrating ductal carcinoma; IDC/pap, ductal breast cancer/papillary; IDC, ductal breast cancer; F, fibrocystic disease; Duc.Ca, ductal carcinoma; Ca, carcinoma; AnCa, anaplastic carcinoma; AC, adenocarcinoma; N.A., subtype identity not available. (f) Dot plot shows that the expression of CT antigen TEX101 inversely correlates with DNMT1 in breast cancer. (g) Dot plot shows that the expression of CT antigen TEX101 inversely correlates with DNMT1 in head and neck cancer

We then asked whether a similar regulatory relationship between DNMT1 and CT antigens exists in cancers. To answer this question, we focused on TEX101, one of the eight CT antigens shown in Fig. 5d, because it has been shown to play crucial roles in both spermatogenesis and tumorigenesis. TEX101 KO male mice are infertile because their sperm fail to migrate through the uterotubal junction [52, 53]. Aberrantly elevated TEX101 expression has been reported in basal cell carcinoma [54], breast cancer [55, 56] and head and neck squamous cell carcinoma [57]. We first examined the expression levels of both TEX101 and DNMT1 across 54 breast cancer cell lines [58]. TEX101 was indeed overexpressed in specific breast cancer cell lines (Fig. 5e). The expression levels of TEX101 and DNMT1 were inversely correlated in these breast cancer cell lines (Spearman’s rank correlation, which was used for all correlation analyses in this report) (Fig. 5f), consistent with our observations in PGCs. We also obtained gene expression data on TEX101 and DNMT1 from The Cancer Genome Atlas (TCGA) database and observed an inverse correlation between their expression in head and neck cancer samples as well (Fig. 5g). Taken together, these findings suggest that DNMT1 may negatively regulate the expression of these CT antigens, including TEX101, in different cancer types.
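Spearman's rank correlation, used for all correlation analyses in this report, is the Pearson correlation computed on ranks. In R it is available as `cor(x, y, method = "spearman")`, with `cor.test` providing a p-value; the dependency-free Python sketch below shows the underlying computation, giving tied values their average rank. The function names and toy values are hypothetical and do not reproduce the TEX101 or DNMT1 data.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # sorted positions i..j share ranks i+1..j+1
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Undefined (raises ZeroDivisionError) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


# Toy values: as one gene rises the other tends to fall, so rho is negative.
print(spearman_rho([1.0, 2.5, 3.1, 4.8], [9.0, 7.2, 7.5, 1.3]))  # -0.8
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone transformations such as log-scaling of expression values, which is one reason it suits FPKM-type data with skewed distributions.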

The Expression of CT Antigen TAF7L Is Regulated by DNA Methylation in Cancers

We showed above that DNA methylation maintained by DNMT1 may control the expression of specific CT antigens during PGC development (Fig. 5b, d), and that this may also hold true in cancer (Fig. 5e–g). To test this idea further in cancers, we focused on TAF7L, another CT antigen shown in Fig. 5d. TAF7L is a germ cell-specific transcription factor [59, 60]. Although TAF7L KO mice are fertile, their sperm exhibit abnormal morphology and significantly reduced motility [61]. Little is known about the role of TAF7L in cancers, although one study reported down-regulation of TAF7L expression in acute myeloid leukemia [62]. Two alternative promoters have been identified for TAF7L: promoter 1 is located at hg19 chrX: 100,548,043-100,548,102, and promoter 2 at hg19 chrX: 100,546,312-100,546,371 (Fig. 6a). A 5′ CpG island at hg19 chrX: 100,546,064-100,546,550 is associated with promoter 2 (Fig. 6a). Analysis of TCGA datasets revealed elevated TAF7L expression in thyroid cancers compared with normal tissues (Fig. 6b). We then investigated whether DNA methylation in the promoter regions of TAF7L or in its associated CpG island affects its expression in thyroid cancer. We obtained DNA methylation profiles (Illumina 450k) of TAF7L in both normal and thyroid cancer tissues and coupled them with TAF7L gene expression data (Fig. 6c). Interestingly, at probe cg13248004 (hg19 chrX: 100,548,042-100,548,044) near promoter 1 of TAF7L (Fig. 6c, green box), DNA methylation was significantly lower in cancer tissues than in normal tissues (Fig. 6d). DNA demethylation at the same region was also observed in colon and rectal cancers (Fig. 6e, green box, and Fig. 6f), suggesting that promoter 1 of TAF7L may undergo DNA demethylation during tumorigenesis.
In addition, TAF7L expression was inversely correlated with DNA methylation at each of three probes within the 5′ CpG island in thyroid cancer tissues: cg08704539 (hg19 chrX: 100,546,355-100,546,357), cg01538344 (hg19 chrX: 100,546,290-100,546,292) and cg19386336 (hg19 chrX: 100,546,063-100,546,065) (Fig. 6c, blue box; Fig. 7). These findings suggest that DNA methylation at the promoter and the CpG island contributes to the ectopic expression of TAF7L in cancer.
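Mapping 450k probes onto the TAF7L promoters and CpG island is an interval-overlap problem. The sketch below uses the hg19 chrX coordinates quoted above and assumes they are 1-based closed intervals, as displayed in the UCSC Genome Browser (BED files, by contrast, are 0-based and half-open). The dictionary and function names are hypothetical.

```python
# hg19 chrX coordinates quoted in the text, treated as 1-based closed intervals (assumption).
FEATURES = {
    "promoter 1": (100_548_043, 100_548_102),
    "promoter 2": (100_546_312, 100_546_371),
    "5' CpG island": (100_546_064, 100_546_550),
}
PROBES = {
    "cg13248004": (100_548_042, 100_548_044),
    "cg08704539": (100_546_355, 100_546_357),
    "cg01538344": (100_546_290, 100_546_292),
    "cg19386336": (100_546_063, 100_546_065),
}


def overlaps(a, b):
    """True if closed intervals a and b, each (start, end), share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def annotate_probes(probes=PROBES, features=FEATURES):
    """List, for each probe, the features whose interval it overlaps."""
    return {probe: [name for name, span in features.items() if overlaps(probe_span, span)]
            for probe, probe_span in probes.items()}


for probe, hits in annotate_probes().items():
    print(probe, hits)
```

Under the closed-interval assumption, cg13248004 overlaps the first two bases of promoter 1, all three remaining probes fall inside the 5′ CpG island, and cg08704539 additionally lies within promoter 2.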

Fig. 6

The expression of CT antigen TAF7L is under the control of DNA methylation in its promoter and 5′ CpG island. (a) View of CT antigen TAF7L in the UCSC Genome Browser. TAF7L has two alternative promoters; the 5′ CpG island is associated with promoter 2. (b) Box plot shows that the expression of TAF7L is significantly elevated in thyroid cancer compared with normal tissues. (c) Heat map shows the expression of TAF7L in both normal and thyroid cancer together with the DNA methylation status of the 5′ region of TAF7L. (d) Box plot shows that promoter 1 (green box in c) of TAF7L is significantly hypomethylated in thyroid cancer compared with normal tissue. (e) Heat map shows the DNA methylation status of the 5′ region of TAF7L. (f) Box plot shows that promoter 1 (green box in e) of TAF7L is significantly hypomethylated in colon and rectal cancer compared with normal tissues

Fig. 7

The expression of CT antigen TAF7L is inversely correlated with DNA methylation of the 5′ CpG island in thyroid cancer. Dot plots show that the expression of TAF7L negatively correlates with the DNA methylation status of its associated 5′ CpG island in thyroid cancer. The exact location of each probe is given in the main text

TAF7L Regulates the Expression of Other CT Antigens

Because TAF7L is a transcription factor, it could potentially regulate the expression of many other genes once activated during gametogenesis or tumorigenesis. A recent study showed that TAF7L targets the promoters of many spermatogenesis-specific genes in testes [63]. We analyzed RNA-Seq data from normal and TAF7L KO testes generated in that study and identified 267 CT antigens that were differentially expressed (fold change ≥2) upon TAF7L deletion, of which 84 were up-regulated and 183 were down-regulated (Fig. 8a). We focused on the down-regulated CT antigens because they accounted for the majority of the differentially expressed CT antigens. To exclude CT antigens whose expression is already low in normal testes, we filtered the 183 down-regulated CT antigens using a threshold of RPKM (reads per kilobase of transcript per million mapped reads) ≥20 in normal testes in the GTEx (Genotype-Tissue Expression) database, which yielded 103 CT antigens. Gene ontology analysis suggested that these CT antigens were enriched in protein processing and metabolic pathways (Fig. 8b), both of which are important in tumorigenesis. We then filtered the 103 CT antigens further using a fold-change threshold of 4, reasoning that the remaining CT antigens should be the most susceptible to regulation by TAF7L. Sixteen CT antigens passed this threshold (Fig. 8c). To determine whether they are under the direct control of TAF7L, we analyzed ChIP-Seq data on TAF7L and RNA polymerase II (POL II) from the same study [63]. Peak annotation showed that approximately 94% of TAF7L peaks were located within 1 kb of promoters (Fig. 9a), consistent with a previous analysis [63].
Moreover, all 16 filtered CT antigens were occupied by TAF7L peaks and were actively transcribed, as indicated by co-localized POL II peaks (Figs. 8d and 9b, green box). These data suggest that TAF7L regulates the expression of these CT antigens by binding directly to their promoters in testes.
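The promoter-proximal fraction of peaks can be approximated by measuring, for each peak summit, the distance to the nearest TSS on the same chromosome. ChIPseeker performs a richer, strand-aware annotation with promoter bins such as "Promoter (<=1kb)"; this strand-agnostic sketch, its input layout, the toy coordinates and the function names are simplifying assumptions.

```python
from bisect import bisect_left


def distance_to_nearest_tss(pos, tss_positions):
    """Absolute distance from `pos` to the closest TSS in a sorted, non-empty list."""
    i = bisect_left(tss_positions, pos)
    neighbours = tss_positions[max(i - 1, 0):i + 1]  # TSSs flanking `pos`
    return min(abs(pos - t) for t in neighbours)


def promoter_fraction(peak_summits, tss_by_chrom, window=1_000):
    """Fraction of (chrom, summit) peaks lying within `window` bp of a TSS.

    tss_by_chrom maps each chromosome to a sorted list of TSS positions.
    """
    if not peak_summits:
        return 0.0
    hits = 0
    for chrom, summit in peak_summits:
        tss = tss_by_chrom.get(chrom)
        if tss and distance_to_nearest_tss(summit, tss) <= window:
            hits += 1
    return hits / len(peak_summits)


# Toy data: two TSSs on chrX and four peak summits.
tss = {"chrX": [10_000, 50_000]}
peaks = [("chrX", 10_500), ("chrX", 48_900), ("chrX", 50_000), ("chr1", 10_000)]
print(promoter_fraction(peaks, tss))  # 0.5
```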

Fig. 8

CT antigen TAF7L controls the expression of a subset of CT antigens. (a) Bar graph shows that a total of 84 up-regulated and 183 down-regulated CT antigens are among the genes that were differentially expressed in TAF7L-deficient testes. (b) Down-regulated CT antigens identified in the TAF7L-deficient model were subjected to functional annotation. Protein processing in the endoplasmic reticulum and metabolic pathways were enriched in the analysis. The fold change of each CT antigen is indicated. (c) Bar graph shows CT antigens with >4-fold change in expression identified in the TAF7L KO model. (d) ChIP peak coverage of TAF7L and POL II suggests that TAF7L binds directly to CT antigens, which are actively transcribed as indicated by colocalized peaks of TAF7L and POL II (green box). (e) Dot plot shows that the expression of CT antigen FATE1 significantly correlates with the expression of TAF7L in normal human testes. (f) Dot plot shows that the expression of CT antigen FATE1 significantly correlates with the expression of TAF7L in testicular germ cell cancer. (g) Dot plot shows that the expression of CT antigen FATE1 significantly correlates with the expression of TAF7L in pheochromocytoma and paraganglioma

Fig. 9

CT antigen TAF7L regulates the expression of a subset of CT antigens. (a) Bar graphs summarize peak annotations of the ChIP-Seq data for both TAF7L and POL II. (b) ChIP peak coverage of TAF7L and POL II indicates that TAF7L binds directly to CT antigens, which are actively transcribed as indicated by the colocalized peaks of TAF7L and POL II (green box)

We then tested whether a similar regulatory relationship between TAF7L and CT antigens exists in cancers, focusing on FATE1, one of the CT antigens shown in Fig. 8c. FATE1 has been reported to be overexpressed in hepatocellular carcinoma as well as in gastric and colon cancers [64]. Furthermore, FATE1 decreases the sensitivity of adrenocortical carcinoma cells to the chemotherapeutic drug mitotane, and its expression in adrenocortical carcinoma negatively correlates with patient survival [65]. FATE1 expression was highly correlated with TAF7L expression in normal human testes (Fig. 8e), consistent with our earlier findings (Fig. 8c). Interestingly, this positive correlation persisted in testicular germ cell cancer (Fig. 8f) as well as in pheochromocytoma and paraganglioma (Fig. 8g), suggesting that FATE1 expression may also be influenced by TAF7L in cancer.

Discussion and Concluding Remarks

The past decade has seen a tremendous surge in the number of high-resolution next-generation sequencing datasets from the fields of reproduction and cancer. New bioinformatics tools have also helped to identify CT antigens systematically, expanding the pool of CT antigens from more than 200 to approximately 1000 [13]. At the time of this writing, a new study reported yet another genome-wide identification of CT antigens, adding 201 CT antigens to an already extensive pool [66]. Given this explosion of publicly available datasets, it may be surprising how little we know about the molecular mechanisms underlying the function and regulation of CT antigens during gametogenesis and tumorigenesis. The problem may no longer be a lack of data, but a lack of direction on where to look and how to analyze the data. The shared characteristics of gametogenesis and tumorigenesis have attracted considerable interest, and it has been proposed that the expression of CT antigens during tumorigenesis reflects the reactivation of a gametogenic program in non-germline somatic cells [6, 12, 67]. If so, understanding the regulation of CT antigens during gametogenesis is likely to provide new insights into their regulation during tumorigenesis.

By re-analyzing published datasets and data from multiple databases, we have re-examined the likely roles of CT antigens in cancer and arrived at three main findings. First, a subset of CT antigens was associated with SEs in three different cancer types. Of interest, ten of these CT antigens were also associated with SEs in normal mouse testes. It is therefore tempting to speculate that these CT antigens acquired SEs via a conserved mechanism during tumorigenesis, which in turn drives their aberrant expression in different cancer types. We expect that analyzing ChIP-Seq data from human testes and additional cancer types will reveal further conserved SE-associated CT antigens. SEs have been reported to be associated with key oncogenes [27, 68, 69]. We found that the CT antigen TPX2 was associated with an SE in MM, GBM, and SCLC, as well as in normal testes. Moreover, TPX2 is itself classified as a human core fitness gene. Previous studies have shown that certain inhibitors, such as the BET-bromodomain inhibitor JQ1 and CDK7 inhibitors, preferentially target SE-associated oncogenes in cancers [24, 70]. Thus, targeting TPX2 or its associated SE in cancer cells may prove to be an effective therapeutic approach.
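The comparison of SE-associated CT antigens across cancer types and normal testes is, at its core, a set intersection with a cross-species gene-name mapping. The sketch below assumes that "conserved" means an SE association shared by all three cancer types (MM, GBM, SCLC) and by the mouse testis; the text does not state the exact criterion. The gene sets are hypothetical placeholders (only the TPX2 pattern mirrors the text), and the case-folding ortholog mapping is a deliberately crude assumption.

```python
from functools import reduce


def shared_se_genes(se_sets: dict[str, set[str]]) -> set[str]:
    """Return genes present in every sample type's SE-associated gene set."""
    if not se_sets:
        return set()
    return reduce(set.intersection, se_sets.values())


def to_human_symbol(mouse_symbol: str) -> str:
    # Crude, hypothetical mapping: many 1:1 mouse/human orthologs differ only
    # in letter case. A real analysis should use a curated ortholog table.
    return mouse_symbol.upper()


# Hypothetical SE-associated CT antigen sets; CTA_* names are placeholders.
cancer_se = {
    "MM": {"TPX2", "CTA_A", "CTA_B"},
    "GBM": {"TPX2", "CTA_A", "CTA_C"},
    "SCLC": {"TPX2", "CTA_A", "CTA_D"},
}
mouse_testis_se = {"Tpx2", "Cta_c"}

shared_in_cancers = shared_se_genes(cancer_se)
conserved = shared_in_cancers & {to_human_symbol(g) for g in mouse_testis_se}
print(sorted(shared_in_cancers), sorted(conserved))  # → ['CTA_A', 'TPX2'] ['TPX2']
```

Requiring membership in every set is the strictest definition; a looser criterion (for example, SE association in at least one cancer type and in the testis) would use a union over the cancer sets before intersecting with the testis set.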

Second, we showed that hypomethylation in SE regions may affect the ectopic expression of certain CT antigens in breast cancer. DNA hypomethylation and demethylation are known to occur during both gametogenesis and tumorigenesis [47, 71]. Moreover, DNA hypomethylation participates in the activation of a subset of CT antigens in different cancer types [72, 73]. By extracting information from a study in which the DNA methyltransferase DNMT1 was conditionally knocked out in PGCs, we found that DNMT1 may suppress the expression of a new subset of CT antigens in cancers, such as TEX101, whose expression strongly correlates with cancer prognosis [66]. The demethylating agent 5-aza-2′-deoxycytidine has been shown to induce the expression of the CT antigen MAGE in both normal and malignant lymphoid cells [74]. Notably, 5-aza-2′-deoxycytidine inhibits DNA methylation by interfering with DNMT1 at the progressing replication fork during DNA replication [75]. Ablation of DNMT1 alone, but not of DNMT3B alone, in HCT116 cells (a colorectal cancer cell line) activated the CT antigens MAGE-A1, NY-ESO-1 and XAGE-1 [76]. Taken together, these findings support our observation that DNMT1 plays an important role in regulating CT antigen expression in cancer. Additionally, TAF7L is another CT antigen identified in the DNMT1 KO model; in cancers, its expression is regulated by the DNA methylation status of its promoter and the promoter-associated CpG island. This finding is not surprising, given that demethylation of CpG sites within the 5′ region of the CT antigen MAGE-A1 also correlates with its ectopic expression in cancer cells [77]. Thus, DNA hypomethylation may represent the primary mechanism of activation for a growing subset of CT antigens in cancers.

Finally, we examined testes from a TAF7L KO model. TAF7L is a germ cell-specific transcription factor. We showed that TAF7L binds directly to the promoter regions of many CT antigens and may regulate their expression in normal testes and, potentially, in cancers. To our surprise, the role of TAF7L resembled that of the previously identified DNA-binding protein BORIS (also known as CTCFL). BORIS is itself a CT antigen whose expression, like that of TAF7L, is regulated by DNA methylation in its promoter region in different cancer types [78, 79]. Of interest, BORIS can also bind directly to the promoters of other CT antigens, such as NY-ESO-1 in lung cancer cells, to regulate their expression [80]. Thus, a subset of CT antigens appears to function as transcriptional regulators of other CT antigens. Future analyses may identify additional CT antigens with similar roles and provide a more comprehensive view of the regulation of CT antigens in cancer.

Summary

In summary, we have shown that it is possible to extract useful information on the regulation of CT antigens during tumorigenesis by studying gametogenesis. This approach may also provide a unique gateway to probing the functions of these genes in future studies. Systematic analysis of the ever-growing body of publicly available datasets is likely to provide further insights into these CT antigens.